Senior Hadoop Developer Resume
Richmond, VA
SUMMARY:
- 8 years of software development experience, including 5 years with Big Data technologies such as Hadoop, Hive, Pig, Sqoop, HBase, Flume and Spark.
- Expert in working with Hive data warehouse tool - creating tables, data distribution by implementing partitioning and bucketing, writing and optimizing the HiveQL queries.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Good understanding of NoSQL Databases.
- Worked on Windows and UNIX/Linux platforms with technologies such as SQL, PL/SQL, XML, HTML, CSS, JavaScript and Core Java.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera distributions.
- Experience in using IDEs like Eclipse and NetBeans.
- Developed UML Diagrams for Object Oriented Design: Use Cases, Sequence Diagrams and Class Diagrams.
- Working knowledge of databases such as Oracle 10g.
- Experience in writing Pig Latin scripts.
- Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, perform structural modifications using MapReduce and Hive, and analyse data using visualization/reporting tools.
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Worked on Agile methodology.
- Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.
- Clear knowledge of rack awareness topology in the Hadoop cluster.
- Knowledge of Rackspace.
- Experience in use of Shell scripting to perform tasks.
- Strong oral communication skills.
- Familiar with Core Java, with a strong understanding and working knowledge of Object Oriented Concepts, Collections, Multithreading, Data Structures, Algorithms, Exception Handling and Polymorphism.
- Basic knowledge of application design using Unified Modeling Language (UML), Sequence diagrams, Use Case diagrams, Entity Relationship Diagrams (ERD) and Data Flow Diagrams (DFD).
- Extensive programming experience in developing web based applications using Core Java, J2EE, JSP and JDBC.
- Comprehensive knowledge of Software Development Life Cycle coupled with excellent communication skills.
TECHNICAL SKILLS:
Languages: JDK 1.6/1.7, J2EE 1.5
Scripting Languages: C, Java, SQL, Pig Latin, Bash scripting, JSON
Big Data Ecosystem: HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Impala (POC), Cassandra, Oozie, Zookeeper, Flume, Spark (POC), Kafka (POC)
Operating Systems: Windows 2008/XP/8.1, UNIX, Linux, CentOS
RDBMS: Oracle 10g, SQL Server 2005/2008, MS-Access, MySQL, NoSQL
Modeling Tools: UML on Rational Rose 4.0.
Web Technologies: HTML, XML, JSP, CSS, Ajax, jQuery
Web Services: WebLogic, WebSphere, Apache Cassandra, Tomcat
IDEs: Eclipse, NetBeans, WinSCP
Familiar GUIs: MS Office Suite, MS Project
Servers: Apache Tomcat
PROFESSIONAL EXPERIENCE:
Confidential, Richmond, VA
Senior Hadoop Developer
Responsibilities:
- Performed cluster capacity planning with the operations and management teams, and cluster maintenance including adding and removing nodes and HDFS support.
- Managed and reviewed Hadoop log files; handled file system management and monitoring.
- Involved in the cluster upgrade and modified the affected jobs as required.
- Involved in implementing Kerberos security on the Hadoop cluster, working with the operations team to move from a non-secured cluster to a secured cluster.
- Migrated data from RDBMS to Hadoop using Sqoop for analysis and implemented Oozie jobs for automatic data imports from the source.
- Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
- Created external tables with proper partitions for efficiency and loaded the structured data in HDFS produced by MR jobs.
- Implemented Hive UDF for comprehensive data analysis.
- Responsible for troubleshooting MapReduce jobs by reviewing the log files.
- Involved in importing real-time data into Hadoop using Kafka and implemented the Oozie job for daily imports.
- Worked with various onshore and offshore teams to understand the data imported from their sources.
- Developed generic Shell scripts to automate Sqoop job by passing parameters for data imports.
- Involved in data visualization and provided the files required for the team by analysing the data in hive and developed Pig scripts for advanced analytics on the data.
- As part of a POC, used Amazon S3 as the underlying file system for Hadoop and ran Elastic MapReduce jobs on the data in S3 buckets.
- Worked with the operations team on the Spark installation on the secured cluster.
- Provided updates in daily Scrum meetings, planned own tasks at the start of each sprint and tracked them in JIRA; synced with the team to pick priority tasks.
Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Linux Red Hat.
Confidential, Milwaukee, WI
Senior Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Map Reduce, HDFS, developed multiple Map Reduce jobs in Java for data cleaning and preprocessing.
- Experience in installing, configuring and using Hadoop Ecosystem components.
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Load and transform large sets of structured, semi structured and unstructured data.
- Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop.
- Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
- Responsible for managing data coming from different sources.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Strong expertise on MapReduce programming model with XML, JSON, CSV file formats.
- Gained good experience with NOSQL database.
- Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed custom aggregate functions using Spark SQL and performed interactive querying, on a POC level.
- Responsible for provisioning, maintaining and improving upon server infrastructure, split between physical data center and AWS.
- Wrote a Kafka REST API to collect events from the front end.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Experience in managing and reviewing Hadoop log files.
- Involved in loading data from LINUX file system to HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Worked on tuning the performance of Pig queries.
- Mentored the analyst and test teams in writing Hive queries.
- Installed Oozie workflow engine to run multiple MapReduce jobs.
- Implemented processing of different sources with multiple input formats, using GenericWritable and ObjectWritable.
- Provided cluster coordination services through ZooKeeper.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Worked with the Data Science team to gather requirements for various data mining projects.
Environment: Cloudera CDH 4, HDFS, Hadoop 2.2.0 (YARN), Spark, Flume 1.5.2, Eclipse, AWS, MapReduce, Hive 1.1.0, Pig Latin 0.14.0, Java, SQL, Sqoop 1.4.6, CentOS, ZooKeeper 3.5.0 and NoSQL database.
Confidential, TX
Senior Hadoop Developer
Responsibilities:
- Involved in defining job flows, managing and reviewing log files.
- Supported MapReduce programs running on the cluster.
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, MapReduce Frameworks, HBase, Hive, Oozie, Flume, Sqoop etc.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Imported Bulk Data into HBase Using Map Reduce programs.
- Responsibilities included designing and developing new back-end services, maintaining and expanding our AWS infrastructure, and providing mentorship to others on my team.
- Developed Apache Pig and Hive scripts to process the HDFS data.
- Performed analytics on time series data stored in HBase using the HBase API.
- Designed and implemented Incremental Imports into Hive tables.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Wrote multiple Java programs to pull data from HBase.
- Knowledge on handling Hive queries using Spark SQL that integrate with Spark environment, on a POC level.
- Designed and built the Reporting Application, which uses the Spark SQL to fetch and generate reports on HBase table data, on a POC level.
- Involved with File Processing using Pig Latin.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Created and maintained Technical documentation for launching HADOOP Clusters and for executing Hive queries and Pig Scripts.
Environment: Java, Hadoop 2.1.0, Map Reduce2, Pig 0.12.0, Hive 0.13.0, Spark, Linux, Sqoop 1.4.2, Flume 1.3.1, Eclipse, AWS EC2, and Cloudera CDH 4.
Confidential, Seattle, WA
Hadoop Developer
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Involved in installing, configuring and managing Hadoop Ecosystem components like Hive, Pig, Sqoop and Flume.
- Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for processing the data.
- Imported and exported data between databases such as MySQL and Oracle and HDFS/Hive using Sqoop.
- Wrote Hive queries for data analysis to meet the business requirements.
- Responsible for loading and managing unstructured and semi-structured data coming from different sources into the Hadoop cluster using Flume.
- Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform joins on the Map side using distributed cache.
- Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
- Created internal and external tables with properly defined static and dynamic partitions for efficiency.
- Implemented custom Hive UDFs to achieve comprehensive data analysis.
- Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data.
- Developed Pig scripts for advanced analytics on the data for recommendations.
- Experience in writing Pig UDFs and macros.
- Exported the information required by the business to RDBMS using Sqoop so that the BI team could generate reports from the data.
- Developed generic Shell scripts to automate Sqoop job by passing parameters for data imports.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
- Responsible for troubleshooting MapReduce jobs by reviewing the log files.
Environment: Hadoop, MapReduce, Hive, Oozie, Sqoop, Flume, JAVA, LINUX, CentOS
Confidential, Boston, MA
Hadoop Developer
Responsibilities:
- Supported MapReduce Programs running on the cluster.
- Delivered a POC of Flume to handle real-time log processing for attribution reports.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
Environment: Hadoop, CDH4, PIG, HIVE, Sqoop, Flume, SQL, Oozie, MapReduce, Java.
Confidential
Java Developer
Responsibilities:
- This project was developed for one of the top clients in the financial sector.
- It was developed with a flexible design, which provides the platform to offer online integrated financial services to the bank's customers.
- Using this web-based application, customers can conduct activities such as online retail banking, secure messaging, credit card payments, shopping, corporate banking and even payments to third-party individuals and institutions.
Environment: Java, HTML, CSS, XML, JavaScript, JQuery, Apache Tomcat, Ant, SQL, PL/SQL and Shell scripting
Confidential
Java Developer
Responsibilities:
- Credit Information Bureau India Ltd. is one of the leading and secured financial institutions in India.
- Loan Approval and Payment system is an automated multi-application system by which customers of the bank can have quick processing of their loan applications and set up one-time or recurring payments.
- The customers can use the User Interface to keep track of all aspects of their loans and their payment details.
Environment: Java, J2EE, JUnit, XML, JavaScript, Log4j, CVS, Eclipse, Apache Tomcat, and Oracle.
