Hadoop Developer Resume
Hartford, CT
SUMMARY
- Over six years of experience in data warehousing and Big Data using Hadoop, Pig, Sqoop, Hive, HDFS, Informatica PowerCenter 9.1/8.6, and ETL concepts.
- Experience with Hadoop ecosystem components including Hive, Sqoop, Oozie, and Pig.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce.
- Experience writing Hive queries for data analysis to meet business requirements.
- Experience loading data into Hive/Impala tables using Sqoop.
- Experience in validating the data requirements and ensuring the right data is extracted.
- Experience in validating the files loaded into HDFS.
- Experience validating that there is no data loss by comparing Hive table data against source RDBMS data (see the sample comparison query after this list).
- Created ETL test data for all mapping rules to test the functionality of Informatica workflows.
- Sound knowledge of and experience with metadata and star/snowflake schemas; analyzed source systems, the staging area, and fact and dimension tables in the target data warehouse.
- Experience in preparing Test Strategies, developing Test Plans and Test Cases, writing Test Scripts by decomposing Business Requirements, and developing Test Scenarios to support quality deliverables.
- Experience working with the Software Development team to resolve Defects, present Defect Status reports, and address requirement and design inconsistencies.
- Expertise in querying and testing Oracle and MS SQL Server using SQL for data integrity.
- Good experience in ETL testing and in developing and supporting Informatica applications.
- Worked on Control-M to configure, monitor, and schedule ETL routines.
- Understanding of the various phases of the Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC).
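For illustration, a minimal sketch of the kind of Hive-versus-RDBMS row-count comparison used to confirm no data loss after a load (the schema, table, and column names below are hypothetical):

  -- Hive side: count rows in the table loaded from the source system (names illustrative)
  SELECT COUNT(*) AS hive_row_count
  FROM   sales_db.customer_orders;

  -- Oracle side: count rows in the source table for the same load window
  SELECT COUNT(*) AS source_row_count
  FROM   customer_orders
  WHERE  load_date = TO_DATE('2015-06-30', 'YYYY-MM-DD');

Matching counts indicate a complete load; a mismatch would be investigated as a potential data-loss defect.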
TECHNICAL SKILLS
Big Data Technologies: HDFS, YARN, Sqoop, Hive, Pig, Oozie
ETL Tool: Informatica PowerCenter 8.6, 9.1
RDBMS: Oracle 10g/11g, MS SQL Server
Programming Languages: SQL, PL/SQL
Business Applications: MS Office
Markup Languages: HTML, XML
Operating Systems: Windows 9x/2000, UNIX Basics
PROFESSIONAL EXPERIENCE
Confidential, Hartford, CT
Hadoop Developer
Responsibilities:
- Reviewing functional and non-functional requirements.
- Developing workflows using custom MapReduce, Pig, Hive and Sqoop.
- Developing multiple MapReduce jobs in Java for data cleaning and preprocessing: logs and semi-structured content stored on HDFS are preprocessed with Pig, and the processed data is loaded into the Hive warehouse, enabling Data Analysts to write Hive queries; also developing Hive queries for the Data Analysts.
- Importing and exporting data between relational databases and HDFS/Hive using Sqoop.
- Loading data from the UNIX file system into HDFS.
- Extracting data from different RDBMS sources through Sqoop and placing it in HDFS.
- Managing and reviewing Hadoop log files.
- Creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list).
- Developing a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.
- Developing workflows in Control-M to automate loading data into HDFS and preprocessing it with Pig.
- Using Maven extensively to build JAR files of MapReduce programs and deploying them to the cluster.
- Gaining familiarity with NoSQL databases.
- Supporting MapReduce programs running on the cluster.
- Executing queries using Hive and developing MapReduce jobs to analyze data.
- Developing Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Developing Pig UDFs to preprocess the data for analysis.
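Below is an illustrative sketch of the Hive external-table and query pattern used over data preprocessed with Pig and landed in HDFS (the table name, HDFS path, and columns are hypothetical):

  -- External table over Pig-processed output in HDFS (names and path illustrative)
  CREATE EXTERNAL TABLE IF NOT EXISTS web_logs_processed (
    session_id  STRING,
    page_url    STRING,
    response_ms INT,
    log_ts      STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/data/processed/web_logs';

  -- Analyst-style query; Hive compiles this into MapReduce jobs on the cluster
  SELECT page_url, COUNT(*) AS hits, AVG(response_ms) AS avg_response_ms
  FROM   web_logs_processed
  GROUP  BY page_url
  ORDER  BY hits DESC
  LIMIT  20;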
Environment: HDFS, YARN, HIVE, SQOOP, J2EE, Oracle, SQL, UNIX.
Confidential, West Point, PA
Big Data Developer
Responsibilities:
- Worked on a live 65-node Hadoop cluster running CDH 4.4.
- Worked with highly unstructured and semi-structured data of 70 TB in size (210 TB with a replication factor of 3).
- Wrote Pig scripts extensively to transform raw data from several data sources into baseline data.
- Developed Hive scripts for end-user/analyst ad hoc analysis requirements.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables for optimized performance (see the table sketch after this list).
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Worked on tuning Hive and Pig scripts to improve performance.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Worked with SequenceFile, Avro, and HAR file formats.
- Extracted the data from Teradata into HDFS using Sqoop.
- Created Sqoop jobs with incremental loads to populate Hive external tables.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Documented a tool to perform "chunk uploads" of big data into Google BigQuery.
- Worked with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN).
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Good working knowledge of HBase.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Extracted feeds from social media sites such as Twitter.
- Configured Hadoop system files to accommodate new data sources and updated the existing Hadoop cluster configuration.
- Involved in loading data from UNIX file system to HDFS.
- Involved in gathering business requirements and preparing detailed specifications that follow project guidelines for program development.
- Actively participated in code reviews and meetings and resolved technical issues.
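A minimal sketch of the partitioned, bucketed Hive external table design referenced above (the schema, bucket count, and location are hypothetical):

  -- Partitioned by load date, bucketed on the join key (names and path illustrative)
  CREATE EXTERNAL TABLE IF NOT EXISTS sales_fact (
    order_id     BIGINT,
    customer_id  BIGINT,
    order_amount DOUBLE
  )
  PARTITIONED BY (load_dt STRING)
  CLUSTERED BY (customer_id) INTO 32 BUCKETS
  STORED AS SEQUENCEFILE
  LOCATION '/data/warehouse/sales_fact';

  -- Filtering on the partition column limits the scan to a single day of data
  SELECT customer_id, SUM(order_amount) AS daily_spend
  FROM   sales_fact
  WHERE  load_dt = '2014-11-15'
  GROUP  BY customer_id;

Partition pruning and bucketing on the join key keep query and join costs proportional to the data actually needed rather than the full table.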
Environment: Cloudera, HDFS, HIVE, HUE, SQOOP, Oracle, UNIX, JIRA, Impala
Confidential
Senior ETL Developer
Responsibilities:
- Held discussions with the Business Analyst and work streams (source systems) about business requirements and prerequisites.
- Analyzed the loaded result sets by monitoring session properties in the Workflow Monitor.
- Monitored the scheduled Informatica jobs through Control-M Enterprise Manager as part of Test Execution.
- Prepared and reviewed the Test Plan/Approach, Test Scenarios, and Test Case designs.
- Performed Sanity, System, Re-Testing, and Regression Testing as part of the Data Warehouse STLC.
- Wrote Test Cases for Data Correctness, Data Transformation, Metadata, Data Integrity, and Data Quality tests.
- Worked on SQL scripts to validate the data in the warehouse (see the reconciliation queries after this list).
- Executed test cases on Source database tables, Staging tables and Data warehouse tables.
- Participated in Test Case walkthroughs and review meetings.
- Logged defects in the defect-tracking tool Quality Center and maintained a history of defects found in the software.
- Presented Test cases and Test Results to the client.
- Validated the data loaded in HDFS against the RDBMS tables using Hive and SQL Queries.
- Wrote Hive/Oracle queries for data validation to meet business requirements.
- Tracked defects using the Quality Center tool and generated defect summary reports.
- Created daily and weekly status reports communicating testing progress, issues, and risks to the Project Lead.
- Provided training sessions to Testing team on DWH Concepts, ETL Process and ETL Testing.
- Assigned tasks and monitored and reviewed status and progress.
- Mapped requirements to test cases in a Requirements Traceability Matrix.
- Uploaded test cases into the QC Test Plan module and moved them into the Test Lab component.
- Captured run statistics of Informatica workflows as part of performance testing and performed load testing of Informatica objects with high volumes of data.
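A representative sketch of the source-to-target validation SQL used during test execution (the staging/dimension table and column names are hypothetical):

  -- Row-count reconciliation between the staging table and the warehouse target (names illustrative)
  SELECT 'STG_CUSTOMER' AS table_name, COUNT(*) AS row_count FROM stg_customer
  UNION ALL
  SELECT 'DIM_CUSTOMER', COUNT(*) FROM dim_customer WHERE current_flag = 'Y';

  -- Column-level check: rows present in staging but missing from the target
  SELECT customer_key, customer_name
  FROM   stg_customer
  MINUS
  SELECT customer_key, customer_name
  FROM   dim_customer;

Any rows returned by the MINUS query would indicate a load discrepancy to be raised as a defect.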
Environment: Informatica 9.1, Oracle 10g, UNIX
Confidential
ETL Developer
Responsibilities:
- Worked with Informatica Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, and Transformation Developer.
- Imported data from various sources, then transformed and loaded it into data warehouse targets using Informatica.
- Worked with different sources such as Oracle, SQL Server, and flat files.
- Extensively used Transformations like Router, Aggregator, Source Qualifier, Joiner, Expression, Filter, Lookup and Sequence generator.
- Ran the Informatica workflows scheduled in Control-M Tool as part of QA/UAT data loads.
- Analyzed the loaded result sets by monitoring session properties in the Workflow Monitor.
- Wrote System Test Cases to test the functionality of the Informatica objects (see the sample validation query after this list).
- Involved in Peer Review of system test cases.
- Involved in walkthroughs of Source-to-Target Mapping sheets and Design Documents.
- Executed test cases on Source database tables, Staging tables and Data warehouse tables.
- Logged defects in the defect-tracking tool Quality Center and maintained a history of defects found in the software.
- Presented Test cases and Test Results to the client.
- Created daily and weekly status reports to track testing progress, issues, and risks for the project.
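For illustration, a sample system test query verifying that a mapping's derived column matches the specified transformation rule (the table names and the concatenation rule are hypothetical):

  -- Flag target rows where the derived FULL_NAME does not match the mapping rule (names illustrative)
  SELECT s.cust_id,
         s.first_name || ' ' || s.last_name AS expected_full_name,
         t.full_name                        AS loaded_full_name
  FROM   src_customer s
  JOIN   tgt_customer_dim t ON t.cust_id = s.cust_id
  WHERE  t.full_name <> s.first_name || ' ' || s.last_name;

An empty result set passes the test case; any returned rows point to a transformation defect.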
Environment: Informatica 8.6, Oracle 10g