Hadoop Developer Resume
Chicago, IL
SUMMARY:
- Over 10 years of experience in software systems development for client/server business systems, with strong data warehousing experience in Extraction, Transformation and Loading (ETL) processes using Big Data Hadoop, IBM Information Server 11.3/9.1/8.5, DataStage 7.5/7.1 and QualityStage.
- 2 years of experience with the Hadoop ecosystem, including HDFS, MapReduce, Impala, Hive, Sqoop, YARN, Cloudera and Pig.
- Efficient in all phases of the software development lifecycle, including Data Cleansing, Data Conversion, Performance Tuning, Unit Testing, System Testing and User Acceptance Testing.
- Experience in analyzing Star Schema/Snowflake Schema and in Dimensional Data Modeling.
- Extensively worked with parallel processing, splitting bulk data into subsets distributed across all available processors to achieve the best job performance.
- Strong in process creation for both Server and Parallel jobs in DataStage.
- Data migration, synchronization, consolidation and cleansing of operational systems, such as legacy, ERP and CRM applications, to enable strategic, tactical and operational business intelligence.
- Experience in evolving strategies and developing architecture for building functional OLAP systems from OLTP systems.
- Experience in designing SQL queries, stored procedures, triggers, scripts and cursors, and in creating, maintaining, modifying and optimizing SQL Server databases.
- Strong analytical, problem solving skills with excellent communication and interpersonal skills.
TECHNICAL SKILLS:
Knowledge Domains: Insurance, Banking, Human resource and Retail.
Hadoop Ecosystem: HDFS, Impala, Hive, Pig, Spark
Web Technologies: HTML, Java Script
Platform/Technologies: IBM Information Server 11.3/9.1/8.1, DataStage 7.5/7.1, UNIX, DB2, Oracle 10g/11g
Tools/Packages: Control-M, Actimize 2.0.9/4.0, XML
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Processing the delta from CDC subscriptions to the Operational Data Store (ODS) in HDFS using Impala.
- Developing ETL using Hive and Impala based on the source-to-Confidential mappings provided by the Data Governance team.
- Providing data extracts in Parquet format from the ODS to the Analytics team using Impala.
- Ensuring data security for sensitive EDW data moving in and out of HDFS by applying HPE encryption during the Sqoop phase.
- Developed a data migration procedure to import data from legacy systems (DB2, AS/400) into HDFS.
- Generating metadata code artifacts in HDFS for the source tables using Python.
- Processing CDC data into HDFS for DB2 and AS/400 source system tables using DataStage.
- Automating incremental data loads to HDFS using Java-based Spring Batch workflow management and the Control-M scheduler.
- Scheduling and maintaining batch job dependencies using Control-M.
- Responsible for the Operational Data Store (ODS) layer to the EDW in Hadoop.
- Supporting business-critical batch jobs in Hadoop to maintain SLAs.
- Developed an Enterprise Data Platform support model and transitioned it to the application support team.
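For illustration, a minimal sketch of the kind of Sqoop import used to land DB2 source tables in HDFS as described above. The connection URL, table, path and user are hypothetical placeholders, not actual project values.

```python
# Sketch: assemble a 'sqoop import' command for a DB2 source table.
# All names below (host, database, table, paths, user) are hypothetical.

def build_sqoop_import(jdbc_url, table, target_dir, username):
    """Assemble a 'sqoop import' invocation as an argument list."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # DB2 JDBC connection string
        "--username", username,
        "--table", table,             # source table to import
        "--target-dir", target_dir,   # HDFS landing directory
        "--as-parquetfile",           # Parquet output, as used for extracts
        "--num-mappers", "4",         # degree of parallel import
    ]

cmd = build_sqoop_import(
    "jdbc:db2://db2host:50000/ODSDB",  # hypothetical host/database
    "CUSTOMER",
    "/data/ods/customer",
    "etl_user",
)
print(" ".join(cmd))
```

Building the command as a list keeps it easy to hand to a scheduler wrapper or to `subprocess` without shell-quoting issues.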
Environment: Oracle 11g, UNIX 5.1, Control-M, Impala, Cloudera, YARN
Confidential, Chicago, IL
Sr. ETL Developer
Responsibilities:
- Understanding and gathering project requirements and performing detailed analysis.
- Developed a detailed development strategy for the entire application and developed various jobs, consuming data from different sources into landing tables and transforming it into base tables.
- Resolved issues with business units and developers, effectively coordinating development efforts with other areas/teams.
- Ran the ETL process through the Control-M scheduler.
- Effectively used DataStage Manager to import/export projects from the development server to the production server; parameterized jobs for changing environments.
- Proficient in data analysis, data modeling, database design, data migration and data acquisition using DataStage.
- Developed jobs using DataStage 8.x to load data from different sources, such as database tables and sequential files, into the Confidential database tables.
- Involved in testing Stored Procedures and Functions, unit and integration testing of DataStage jobs and batches, and fixing invalid mappings.
- Wrote shell scripts to schedule job sequences on the DataStage server.
- Involved in implementing DataStage components to set up the repository using DataStage Manager, utilizing the Manager tools to import the source and Confidential database schemas.
- Created and tested ETL processes composed of multiple DataStage jobs using the DataStage Job Sequencer.
- Developed shell scripts to automate processes. Developed shared containers and reused them in multiple jobs.
- Created database tables and partitioned them to optimize data retrieval.
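As a sketch of the scripted job scheduling above: shell scripts around DataStage typically invoke the `dsjob` command-line client to run a sequence and wait for its status. The project, job and parameter names here are hypothetical examples.

```python
# Sketch: compose a DataStage 'dsjob -run' invocation of the kind a
# scheduling shell script would issue. Project/job/parameter names are
# hypothetical, not actual project values.

def dsjob_run_command(project, job, params=None):
    """Build a dsjob command that runs a job and waits for its status."""
    cmd = ["dsjob", "-run", "-jobstatus"]      # -jobstatus waits for completion
    for name, value in (params or {}).items():
        cmd += ["-param", f"{name}={value}"]   # runtime job parameters
    cmd += [project, job]
    return cmd

cmd = dsjob_run_command("DWH_PROJ", "seq_Load_Base_Tables",
                        {"pLoadDate": "2015-06-30"})
print(" ".join(cmd))
```

Parameterizing the job this way matches the "parameterized jobs for changing environments" practice above: the same sequence runs in development and production with different values.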
Environment: Oracle 11g, UNIX 5.1, Control-M, DataStage 8.5/11.3
Confidential, New York, NY
Team Lead
Responsibilities:
- Developed jobs using DataStage 8.x to load data from different sources, such as database tables and sequential files, into the Confidential database tables.
- Understanding client requirements by studying the approach and functional documents, and preparing technical specifications.
- Developed ETL jobs to load data into the Confidential (HCM) tables.
- Used DataStage Designer to export and import jobs between the development, testing and production servers.
- Created job sequencers to automate the ETL process.
- Involved in performance tuning by creating indexes at the database level and modifying ETL jobs to run in parallel where required.
- Involved in unit testing and supported System Integration testing, Quality Assurance testing, User Acceptance testing and cut-over activities.
- Involved in testing Stored Procedures and Functions, unit and integration testing of DataStage jobs and batches, and fixing invalid mappings.
- Involved in migrating code from one environment to another.
- Replaced manual extraction with an automated process using DataStage.
- Extensively wrote user-defined SQL to override the generated SQL queries in DataStage.
- Worked with the development team, conducting constant reviews of milestones and intermediate screens, and providing feedback on functionality and usability in both DataStage ETL and batch jobs.
Environment: PeopleSoft EPM 8.9, Oracle 10g, UNIX, DataStage 8.1
Confidential, New York, NY
Sr. ETL Developer
Responsibilities:
- Understanding client requirements by studying the approach and functional documents, and preparing technical specifications.
- Used DataStage Designer to develop processes for extracting, cleansing, transforming, integrating and loading data into the data warehouse database.
- Extensively wrote user-defined SQL to override the auto-generated SQL queries in DataStage.
- Developed ETL jobs to load data into the Confidential (HCM) tables.
- Created job sequencers to automate the ETL process.
- Involved in performance tuning by creating indexes at the database level and modifying ETL jobs to run in parallel where required.
- Involved in unit testing and supported System Integration testing, Quality Assurance testing, User Acceptance testing and cut-over activities.
- Involved in migrating code from one environment to another.
Environment: DataStage 9.1, Oracle 11g, UNIX 5.1, Qualify
Confidential, New York, NY
ETL Developer
Responsibilities:
- Understanding the requirement specifications.
- Created all supporting documents, such as process flows, LLDs, HLDs and the STM document.
- Created Visio diagrams for the Control-M flow and the high-level design.
- Used DataStage Director to execute and monitor jobs.
- Reviewed jobs as per the LLD.
- Developed Oracle queries to load data from source to foundation tables.
- Extracted, cleansed, transformed, integrated and loaded data into the data warehouse.
- Designed and developed parallel ETL jobs in DataStage.
- Experience in UNIX shell scripting.
- Involved in logging error messages and fixing errors.
- Prepared test scenarios, test cases and test data.
- Coordinated with the onsite team on requirements.
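A minimal sketch of the kind of source-to-foundation load query developed above, generated from table and column names. All table, column and key names here are hypothetical placeholders.

```python
# Sketch: generate an Oracle source-to-foundation INSERT...SELECT.
# Table/column/key names are hypothetical, not actual project values.

def foundation_load_sql(source, target, columns, key="id"):
    """Build an insert that loads only rows not yet in the target."""
    cols = ", ".join(columns)
    return (
        f"INSERT INTO {target} ({cols}) "
        f"SELECT {cols} FROM {source} s "
        f"WHERE NOT EXISTS "
        f"(SELECT 1 FROM {target} t WHERE t.{key} = s.{key})"
    )

sql = foundation_load_sql("STG_CUSTOMER", "FND_CUSTOMER",
                          ["id", "name", "created_dt"])
print(sql)
```

The NOT EXISTS guard keeps the load rerunnable: repeating the job after a failure does not duplicate rows already in the foundation table.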
Environment: IBM InfoSphere DataStage Server 8.1, PeopleSoft EPM 9.0, PeopleTools 8.5, UNIX 5.1, Oracle 11g.