
Senior ETL Developer Resume


SUMMARY

  • 10+ years of experience in Information Technology, developing large-scale Data Warehouse, Data Lake, Big Data and Analytics solutions.
  • Specialized in Big Data architecture and large-scale distributed data processing for Big Data analytics using the Hadoop ecosystem.
  • Experienced in all aspects of software development life cycle including analysis, design, development, modelling, testing, debugging, deployment, operational support of significant Data Integration, Analytics, and Machine Learning projects (Big Data, DW, BI) in Waterfall as well as SCRUM/Agile methodologies.
  • Experienced in developing frameworks that standardize batch data ingestion, data processing and delivery of massive datasets.
  • Good exposure to and understanding of the Finance, Retail & Insurance domains.
  • Expertise in the Hadoop ecosystem: HDFS, Pig, HBase and Hive for analysis, TDCH for data migration, Sqoop for data ingestion, and Oozie for scheduling.
  • Good experience working with the Hortonworks Hadoop distribution.
  • Involved in designing Hive schemas, using performance-tuning techniques such as partitioning and bucketing.
  • Optimized HiveQL/Pig scripts by using execution engines such as Tez.
  • Experience in writing ad-hoc queries, moving data from HDFS to Hive and analyzing the data using HiveQL.
  • Experience in building Pig scripts to extract, transform and load different file formats (TXT, XML, JSON) onto HDFS, HBase and Hive for data processing.
  • Experience in ingesting data to different tenants in a Data Lake and creating snapshot tables for consumption. Experience working with HBase tables, with a good understanding of filters.
  • Built Sqoop jobs with incremental loads to populate Hive external tables (a minimal sketch follows this summary).
  • Very good understanding of partitioning and bucketing concepts in Hive. Designed and managed external tables in Hive to optimize performance.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems.
  • Used HBase alongside Pig/Hive as and when required for real-time, low-latency queries.
  • Excellent knowledge & experience in the Data Warehouse development life cycle, dimensional modelling, repository & administration, and implementation of Star and Snowflake schemas.
  • Extensive experience in Extraction, Transformation, Loading (ETL) data from various sources into Data Warehouse and Data Mart using DataStage 7.5/8.5.
  • Experience in integration of various data sources like Oracle, DB2 UDB, Teradata, XML & Flat files into the Staging Area.
  • Handled huge volumes of data using Teradata utilities such as BTEQ, FastLoad, MultiLoad & TPump.
  • Reviewed SQL to ensure efficient execution plans and adherence to CPU performance thresholds. Worked closely with MicroStrategy architects to adjust SQL as necessary.
  • Experience in project management: planning, task identification, tracking, resource management & reporting.
  • Created & deployed work orders and resolved problem-request tickets in HPSM/Remedy and ServiceNow to track incidents. Worked with version control and build tools such as MS TFS, GitHub and Jenkins.
  • Mentors and assists junior architects, business analysts and developers. Technical architect responsible for defining the design patterns used to load data into the DWH (ETL & ELT).
  • Led a team of 5 to 10 resources & developed and guided test strategies and plans for Unit, Integration, System and UAT testing.
  • Expert in designing ETL architecture and star-schema and snowflake dimensional models.
  • Extensively worked on and implemented parallelism and partitioning techniques in DWH applications.
  • Partners with business-line managers and Centre of Excellence leaders, providing technical and system expertise to align IT strategic direction with product concepts.
  • Strong design skills in logical and physical data modelling using Erwin, and in metadata management for OLTP (relational) and OLAP (dimensional) systems. Proficient in data normalization and de-normalization techniques.
  • Very good experience in customer specification study, requirements gathering, system architectural design and turning the requirements into final product/service.
  • Familiar with building end-to-end solutions in onshore-offshore and multi-site development models, and with system integration, migration, maintenance and support projects.
  • Proven organizational, time-management and multi-tasking skills, with the ability to work independently, quickly learn new technologies and adapt to new environments.
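
The Sqoop incremental-load pattern referenced in the summary can be sketched as a shell script. This is a minimal illustration only: the connection string, table, columns and HDFS path are hypothetical placeholders, not values from any actual engagement.

    #!/bin/bash
    # Sketch: incremental Sqoop import into an HDFS landing zone read by a Hive external table.
    # All host names, credentials, schemas and paths are placeholders.
    set -euo pipefail

    LANDING_DIR=/data/landing/sales/orders   # hypothetical landing path

    # First run only: register a saved job so that --incremental append can
    # track the last imported ORDER_ID between executions.
    sqoop job --create orders_incr -- import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user --password-file /user/etl/.pwd \
      --table ORDERS \
      --target-dir "${LANDING_DIR}" \
      --incremental append \
      --check-column ORDER_ID \
      --last-value 0 \
      --as-textfile \
      --num-mappers 4

    # Every run: execute the saved job; Sqoop updates the stored last-value automatically.
    sqoop job --exec orders_incr

    # One-time DDL for the Hive external table over the landing directory (shown for completeness).
    hive -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS landing.orders (
        order_id BIGINT, customer_id BIGINT, order_dt STRING, amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '${LANDING_DIR}';"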

Core Strengths

  • Extensive experience in handling Big Data Hadoop projects; worked extensively in Hive, HBase, Pig, Kafka, Sqoop and Oozie.
  • Strong experience in projects covering all activities required for end-to-end ETL/ELT/semantic implementations.
  • Strong experience in migration/enhancement projects involving Big Data, DataStage, Teradata, Oracle and DB2.
  • Exposure to Hadoop, ETL and Teradata performance tuning. Extensively worked on Hadoop and ETL/ELT migration applications using various DataStage tools and Hadoop.
  • Good understanding of Agile practices, with experience working in an Agile model.

TECHNICAL SKILLS

Hadoop Ecosystem: Hive, HBase, Pig, Kafka, MapReduce, Sqoop, Oozie, Python

ETL Tools: IBM WebSphere DataStage and Quality Stage 8.7

DataStage: 7.x, 8.x, 11.3 (Manager, Administrator, Designer, Director)

Dimensional Data Modelling: Data Modelling, Star Schema Modelling, Snow-Flake Modelling, FACT and Dimensions tables, physical and logical data modelling, Erwin 4.1.2/3.x

Operating Systems: Linux, Unix (Solaris, AIX), Windows 95/98/NT/2000/XP, Windows 7

Databases: Oracle 10g/9.x/8.x, SQL Server, DB2, Teradata, PostgreSQL

Tools: MS Access, MS PowerPoint, Visio, SharePoint, Putty, Telnet, WinSCP, Oracle, AQT, Teradata SQL Assistant

Scheduling: Control M, Tivoli, Autosys

Agile Tools: NinJira, Version One

Programming Languages: SQL, UNIX/Linux Shell Scripting

Version Control Tools: MS TFS, GitHub

Bundle/Package Creation & Deployment: MS TFS (build tool) to create and upload packages or bundles; Deploy (deployment tool) to deploy the packages or bundles; Jenkins; Data Stream

PROFESSIONAL EXPERIENCE

Confidential

Senior Engineer

Environment: Hive, Shell Scripting, Autosys, Service Now, Oracle, Sqoop

Responsibilities:

  • Experienced with requirements gathering, project planning, architectural solution design and the development process in an Agile environment.
  • Prepared a Hadoop framework to frequently bring in data from the source and make it available for consumption.
  • Worked on the Sqoop, Hive for Hadoop data landscape. Created Hive scripts to build the foundation tables by joining multiple tables.
  • Experience in ingesting data to different tenants in Data Lake.
  • Responsible for building scalable distributed data solutions using Hadoop Eco System.
  • Facilitated meetings with business and technical teams to understand requirements, translate them into technical solutions and implement them on the Hadoop platform.
  • Reviewed requirements, developed technical design documentation, validated test scripts and coordinated their implementation into production and support activities.
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure, including Hive, Sqoop and HDFS.
  • Created partitioned tables using the ORC file format in Hive for better performance and faster querying, and handled huge volumes of data (see the sketch after this list).
  • Created Sqoop jobs with incremental load to populate Hive tables.
  • Experience in optimizing Hive queries to improve their performance. Performed extensive data analysis using Hive.
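
A minimal sketch of the partitioned ORC pattern mentioned in the list above, wrapped in a shell step the way an ingest framework would invoke it. The landing/foundation database, table and column names are illustrative assumptions, not the project's actual schema.

    #!/bin/bash
    # Sketch: populate a partitioned, ORC-backed foundation table from a landing table.
    # Database, table and column names are placeholders.
    set -euo pipefail

    hive -e "
      -- use Tez where available, and allow dynamic partitioning on load_dt
      SET hive.execution.engine=tez;
      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;

      CREATE TABLE IF NOT EXISTS foundation.orders (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE)
      PARTITIONED BY (load_dt STRING)
      STORED AS ORC;

      -- dynamic partitioning derives load_dt from the landing data on each run
      INSERT OVERWRITE TABLE foundation.orders PARTITION (load_dt)
      SELECT order_id, customer_id, amount, order_dt AS load_dt
      FROM   landing.orders;"

Partitioning on the load date lets queries that filter on load_dt prune partitions instead of scanning the full table, which is the usual reason for this layout.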

Confidential

Sr BI Engineer

Environment: Hive, Oozie Co-Ordinator, HBase, Pig, Shell Scripting, Sqoop, Control-M, Service Now, Git Hub, Jenkins, Query Surge, Data Stream, NinJira, Teradata, Python, Kafka

Responsibilities:

  • Worked on Sqoop, Hive and the HDFS file system for Hadoop archival from Teradata. Experience using Sequence and ORC file formats.
  • Created Hive scripts to build the foundation tables by joining multiple tables.
  • Handled the source change from IMN and PRISM to CORONA (a Kafka feed), which is ingested into a Hive table in JSON format and parsed into Hive tables using Hive functions (see the sketch after this list).
  • Used the TDCH utility for data ingestion from Hive to Teradata.
  • Created partitioned tables using the ORC file format and handled huge volumes of data from landing to foundation Hive tables.
  • Developed Oozie workflow & Oozie Co-ordinators for scheduling and orchestrating the ETL Process. Involved in scheduling Oozie workflow engine to run multiple Hive jobs.
  • Created jobs to transfer data across different systems using Data Stream for the initial source data analysis.
  • Designed and developed Sqoop full and incremental load scripts to bring in data from heterogeneous source systems.
  • Created Sqoop jobs with incremental load to populate Hive tables.
  • Designed & developed wrapper scripts and Oozie XML workflows. Created Oozie coordinators for Hadoop code execution.
  • Validated the data ingested into Hive tables. Created partitioned Hive internal tables.
  • Prepared an ETL framework using Sqoop, Pig and Hive to frequently bring in data from the source and make it available for consumption.
  • Deployed executable code to multiple servers using a CI/CD process built on code-versioning tools GitHub and Jenkins.
  • Analyzed & understood the ETL/ELT Merchandising ITEM application data and the end-to-end process of the existing ETL/ELT code.
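
A minimal sketch of the JSON-parsing step referenced above. It assumes the Kafka feed lands as a single JSON string column in a raw Hive table; the raw/foundation table names and JSON field names are placeholders, not the actual CORONA message layout.

    #!/bin/bash
    # Sketch: flatten a raw JSON-string Hive table into typed columns using get_json_object().
    # raw.corona_events(json_msg STRING) and every field name below are placeholders.
    set -euo pipefail

    # The quoted heredoc keeps bash from expanding the $ in the JSON paths.
    hive -e "$(cat <<'HQL'
    INSERT OVERWRITE TABLE foundation.corona_events
    SELECT get_json_object(json_msg, '$.item_id')                   AS item_id,
           get_json_object(json_msg, '$.event_type')                AS event_type,
           CAST(get_json_object(json_msg, '$.event_ts') AS BIGINT)  AS event_ts
    FROM   raw.corona_events;
    HQL
    )"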

Confidential

Sr BI Engineer

Environment: Hive, HBase, Oozie Co-Ordinator, Shell Scripting, Sqoop, Query Surge, Service Now, Git Hub, Jenkins, NinJira, Teradata, Python

Responsibilities:

  • Analyzed & understood the new VMM services for Business Partner and Brand data, and the end-to-end process of the existing ETL code.
  • Extracted data using the Sqoop tool and ingested it into the Hive database for initial data loads.
  • Worked on Sqoop, Hive and the HDFS file system for Hadoop archival from Teradata. Experience using Sequence, RCFile and ORC file formats.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external and managed tables.
  • Designed & developed wrapper scripts, Oozie workflows and Oozie coordinators (see the sketch after this list).
  • Wrote UDFs for specific requirements. Managed and reviewed Hadoop log files; performed file-system management and monitoring.
  • Implementation & Post Implementation Support activities in different Hadoop environments.
  • Designed the Oozie coordinator flow for Business Partner and Brand flow execution.
  • Deployed executable code to multiple servers using a CI/CD process built on code-versioning tools GitHub and Jenkins.
  • Created partitioned Hive internal tables, and used the TDCH utility for data ingestion from Hive to Teradata.
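
A minimal sketch of submitting the Oozie coordinator mentioned above from the shell. The Oozie server URL, HDFS application path, cluster endpoints and schedule window are placeholders; only the property names follow standard Oozie conventions.

    #!/bin/bash
    # Sketch: submit the coordinator that drives the Business Partner / Brand flow.
    # The Oozie URL, HDFS paths and dates below are placeholders.
    set -euo pipefail

    # Job properties; ${nameNode} is resolved by Oozie, not by the shell (quoted heredoc).
    cat > brand_coord.properties <<'EOF'
    nameNode=hdfs://nn-host:8020
    jobTracker=rm-host:8032
    oozie.use.system.libpath=true
    oozie.coord.application.path=${nameNode}/apps/brand/coordinator
    startTime=2019-01-01T01:00Z
    endTime=2020-01-01T01:00Z
    EOF

    # Submit and start the coordinator; its coordinator.xml in HDFS triggers the workflow on schedule.
    oozie job -oozie http://oozie-host:11000/oozie \
              -config brand_coord.properties -run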

Confidential

Associate Technical Architect

Environment: IBM InfoSphere DataStage (version 8.5), Teradata, UNIX/Linux, Flat Files, Control-M, QC, HPSD tool, Query Surge, Erwin, Jira

Responsibilities:

  • Understanding the existing architecture and flow dependency of Merchant Analytics Application.
  • Working on fixing the production issues and defects.
  • Exploring tuning opportunities in ETL & DB for better response time.
  • Developing high level design documents and templates to follow a specific design approach.
  • Involved in development activities to design and develop DataStage jobs.
  • Led a 6-member offshore team through development, execution and implementation activities.
  • Involved in fixes for bugs identified during production runs within the existing functional requirements.

Confidential

Associate Technical Architect

Environment: IBM InfoSphere DataStage (version 8.5), Teradata, UNIX/Linux, Flat Files, Control-M, QC, HPSD tool, Query Surge, Erwin

Responsibilities:

  • Understanding the existing architecture and flow dependency, exploring opportunities to avoid duplicate dependency.
  • Exploring tuning opportunities in ETL & database for better response time and Performance.
  • Performed advanced analysis to profile and assess source system data.
  • Developing high level design documents and templates to follow a specific design approach.
  • Involved in development activities to design and develop DataStage jobs and sequencers.
  • Used DataStage Designer to develop various jobs to extract, cleanse, transform, integrate and load data into the warehouse.
  • Analyzed job failures and found the root cause of failures & long-running jobs.
  • Created UNIX scripts to run the DataStage jobs and FTP the files (see the sketch after this list).
  • Mentoring a 5-member offshore team.
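
A minimal sketch of the kind of UNIX wrapper referenced above, using the standard DataStage dsjob command line. The install path, project, job, FTP host and file names are placeholders, and the dsjob exit-code mapping should be verified against the engine version in use.

    #!/bin/bash
    # Sketch: run a DataStage job from the shell and FTP the resulting extract.
    # Project, job, paths, host and credentials are placeholders.
    set -uo pipefail        # no -e: dsjob reports success through non-zero exit codes

    . /opt/IBM/InformationServer/Server/DSEngine/dsenv   # assumed default engine install path

    PROJECT=DW_PROJECT
    JOB=SEQ_LOAD_SALES

    # -jobstatus makes dsjob wait for completion and return the job status as its exit code.
    dsjob -run -jobstatus "${PROJECT}" "${JOB}"
    status=$?

    # Commonly 1 = finished OK and 2 = finished with warnings; confirm locally before relying on it.
    if [ "${status}" -ne 1 ] && [ "${status}" -ne 2 ]; then
        echo "DataStage job ${JOB} ended with status ${status}" >&2
        exit 1
    fi

    # Push the extract produced by the job downstream (credentials shown only as placeholders).
    ftp -n downstream-host <<'EOF'
    user etl_user etl_password
    put /data/extracts/sales_extract.dat sales_extract.dat
    quit
    EOF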

Confidential

Associate Technical Architect

Environment: IBM InfoSphere DataStage (version 8.5), Teradata, UNIX/Linux, Flat Files, Control-M, QC, HPSD tool Erwin

Responsibilities:

  • Analysis of existing data model to understand its shortcomings and to provide input during the data modelling phase.
  • Provided solutions for customer service requests and ad-hoc cases requiring application-related attention.
  • Analysis of data models for cross-verification, data profiling for cross-domain integration.
  • Creation of high-level design definitions and reviews. Co-ordination with multiple stakeholders for effective SIT/UAT.
  • Capacity planning for development and production. Mentoring a 5-member offshore team.
  • Extensively used appropriate partitioning techniques such as Hash, Same, Modulus and Round Robin to improve job performance.
  • Involved in fixes for bugs identified during production runs within the existing functional requirements.

Confidential

Associate Technical Architect

Environment: IBM InfoSphere DataStage (version 8.7), DB2, Teradata, UNIX/Linux, Flat Files, Control-M, QC, Erwin

Responsibilities:

  • Involved in analysing business requirements and design documents.
  • Performed Column Analysis, Primary Key Analysis, Foreign Key Analysis and Cross Domain Analysis using IBM Information Analyzer.
  • Designed and developed jobs using DataStage/Quality Stage 8.7 for loading the data into Dimension and Fact Tables.
  • Provided solutions for customer service requests and ad-hoc cases requiring application-related attention.
  • Developed DataStage jobs using Change Capture Stage to compare the data changes between source tables and DataMart tables.
  • Implemented slowly changing dimensions using SCD stage.
  • Extensively involved in developing data-cleansing procedures using Quality Stage to standardize names, addresses and areas.
  • Creation of Detail level design definition and reviews.
  • Co-ordination with multiple stakeholders for effective SIT/UAT.
  • Capacity planning for development and production.
  • Mentoring a 4-member offshore team.

Confidential

Senior ETL Developer

Environment: IBM InfoSphere DataStage (version 8.5), DB2, Teradata, UNIX/Linux, Flat Files, Control-M, QC, Erwin Data Model

Responsibilities:

  • Involved in bug fixes & support during production & post-production implementation.
  • Performed advanced analysis to profile and assess source system data.
  • Involved in designing and developing jobs & sequencers.
  • Performed code unit testing by simulating various business cases.
  • Developing high level design documents and templates to follow a specific design approach.
  • Used Best Partition techniques to handle & load the huge volume of data.
  • Created DataStage code for various jobs to extract, cleanse, transform, integrate & load data into the Data Warehouse.
  • Worked with business users to analyze & create requirement documents, and converted the requirements into design documents.
  • Prepared a solution and error-logging database that helps with Problem and Incident Management.

Confidential

Senior ETL Developer

Environment: IBM InfoSphere DataStage 7.5, IBM Information Analyzer, DB2, UNIX/Linux, Flat Files, Erwin, Shell Scripting, QC

Responsibilities:

  • Performed Column Analysis, Primary Key Analysis, Foreign Key Analysis and Cross Domain Analysis using IBM Information Analyzer.
  • Designed and developed jobs using DataStage for loading the data into Dimension and Fact Tables.
  • Provided solutions for customer service requests and ad-hoc cases requiring application-related attention.
  • Developed DataStage jobs using Change Capture Stage to compare the data changes between source tables and DataMart tables.
  • Used Best Partition techniques to handle & load the huge volume of data.
  • Responsible for preparing comprehensive test plans and thorough testing of the system keeping the business users involved in the UAT.
  • Involved in Unit Testing and Integration Testing. Involved in Tuning and Performance Testing.
  • Involved in fixes for bugs identified during production runs within the existing functional requirements.

Confidential

ETL Developer

Environment: IBM InfoSphere DataStage (Version 7.5), Oracle 10G, IBM DB2, UNIX/Linux, Control M, QC, Erwin Data Model

Responsibilities:

  • Involved in understanding the scope of the application, the existing schema and data model, and defining relationships within and between groups of data.
  • Used DataStage 7.5 extensively to transfer and load data into the staging area and eventually into the warehouse.
  • Used Quality Stage extensively for data profiling.
  • Used Best Partition techniques to handle & load the huge volume of data.
  • Designed mappings from sources to operational staging targets using a star schema, and implemented Slowly Changing Dimension logic.
  • Created a schema-file pattern to read multiple files with one ETL job.
  • Identified the Control-M details and verified that the stage schedule was in sync with the production Control-M.
  • Turned over artifacts and transitioned knowledge to the EAM/support teams.

Confidential

ETL Developer

Environment: IBM WebSphere DataStage 7.5.x, Oracle 9i, DB2 UDB, Control M, QC, Erwin

Responsibilities:

  • Developed transformations in the process of building DWH database.
  • Used parallel jobs for Extraction, Transformation and Load.
  • Analyzed and designed source code documentation for investment DWH.
  • Used DataStage Designer to develop various jobs to extract, cleanse, transform, integrate & load data into the Data Warehouse.
  • Generated unique keys for composite attributes while loading data into the Data Warehouse.
  • Used Best Partition techniques to handle & load the huge volume of data.
  • Extensively worked with DataStage job sequences to control & execute DataStage jobs and job sequences using various activities and triggers.
  • Used DataStage Director and the runtime engine to schedule, run, monitor and validate jobs and their components.
  • Used hashed files to extract & write data and to act as intermediate files within a job. Hashed files were also used as reference tables based on a single key field.
