
Big Data Engineer Resume

SUMMARY

  • 13+ years across various techno-functional landscapes as a Big Data Engineer (AWS, Cloudera), Informatica SME, and Data Modeler, covering system integration, legacy modernization, analytics, information management, data warehousing, data lakes, business intelligence, ERP systems analysis, and production support in the Healthcare, Insurance, Aviation, and Finance verticals.
  • Extensive experience in designing and developing Extract, Transform, and Load (ETL) processes, and in production support and maintenance of data warehouse business applications.
  • Extensive experience in analyzing and designing ETL solutions.
  • Provided the architectural road map, direction, and work packets for ETL needs.
  • Created detailed ETL standards documents for design, development, release management, and production support.
  • Diversified experience in implementing solutions for data warehousing and big data projects.
  • Hands-on data analysis experience in user requirement gathering, data cleansing, data transformation, data profiling, source system analysis, and reporting analysis.
  • Strong ETL design skills in developing CDC (Change Data Capture) processes for historical, daily, and monthly loads (see the sketch after this summary).
  • Built the process to migrate data from SAP, PeopleSoft, TM1 (budgeting tool), and PLM to the Central Data Hub (CDH) in order to deliver standard and ad hoc reports.
  • Hands-on experience in AWS cloud services such as EC2, EMR, RDS, and Redshift.
  • Hands-on experience in MapReduce programming with the Hadoop Distributed File System (HDFS) and with processing large data stores.
  • Strong ETL experience using Alteryx, Informatica, SSIS, DataStage, and Pentaho Kettle.
  • Involved in designing and implementing different data warehouse dimensional models such as Star Schema and Snowflake Schema.
  • Drove workshops and discussions with clients and led issues to solutions.
  • Led teams of offshore and onshore resources for the successful completion of projects.
  • Exceptional background in analysis, design, development, testing, and implementation of data warehouse applications.
  • Develop, coordinate, and manage projects and project plans for development; develop proposals and delivery tasks/issues.
  • Mentor team members and provide guidance to other associates.
  • Strong hands-on experience in performance tuning of Informatica, SSIS, and DataStage.
  • Strong hands-on production support experience.
  • Experience with Hadoop ecosystem tools including HDFS, MapReduce, Hive, Impala, Pig, and Sqoop in designing and developing enterprise-level distributed applications.
  • SCD management including Type 1, 2, and 3, de-normalization, cleansing, conversion, aggregation, and performance optimization.
  • Experience in Migrating the code from Development to Test and to Production (deployments).
  • Experience in Preparation of Low level Designs (LLD), UTR, Run Book, Impact Analysis and Technical Design Document.
  • Proficient in creating UNIX Shell Scripts.
  • Experience in importing data using Sqoop from relational database systems to HDFS and vice versa.
  • Very good understanding of Teradata UPI and NUPI, secondary indexes and join indexes.
  • Extensively created and used various Teradata Set Tables, Multi-Set table, global tables, volatile tables, temp tables.
  • Designed and developed complex mappings using various transformations such as Source Qualifier, Aggregator, Expression, Connected and Unconnected Lookup, Filter, Joiner, Sequence Generator, Sorter, Router, Normalizer, and Update Strategy.
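
The CDC work above can be illustrated with a minimal PySpark sketch (table paths, the key, and the tracked columns are hypothetical placeholders, not the actual project code): it compares an incoming extract against the current dimension to split new rows from changed rows.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("cdc_sketch").getOrCreate()

    # Placeholder inputs for illustration only.
    current = spark.read.parquet("/warehouse/dim_customer_current")
    incoming = spark.read.parquet("/staging/customer_daily_extract")
    key, tracked = "customer_id", ["name", "address", "status"]

    cond = F.col("src." + key) == F.col("tgt." + key)
    joined = incoming.alias("src").join(current.alias("tgt"), cond, "left")

    # New rows: key not present in the current dimension.
    inserts = joined.filter(F.col("tgt." + key).isNull()).select("src.*")

    # Changed rows: key matches and at least one tracked column differs (null-safe compare).
    changed = F.lit(False)
    for c in tracked:
        changed = changed | ~F.col("src." + c).eqNullSafe(F.col("tgt." + c))
    updates = joined.filter(F.col("tgt." + key).isNotNull() & changed).select("src.*")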

TECHNICAL SKILLS

ETL & Reporting Tools: Informatica PowerCenter 10.x/9.x/8.x, PowerExchange, Axon, EDC, BDE, SSIS, Pentaho Kettle, Hadoop ecosystem (Hive, Impala, Sqoop, Pig, MapReduce, HDFS), DataStage, AWS, EC2, Cloudera

Tools: SQL Developer, TOAD, Microsoft Office, SQL Assistant for Teradata, Teradata utilities (BTEQ, FastLoad, TPT)

BI Tools: MicroStrategy, Tableau

Programming Languages: SQL, PL/SQL, T-SQL, Unix Shell Scripting

RDBMS: Oracle, SQL Server, Teradata, Netezza, PostgreSQL

Operating Systems: MS-DOS, Windows, Linux, EC2, AWS, Cloudera

Scheduling Tools: CA7, Autosys, Control Confidential, Informatica Scheduler

Data Modeling Tool: ERWIN

PROFESSIONAL EXPERIENCE

Confidential

Big Data Engineer

Responsibilities:

  • Work with Infrastructure Architects to lead design and review processes for new systems.
  • Develop, document, and propose technical designs for the integration and implementation of new software, working across the IT department.
  • Manage and implement data processes (data quality reports).
  • Design the architecture to pull data from on-premises sources to HDFS.
  • Build the SFTP process to pull data from approved Kaiser outside vendors.
  • Perform data integration in PowerExchange for SAP NetWeaver.
  • Develop data profiling, de-duplication, and matching logic for analysis (see the sketch after this list).
  • Use Python, PySpark, and Spark for data ingestion.
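
A minimal PySpark sketch of the profiling and de-duplication logic described above; the file path, column names, and natural key are hypothetical placeholders rather than the production code.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("profiling_dedup_sketch").getOrCreate()

    # Hypothetical vendor feed landed via SFTP.
    df = spark.read.option("header", True).csv("/landing/vendor_feed/members.csv")

    # Simple profile: total rows and null/blank counts per column.
    print("rows:", df.count())
    df.select(
        [F.count(F.when(F.col(c).isNull() | (F.col(c) == ""), c)).alias(c) for c in df.columns]
    ).show(truncate=False)

    # De-dupe: keep the most recent record per natural key.
    w = Window.partitionBy("member_id").orderBy(F.col("load_date").desc())
    deduped = (
        df.withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn")
    )
    deduped.write.mode("overwrite").parquet("/datalake/refined/members")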

Environment: Informatica 10.x, Informatica Axon, Informatica PowerExchange, Informatica Enterprise Data Catalog (EDC), IDQ, Big Data Manager (BDM), AWS, S3, EC2, EMR, RDS, Redshift, Hive, Impala, Spark, Sqoop, Tableau, Oracle, Teradata, Cloudera HDFS, SAS EG

Confidential

Senior Big Data Engineer

Responsibilities:

  • Responsible for building and maintaining data pipelines and data products to ingest and process large volumes of structured and unstructured data from various sources.
  • Analyze data needs, migrate data into the enterprise data lake, and build data products and reports.
  • Build real-time and batch ETL pipelines with a strong understanding of big data technologies and distributed processing frameworks (see the sketch after this list).
  • Strong understanding of the big data cluster and its architecture.
  • Experience building and optimizing big data ETL pipelines.
  • Advanced programming skills with Python and Scala.
  • Develop and implement Informatica Data Quality (IDQ) and Enterprise Data Catalog (EDC).
  • Develop IDQ reports including but not limited to:
  • Advanced transformation in mappings
  • Data standardization and cleansing
  • Data enrichment to resolve completeness, conformity, consistency, and accuracy issues
  • Good knowledge of Spark internals and performance tuning of Spark jobs.
  • Strong SQL skills; comfortable working with relational data models and structures.
  • Capable of accessing data via a variety of API/RESTful services.
  • Experience with messaging systems like Kafka.
  • Expertise with Continuous Integration/Continuous Delivery workflows and supporting applications.
  • Exposure to cloud environments and architectures (preferably Azure).
  • Ability to work collaboratively with other teams. Experience with containerization using tools such as Docker.
  • Strong knowledge of Linux and Bash.
  • Design and develop ETL workflows to migrate data from varied data sources including SQL Server, Netezza, Kafka etc. in batch and real-time.
  • Develop checks and balances to ensure integrity of the ingested data.
  • Design and develop Spark jobs for data processing needs as per requirements.
  • Work with analysts and data scientists to assist them in building scalable data products.
  • Design systems, alerts, and dashboards to monitor data products in production.
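
A minimal PySpark Structured Streaming sketch of the Kafka-based real-time ingestion described above; the broker address, topic, event schema, and output paths are assumed placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession, functions as F, types as T

    spark = SparkSession.builder.appName("kafka_ingest_sketch").getOrCreate()

    # Hypothetical event schema.
    schema = T.StructType([
        T.StructField("event_id", T.StringType()),
        T.StructField("event_ts", T.TimestampType()),
        T.StructField("payload", T.StringType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
        .option("subscribe", "claims-events")                # placeholder topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Parse the Kafka message value from JSON into columns.
    parsed = raw.select(
        F.from_json(F.col("value").cast("string"), schema).alias("e")
    ).select("e.*")

    # Append parsed events to the data lake with checkpointing for recovery.
    query = (
        parsed.writeStream.format("parquet")
        .option("path", "/datalake/raw/claims_events")
        .option("checkpointLocation", "/datalake/checkpoints/claims_events")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()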

Environment: Informatica 10.x, Informatica Axon, Informatica Enterprise Data Catalog (EDC), IDQ, Big Data Manager (BDM), AWS, S3, EC2, EMR, RDS, Redshift, Hive, Impala, Spark, Sqoop, Tableau, Oracle, Teradata, Cloudera HDFS

Confidential

Data Integration SME | Senior ETL Developer

Responsibilities:

  • Built the CDC process and CDC mappings using Informatica PowerCenter.
  • Built the dimensional model for the administrative data.
  • Developed the business glossary using the Informatica Enterprise Data Catalog (EDC) and Axon tools.
  • Imported data using Sqoop and HDFS commands from relational database systems to HDFS and vice versa.
  • Imported data into Hive tables using Big Data Manager (BDM) and built the BDM ETL process from the on-premises Oracle database to an S3 bucket and from S3 to Redshift (see the sketch after this list).
  • Built the data glossary and system inventory using Informatica Axon.
  • Built the data dictionary using the Enterprise Data Catalog.
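
The Oracle-to-S3 hop above was built in Informatica BDM; the PySpark equivalent sketched below is for illustration only (connection details, table, and bucket names are placeholders, and the Oracle JDBC driver plus S3A configuration are assumed). Redshift would then ingest the landed files with a COPY command.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("oracle_to_s3_sketch").getOrCreate()

    # Read the on-premises Oracle table over JDBC (placeholder connection details).
    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")
        .option("dbtable", "ADMIN.MEMBER_ENROLLMENT")
        .option("user", "etl_user")
        .option("password", "********")
        .option("fetchsize", "10000")
        .load()
    )

    # Land the extract as Parquet in S3; Redshift can then load it via COPY.
    df.write.mode("overwrite").parquet("s3a://cdh-landing-bucket/member_enrollment/")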

Environment: Informatica 10.x, Informatica Axon, Informatica Enterprise Data Catalog (EDC), IDQ, Big Data Manager (BDM), AWS, S3, EC2, EMR, RDS, Redshift, Hive, Impala, Spark, Sqoop, Tableau, Oracle, Teradata, Cloudera HDFS

Confidential

Lead ETL Architect / Data Modeler

Responsibilities:

  • Developed and supported ETL code for the data warehouse and data marts to support reporting and data analytics systems.
  • Developed test plans and performed unit and integration testing.
  • Built the process to migrate data from SAP, PeopleSoft, TM1 (budgeting tool), and PLM to the Central Data Hub (CDH) in order to deliver standard and ad hoc reports.
  • Worked with the data architecture team to design and develop ETL solutions, perform unit testing and code reviews, enhance standards, and tune performance.
  • Converted DataStage jobs into Informatica.
  • Responsible for Production support including immediate responsiveness during system down or system component down situations to analyze root cause and restore system functionality.
  • Responsible for Low level Designs (LLD), UTR, Run Book, Impact Analysis and Technical Design Document.
  • Responsible for navigation and file manipulation in a Unix/Linux environment and for building Unix scripts.
  • Produced XML for lower environments.
  • Developed and documented internal DataStage ETL standards and best practices.
  • Performed complex analysis and design across multiple database platforms and technologies.
  • Developed conceptual and physical models on CDH for budget and actuals.
  • Drove workshops and discussions with the clients.
  • Developed the ETL sourcing strategy and the data validation and reconciliation process (see the sketch after this list).
  • Developed process flows, Technical Design Document, Deployment Cutover Checklist.
  • Worked closely with the TM1, PLM, and PeopleSoft teams to deliver standard and ad hoc reports.
  • Conducted code review sessions with team to ensure code is as per the standards defined.
  • Provided production support.
  • Established the ETL framework, enterprise naming standards, and best practices.
  • Provided solutions for ETL designs.
  • Coordinated with the delivery team to ensure high-quality and timely delivery.
  • System performance optimization.
  • Involved in the Development and maintenance of DWH ETL Jobs.
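
A simplified PySpark sketch of the data validation and reconciliation step referenced above, comparing row counts and amount totals between a source table and its CDH copy; connection strings, table, and column names are placeholders, and the Oracle and Teradata JDBC drivers are assumed to be available.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("recon_sketch").getOrCreate()

    # Placeholder source (Oracle) and target (Teradata/CDH) reads.
    src = (spark.read.format("jdbc")
           .option("url", "jdbc:oracle:thin:@//src-host:1521/SRC")
           .option("dbtable", "FIN.GL_ACTUALS")
           .option("user", "etl_user").option("password", "********")
           .load())
    tgt = (spark.read.format("jdbc")
           .option("url", "jdbc:teradata://tgt-host/DATABASE=CDH")
           .option("dbtable", "CDH.GL_ACTUALS")
           .option("user", "etl_user").option("password", "********")
           .load())

    # Reconcile record counts and amount totals between source and CDH.
    s = src.agg(F.count("*").alias("cnt"), F.sum("amount").alias("amt")).collect()[0]
    t = tgt.agg(F.count("*").alias("cnt"), F.sum("amount").alias("amt")).collect()[0]

    if s["cnt"] != t["cnt"] or s["amt"] != t["amt"]:
        print("Reconciliation FAILED - source:", s, "target:", t)
    else:
        print("Reconciliation passed:", s)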

Environment: Teradata, Oracle, Informatica PowerCenter, PowerExchange, DataStage, MicroStrategy, PeopleSoft, TM1, PLM, Tableau, Unix, SharePoint, JIRA, XML, Web services, Enterprise Management System

Confidential

Lead data architecture development

Responsibilities:

  • Coordinated with different downstream business users to understand their data requirements and designed models.
  • Responsible for ETL Production support including immediate responsiveness during system down or system component down situations to analyze root cause and restore system functionality.
  • Established the ETL framework, enterprise naming standards, and best practices.
  • Coordinated with the delivery team to ensure high-quality and timely delivery.
  • Supported production and the support team and worked on defects.
  • Conducted code review sessions with team to ensure code is as per the standards defined.
  • Involved in unit testing, standalone testing, integration testing, system testing, and end-to-end testing.
  • Converted DataStage jobs to SSIS.
  • Created Audit balancing tables to make sure source records are loaded without any data loss.
  • Designing scripts, Flowcharts and data flow diagrams for Documentation of existing systems.
  • Involved in production monitoring.
  • Created ETL mappings and graphs based on the Technical Specification Document.
  • Created technical specification document based on FSD.
  • Built ETL jobs for de-normalized tables.
  • System performance optimization.
  • Involved in the Development and maintenance of DWH ETL Jobs.
  • Involved in writing the use cases.
  • Created FastExport/BTEQ scripts for report generation for downstream applications.
  • Used FastLoad and MultiLoad utilities to load staging area tables.
  • Automated the staging process by generating dynamic BTEQ scripts to load from stage to the semantic layer based on the columns declared in the audit tables (see the sketch after this list).
  • Involved in production deployment planning and implementation.
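
A simplified Python sketch of the dynamic BTEQ generation described above; in practice the column list is read from the audit tables, and the logon, table, and file names here are placeholders.

    # Generate a BTEQ script that loads a stage table into the semantic layer.
    stage_table = "STG.CLAIMS"             # placeholder stage table
    semantic_table = "SEM.CLAIMS"          # placeholder semantic-layer table
    columns = ["claim_id", "member_id", "claim_amt", "load_dt"]  # normally read from the audit table

    col_list = ",\n    ".join(columns)
    bteq_script = f""".LOGON tdprod/etl_user,********;

    INSERT INTO {semantic_table}
    (
        {col_list}
    )
    SELECT
        {col_list}
    FROM {stage_table};

    .IF ERRORCODE <> 0 THEN .QUIT 8;
    .LOGOFF;
    .QUIT 0;
    """

    # Write the generated script so the scheduler can invoke BTEQ against it.
    with open("load_claims.bteq", "w") as fh:
        fh.write(bteq_script)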

Environment: Teradata, Oracle, Hadoop, HDFS, YARN, Informatica PowerCenter, Hive, Pig, Oozie, Spark, Unix Scripting, CONTROL Confidential, Query It, SOAP, Business Objects, Tableau, IDQ, MDM, Unix, SharePoint, ALM, JIRA, XML, Web services
