
Sr. Data Engineer Resume


West Chester, PA

PROFESSIONAL SUMMARY:

  • Over 9 years of experience in the IT industry as a Data Engineer, with technical skills in Hadoop (Hive, Pig), Spark 2.x, PySpark, AWS (DynamoDB, S3, Glue, Athena), Databricks (Spark engine), Apache Kafka, Informatica PowerCenter 9.6/10.1, Informatica BDE 9.6/10.1, Informatica Analyst, Informatica Developer, Teradata 12/13/14/15, SQL, UNIX, and other client tools, plus strong knowledge of data warehousing.
  • Worked on writing Spark code to process large data sets, perform transformations, and load data to persistence areas in Amazon S3 buckets and on-premises clusters.
  • Created data pipelines for Kafka clusters and processed the data using Spark Streaming.
  • Created Glue jobs in AWS to load incremental data to the S3 staging and persistence areas.
  • Worked on streaming jobs that consume data from Kafka topics and load it to the landing area for near real-time reporting.
  • Loaded job-control and metadata records for the S3 staging and persistence jobs to DynamoDB.
  • Worked on writing Go (Golang) code to pull data from Kinesis and load it to Prometheus, which Grafana uses for reporting and visualization.
  • Created notebooks in Databricks to pull data from S3, apply transformation rules, and load the data back to the persistence area in S3 in Apache Parquet format (a minimal sketch of this pattern follows this list).
  • Experience in writing Teradata BTEQ, FastLoad, and MultiLoad scripts, and automated UNIX scripts to deploy Teradata queries in production.
  • Worked extensively on Informatica, Teradata, the Hadoop framework, and UNIX scripts.
  • Worked extensively on implementing Hive and Pig scripts: loading data from OLAP/OLTP systems to HDFS, creating tables in Hive, and writing Pig scripts that apply CDC logic to process data in the Hadoop environment.
  • Worked on implementing SCD types in BTEQ scripts, with exception handling for records that do not meet the business rules.
  • Extensively worked on developing ETL programs supporting data extraction, transformation, and loading using Informatica PowerCenter.
  • Understand business rules thoroughly from high-level design specifications and implement the corresponding data transformation methodologies.
  • Strong ability to execute multiple Data Quality and Data Governance projects simultaneously and deliver within timelines.
  • Extensively worked on Informatica performance tuning, addressing source-level, target-level, and mapping-level bottlenecks.
  • Worked as production support lead, supporting the team on any failures of jobs that must meet SLAs.
  • Monitored all production jobs in Hadoop with related tools such as Resource Manager and the Ambari admin interface.
  • Experience in creating mappings, workflows, applications, mapplets, and rules; using rules in profiling; and executing mappings and workflows through infacmd/pmcmd commands.
  • Worked on the master build team to create SQL scripts and Kintana packages for deployment in the weekly, monthly, and quarterly releases.
  • Experience in data migration projects; worked extensively on designing the MDF (Metadata Driven Framework) tool.
  • Worked on designing complex scheduling jobs using $U (Dollar Universe), UC4, and cron jobs.
  • Experience in creating packages using Kintana and in source control tools such as PVCS and GitHub.
  • Experience in implementing Excel Services and dashboards for reports.
  • Flourish in both independent and collaborative work environments, with quick learning abilities and excellent communication and presentation skills.
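
The Databricks and S3 bullets above all follow a read-transform-write pattern. The sketch below is a minimal, hypothetical PySpark example of that pattern; the bucket names, columns, and transformation rules are illustrative placeholders rather than the actual project code.

    # Hypothetical PySpark sketch of the S3 staging-to-persistence pattern.
    # Bucket names, paths, and columns are placeholders, not project values.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("s3_persistence_load").getOrCreate()

    # Read the raw staging data that an ingest job landed in S3 as Parquet.
    staged = spark.read.parquet("s3://example-staging-bucket/customer/")

    # Apply simple transformation rules: trim the key, standardize a date, drop rejects.
    curated = (
        staged
        .withColumn("customer_id", F.trim(F.col("customer_id")))
        .withColumn("load_dt", F.to_date(F.col("load_dt"), "yyyy-MM-dd"))
        .filter(F.col("customer_id").isNotNull())
    )

    # Write back to the persistence area in Parquet, partitioned by load date.
    (curated.write
        .mode("overwrite")
        .partitionBy("load_dt")
        .parquet("s3://example-persistence-bucket/customer/"))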

AREAS OF EXPERTISE:

  • Design, Data Integration, Data Quality
  • Hadoop Framework (Hive, Pig, HDFS, Sqoop)
  • Spark 2.x, PySpark
  • AWS
  • SQL/PLSQL
  • ETL (Extract Transform Load)

TECHNICAL SKILLS:

Programming Languages: SQL, Python

Operating Systems: UNIX, Windows

Tools: Erwin, TOAD, SQL Navigator, SQL Developer, Teradata SQL Assistant, SQL*Loader, ER Studio

Frameworks: Spark 2.3, PySpark, Hadoop (MapReduce, Pig, Hive, HDFS), YARN, Ambari, Resource Manager, Sqoop, Kafka

Databases: AWS DynamoDB, AWS S3 (storage), AWS Glue, Teradata V2R6, Teradata 12.0, Teradata 13.0, Teradata 14.0, Teradata 15.0, Oracle, Prometheus

BI Tools: Cognos, Grafana

ETL Tools: Informatica PowerCenter 9.6/10.1, Informatica IDQ 9.6.1/10.1, Informatica Analyst tool, Data Integration, Informatica Developer

Scripting Languages: UNIX (shell), Python

Scheduling Tools: UC4, $U, Event Engines, cron jobs, Xgears

Repository Tools: PVCS, CVS, IDN Portal, GitHub

CHRONOLOGICAL SUMMARY OF EXPERIENCE:

Confidential, West Chester, PA

Sr. Data Engineer

Responsibilities:

  • Involved in requirements gathering on the data subjects and in analysis of the source data before design.
  • Profiled the data present in the different sources, gathered statistics, and sent the report to business users to confirm the data.
  • Created jobs using the Open Ingest framework, which loads data to the S3 bucket in Parquet format.
  • Created Glue jobs to process the data from the S3 staging area to the S3 persistence area (a hypothetical sketch of such a job appears after the Environment line below).
  • Wrote templates for Spark jobs using Python (PySpark).
  • Wrote Spark code to process streaming data from the Kafka cluster and load it to the staging area for processing (see the streaming sketch after this list).
  • Created data pipelines used for business reports, processing streaming data on the on-premises Kafka cluster.
  • Processed the data from Kafka topics and surfaced the real-time streams in dashboards.
  • Wrote Go (Golang) code to pull data from Kinesis and load it to Prometheus, which Grafana uses for reporting and visualization.
  • ETL development using Informatica and the Teradata utilities FastLoad, MultiLoad, FastExport, TPump, and BTEQ.
  • Data ingestion to Meld and data integration between the Hadoop ecosystem and Teradata using Sqoop, Pig, and HiveQL.
  • Worked in Tier-3 support, coordinating with the development team to resolve PROD and UAT defects.
  • Migrated code to QA, and provided UAT and data-load support after deployment of changes to the UAT or PROD environment.
  • Worked closely with the dev team and the P1 team to resolve issues when jobs with SLAs failed.
  • Worked on enhancements for changes in business requirements, working closely with architects and the offshore team to implement the changes.
  • Loaded data into the Hadoop environment from different sources such as Oracle and MySQL.
  • Created tables in Hive to incorporate CDC logic, writing Pig and HiveQL scripts that perform the CDC processing.
  • Worked on creating data in the Analyst tool, which then goes through approval by the data analysts.
  • Experience in end-to-end data quality testing and support in an enterprise warehouse environment.
  • Wrote Teradata BTEQ utility scripts to load data from the Stage layer to JRNL and from JRNL to Base.
  • Implemented SCD Type 2 logic using BTEQ and FastLoad scripts.
  • Developed Informatica sessions to schedule these Teradata BTEQ scripts using the command option in the sessions.
  • Utilized the existing frameworks to load and update data that meets the specific business requirements.
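
As noted in the Kafka bullets above, the streaming flow consumes from a topic, parses the messages, and lands them in the staging area. The sketch below is a minimal, hypothetical Spark Structured Streaming (PySpark) version of that flow: broker addresses, the topic name, the JSON schema, and the S3 paths are assumed placeholders, and it presumes the spark-sql-kafka connector is available on the cluster.

    # Hypothetical Spark Structured Streaming job: Kafka topic -> S3 staging area.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka_to_staging").getOrCreate()

    # Assumed schema of the JSON messages on the topic.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    # Subscribe to the topic (broker and topic names are placeholders).
    raw = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
        .option("subscribe", "example-events")
        .option("startingOffsets", "latest")
        .load())

    # Kafka delivers the payload as binary; cast it and parse the JSON.
    events = (raw
        .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*"))

    # Land the parsed events in the S3 staging area as Parquet for downstream jobs.
    query = (events.writeStream
        .format("parquet")
        .option("path", "s3://example-staging-bucket/events/")
        .option("checkpointLocation", "s3://example-staging-bucket/_checkpoints/events/")
        .trigger(processingTime="1 minute")
        .start())

    query.awaitTermination()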

Environment: Spark, Scala, Kafka, AWS (S3, DynamoDB, Glue), Informatica PowerCenter, Oracle 11g/10g, Teradata 14, Toad, UNIX Shell Script, SQL Server Management Studio, GitHub, Hadoop (Hive, Pig, HDFS)
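
The Glue job referenced in the bullets above moves data from the S3 staging area to the persistence area. A minimal sketch of such an AWS Glue (PySpark) job script is shown below; the S3 paths, the orders data set, and the null-key filter are assumptions for illustration only, not the project's actual job.

    # Hypothetical AWS Glue (PySpark) job script: S3 staging area -> persistence area.
    import sys
    from awsglue.transforms import Filter
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the staged Parquet data from S3 as a DynamicFrame (placeholder path).
    staged = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-staging-bucket/orders/"]},
        format="parquet",
    )

    # Example rule: drop records with a missing business key.
    curated = Filter.apply(frame=staged, f=lambda rec: rec["order_id"] is not None)

    # Write the curated records to the persistence area in Parquet.
    glue_context.write_dynamic_frame.from_options(
        frame=curated,
        connection_type="s3",
        connection_options={"path": "s3://example-persistence-bucket/orders/"},
        format="parquet",
    )

    job.commit()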

Confidential, West Chester, PA

Hadoop Developer

Responsibilities:

  • Involved in analysis of the feeds from the mainframe system using the DML of each feed.
  • Profiled the data present in Oracle using the feed structure and created mapping documents from the DML for creating objects in the Teradata database.
  • Created the profiling reports for use by the Informatica analysts.
  • Data ingestion to Meld and data integration between the Hadoop ecosystem and Teradata using Sqoop, Pig, and HiveQL.
  • Wrote Spark code to run the data quality rules and load the data to the Spartan database and to Hive on the Hadoop cluster (see the sketch after this list).
  • Created source and target objects in PowerCenter Designer to load the data from one physical data object to another.
  • Worked on different transformations such as Expression, Lookup, Update Strategy, Sequence Generator, Source Qualifier, Union, Aggregator, Filter, Joiner, Rank, and Router.
  • Worked on creating data in the Analyst tool, which then goes through approval by the data analysts.
  • Experience in end-to-end data quality testing and support in an enterprise warehouse environment.
  • Experience in maintaining data quality, data consistency, and data accuracy for data quality projects.
  • Developed design documents and mapping design documents for the code and developed the ETLs.
  • Changed existing mappings and added new attributes.
  • Created mapping documents matching the source and target layouts and circulated them so the LDM and PDM work could begin.
  • Worked with flat files and relational Oracle sources, and with the Informatica scheduler.
  • Worked on pushdown optimization and on calling UNIX scripts from the ETL tool.
  • Created GRANT and REVOKE scripts to issue permissions on the tables.
  • Wrote Teradata BTEQ, FastLoad, and MultiLoad scripts and jobs to load data from the UNIX landing zone.
  • Created the development and production deployment build scripts, using the IDN portal and CVS to check in and check out the developed and unit-tested code.
  • Unit tested the migrated data to confirm the target matched the source.
  • Developed the scheduling jobs in the development and production environments.
  • Documented the whole development process and trained users.
  • Experience with maintenance and troubleshooting of system-related problems.
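
As referenced in the Spark data quality bullet above, a rule-checking job of that kind might look roughly like the following PySpark sketch. The table names, columns, and the two example rules are assumed for illustration; the actual rules came from the project's data quality specifications.

    # Hypothetical PySpark sketch: apply data quality rules and load results to Hive.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
        .appName("dq_rules")
        .enableHiveSupport()
        .getOrCreate())

    source = spark.table("staging_db.customer_feed")  # placeholder Hive table

    # Flag each record against a couple of simple example rules.
    checked = (source
        .withColumn("dq_null_key", F.col("customer_id").isNull())
        .withColumn("dq_bad_zip",
                    F.col("zip_code").isNull() | ~F.col("zip_code").rlike(r"^\d{5}$")))

    passed = checked.filter(~F.col("dq_null_key") & ~F.col("dq_bad_zip"))
    failed = checked.filter(F.col("dq_null_key") | F.col("dq_bad_zip"))

    # Load clean records to the curated Hive table and keep rejects for review.
    (passed.drop("dq_null_key", "dq_bad_zip")
        .write.mode("append").saveAsTable("curated_db.customer"))
    failed.write.mode("append").saveAsTable("dq_db.customer_rejects")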

Environment: Spark 2.x, Scala, AWS, Informatica PowerCenter 9.6.1/10.1, Oracle 11g/10g, Teradata 14, Toad, UNIX Shell Script, SQL Server Management Studio, CVS, Hadoop (Hive, HDFS)

Confidential

Sr. Informatica Developer/Teradata Developer

Responsibilities:

  • Involved in requirements gathering and client interaction.
  • Implemented profiling using Informatica IDQ before extracting the data from Oracle.
  • Created scorecards for the profiled data to be analyzed by the analysts.
  • Obtained approvals from the business users for the PDM and LDM models according to the requirements.
  • Created source and target objects in PowerCenter Designer to load the data from one physical data object to another.
  • Worked on different transformations such as Expression, Lookup, Update Strategy, Sequence Generator, Source Qualifier, Union, Aggregator, Filter, Joiner, Rank, and Router.
  • Created mapplets and rules, and applied those rules in the mappings and while profiling.
  • Exported the profiling results and sent them to the business analysts for approval of the data models.
  • Worked with flat files and relational Oracle sources, and with the Informatica scheduler.
  • Worked on pushdown optimization and on calling UNIX scripts from the ETL tool.
  • Worked with different Designer tools such as Source Analyzer, Target Designer, Transformation Developer, Mapping Designer, and Mapplet Designer.
  • Wrote BTEQ, FastLoad, and MultiLoad scripts to load data from the UNIX landing zone.
  • Developed metadata using the MDF tool and loaded it to the MySQL database; wrote complex transformation logic involving Lookup, Sequence Generator, Source Qualifier, Router, Filter, and Expression transformations.
  • Worked on data cleansing and standardization using the cleanse functions in Informatica MDM.
  • Created the development and production deployment build scripts.
  • Created MapReduce programs to clean and transform data on the target side in HDFS (an illustrative sketch follows this list).
  • Wrote Pig and Hive scripts to analyze large data sets.
  • Developed multiple workflows for scheduling the jobs.
  • Created cron jobs to schedule the MapReduce and data-load UNIX scripts per client requirements.
  • Developed the scheduling jobs in the development and production environments.
  • Created the packages for development and production deployment using Kintana.
  • Documented the whole development process and trained users.
  • Experience with maintenance and troubleshooting of system-related problems.
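
As referenced in the MapReduce bullet above, a record-cleanup step of that kind could be expressed as a Hadoop Streaming mapper. The Python sketch below is only an illustration under assumptions: the pipe-delimited layout, field count, and rules are placeholders, and the original programs may well have been written as conventional Java MapReduce rather than streaming.

    #!/usr/bin/env python
    # Hypothetical Hadoop Streaming mapper that cleans delimited records before
    # they are written to the target side in HDFS. Layout and rules are assumed.
    import sys

    EXPECTED_FIELDS = 5  # assumed width of the pipe-delimited input records

    for line in sys.stdin:
        fields = line.rstrip("\n").split("|")
        # Drop malformed records that do not have the expected number of fields.
        if len(fields) != EXPECTED_FIELDS:
            continue
        # Trim whitespace and normalize empty strings to a NULL token.
        cleaned = [f.strip() or r"\N" for f in fields]
        # Skip records with no business key in the first column.
        if cleaned[0] == r"\N":
            continue
        print("|".join(cleaned))

A mapper like this would be submitted through the Hadoop Streaming jar, typically with the reducer count set to zero for a pure cleanup pass.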

Environment: Teradata 14, Informatica PowerCenter 9.6/10.1, Dollar Universe, ER/Studio Enterprise 8.5, WinSCP 5.1.4, PuTTY, Linux, Oracle 11g/10g, Toad 10.5, SQL Developer, UNIX, CVS, Hadoop (Hive, HDFS)

Confidential

Sr.Informatica IDQ Developer/Hadoop Developer

Responsibilities:

  • Involved in requirements gathering and client interaction.
  • Implemented profiling using Informatica IDQ before extracting the data from Oracle.
  • Created scorecards for the profiled data to be analyzed by the analysts.
  • Obtained approvals from the business users for the PDM and LDM models according to the requirements.
  • Created mapplets and rules, and applied those rules in the mappings and while profiling.
  • Exported the profiling results and sent them to the business analysts for approval of the data models.
  • Worked with different strategies for removing duplicate data using the Match, Consolidation, and Key Generator transformations (an analogous PySpark sketch follows this list).
  • Created data based on the data provided by the business analysts.
  • Created mappings and loaded the data to the Teradata environment using those mappings.
  • Worked with the Address Validator on different templates to validate addresses and check the mailability and match scores.
  • Worked with flat files and relational Oracle sources, and with the Informatica scheduler.
  • Worked on pushdown optimization and on calling UNIX scripts from the ETL tool.
  • Worked with different Designer tools such as Source Analyzer, Target Designer, Transformation Developer, Mapping Designer, and Mapplet Designer.
  • Involved in creating tasks, sessions, workflows, and worklets using Workflow Manager tools such as the Task Developer, Worklet Designer, and Workflow Designer.
  • Wrote Teradata BTEQ, FastLoad, MultiLoad, and TPump scripts to load data from the UNIX landing zone.
  • Developed metadata using the MDF tool and loaded it to the MySQL database; wrote complex transformation logic involving Lookup, Sequence Generator, Source Qualifier, Router, Filter, and Expression transformations.
  • Worked on data cleansing and standardization using the cleanse functions in Informatica MDM.
  • Created the development and production deployment build scripts.
  • Created MapReduce programs to clean and transform data on the target side in HDFS.
  • Wrote Pig and Hive scripts to analyze large data sets.
  • Developed multiple workflows for scheduling the jobs.
  • Created cron jobs to schedule the MapReduce and data-load UNIX scripts per client requirements.
  • Developed the scheduling jobs in the development and production environments.
  • Created the packages for development and production deployment using Kintana.
  • Documented the whole development process and trained users.
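
The duplicate-removal bullet above was implemented with Informatica IDQ Match, Consolidation, and Key Generator transformations; purely as an analogous illustration, the PySpark sketch below shows the same idea of generating a group key and keeping one surviving record per group. The column names, key rule, and survivorship rule are assumptions, not the project's actual logic.

    # Analogous PySpark sketch of key-based duplicate consolidation (illustrative only).
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dedup_sketch").getOrCreate()

    customers = spark.read.parquet("/data/staging/customers")  # placeholder path

    # Build a simple group key (in the spirit of a key generator transformation):
    # normalized last name plus ZIP code.
    keyed = customers.withColumn(
        "group_key",
        F.concat_ws("|", F.upper(F.trim(F.col("last_name"))), F.col("zip_code")),
    )

    # Keep the most recently updated record in each group (a crude consolidation rule).
    w = Window.partitionBy("group_key").orderBy(F.col("update_ts").desc())
    survivors = (keyed
        .withColumn("rn", F.row_number().over(w))
        .filter(F.col("rn") == 1)
        .drop("rn", "group_key"))

    survivors.write.mode("overwrite").parquet("/data/curated/customers")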

Environment: HDFS, MapReduce, Pig, Hive, Teradata, Ab Initio, MDF, Engine (E1, E2, E3 scheduling tools), UNIX, Informatica IDQ, Informatica BDE, Informatica Analyst, Informatica PowerCenter 9.6

Confidential

Sr.Informatica Developer

Responsibilities:

  • Performed requirements analysis and data profiling for the source side (Oracle) according to the requirements.
  • Created scorecards for the profiled data to be analyzed by the analysts.
  • Obtained approvals from the business users for the PDM and LDM models according to the requirements.
  • Cleansed the data using the Standardizer, Parser, Case Converter, Key Generator, Expression, and Filter transformations.
  • Created mapplets and rules, and applied those rules in the mappings and while profiling.
  • Exported the profiling results and sent them to the business analysts for approval of the data models.
  • Worked with different strategies for removing duplicate data using the Match, Consolidation, and Key Generator transformations.
  • Created data based on the data provided by the business analysts.
  • Created mappings and loaded the data to the Teradata environment using those mappings.
  • Worked with the Address Validator on different templates to validate addresses and check the mailability and match scores.
  • Created and maintained database objects such as tables, views, materialized views, indexes, sequences, and synonyms in Teradata, and loaded data using the Teradata utilities.
  • Developed design documents and mapping design documents for the code and developed the ETLs.
  • Changed existing mappings and added new attributes.
  • Wrote PL/SQL procedures and triggers according to the requirements.
  • Involved in writing automated UNIX scripts.
  • Created the development and production deployment build scripts.
  • Created MapReduce programs to clean and transform data on the target side in HDFS.
  • Wrote Pig and Hive scripts to analyze large data sets.

Environment: HDFS, MapReduce, Pig, Hive, Teradata 12, PL/SQL, UNIX, Informatica PowerCenter 9.6, $U (scheduling tool), PVCS (check-in tool)

Confidential

Informatica Developer

Responsibilities:

  • Involved in developing the required ETLs in Informatica for data flows from ERP systems and an Oracle database into the target HDFS environment and Teradata database, with metadata in MySQL for loading the data to HDFS, using transformations such as Source Qualifier, Router, Expression, Aggregator, and Lookup.
  • Created and maintained database objects such as tables, views, materialized views, indexes, sequences, and synonyms in Teradata, and developed UNIX shell scripts to load data using the Teradata utilities.
  • Involved in developing the business-logic (TV logic) SQL queries used to load the history and current data into the target Teradata database tables.
  • Involved in developing all the scripts for the table structures, views, and corresponding business views.
  • Involved in migrating the developed code from the development environment to the testing environment.
  • Wrote PL/SQL procedures and triggers according to the requirements.
  • Involved in basic testing in coordination with the QA team.
  • Involved in writing automated UNIX scripts.
  • Created the development and production deployment build scripts.
  • Created MapReduce programs to clean and transform data on the target side in HDFS.
  • Wrote Pig and Hive scripts to analyze large data sets.

Environment: HDFS, MapReduce, Pig, Hive, Teradata 12, PL/SQL, UNIX, Informatica PowerCenter, Kintana, $U scheduling tool

Confidential

Teradata Developer

Responsibilities:

  • Involved in developing the required ETLs in Informatica for data flows from ERP systems and an Oracle database into the target Teradata database, with metadata in MySQL for loading the data to HDFS, using transformations such as Source Qualifier, Router, Expression, Aggregator, and Lookup; created and maintained database objects such as tables, views, materialized views, indexes, sequences, and synonyms in Teradata, and developed UNIX shell scripts to load data using the Teradata utilities.
  • Involved in developing the business-logic (TV logic) SQL queries used to load the history and current data into the target Teradata database tables.
  • Involved in developing all the scripts for the table structures, views, and corresponding business views.
  • Involved in migrating the developed code from the development environment to the testing environment.
  • Wrote PL/SQL procedures and triggers according to the requirements.
  • Involved in basic testing in coordination with the QA team.
  • Created the development and production deployment build scripts.
  • Created MapReduce programs to clean and transform data on the target side in HDFS.
  • Wrote Pig and Hive scripts to analyze large data sets.

Environment: HDFS, MapReduce, Pig, Hive, Teradata 12, PL/SQL, UNIX, Informatica PowerCenter, Kintana, $U scheduling tool

Confidential

Teradata Developer

Responsibilities:

  • Analyzed the required table structures and prepared the corresponding objects.
  • Involved in preparing test scripts covering all aspects of the business process.
  • Involved in developing ETLs to load the data from one stage to another.
  • Involved in end-to-end testing of all the ETLs in parallel with the QA team.

Environment: Teradata 12, PL/SQL, UNIX, Informatica PowerCenter 8.6.1, Kintana, $U scheduling tool.
