Data Engineer Resume

SUMMARY

  • Overall 9.5 years of experience in IT, including 2 years as a Hadoop/Spark developer and 7+ years specializing in data warehousing with expertise in ETL tools such as Informatica. Completed in-person training in GCP.
  • Data quality experience includes designing a Data Quality Framework on a Hadoop cluster: generating profiles, identifying anomalies, applying data quality rules, and generating data quality trends and scorecards; monitoring the data quality trends and setting alerts.
  • Experience with data quality tools like Owl Analytics and IDQ.
  • Hands on experience in various big data application phases like data ingestion, data analytics and data visualization.
  • Understanding of Spark Architecture including Spark Core, Spark SQL, data frames, Spark Streaming.
  • Expertise in using Spark-SQL with various data sources like JSON and HIVE.
  • Experience in creating tables, partitioning, loading and aggregating data using HIVE.
  • Good knowledge of Data Science using Python, with expertise in Jupyter Notebook and the PyCharm IDE.
  • Data Science knowledge includes data extraction, data processing and visualization, and creating predictive models.
  • Good understanding of the Data Science project cycle and expertise in using Jupyter Notebook, the PyCharm IDE, the Cookiecutter template, and versioning Data Science projects using GitHub.
  • Good knowledge in exploring and processing data science project data using Pandas and NumPy.
  • Good Understanding of Data munging, feature engineering and advanced visualization.
  • Good understanding of Machine Learning basics for building and evaluating predictive models (Linear Regression and Logistic Regression), fine-tuning the models, feature normalization, and model persistence using the NumPy, Pandas, Scikit-Learn, Pickle, and Flask libraries from Python (a minimal sketch of this workflow follows this list).
  • Expertise includes Data Loading, Data Analysis, Data Cleansing, Data Profiling, Data Standardization, Transformation and Data Integration using Informatica PowerCenter 10/9.x/8.x from various sources and targets including Oracle, DB2, XML, MqSeries, Webservices and Flat files.
  • Experience in System Analysis, Design, coding and testing of Data Warehousing implementations.
  • Experience utilizing Quality Center to monitor defects and testing Informatica code with SQL.
  • Good experience with Real Time Informatica (Webservices and Message Queues).
  • Extensive experience wif unstructured sources and targets.
  • Extensive experience in creating real time webservices for banking client.
  • Complete knowledge of data warehouse methodologies: Dimensional Modeling, Fact Table, ODS, EDW.
  • Worked extensively on Informatica’s Power Center Designer (Source Analyzer, Target Designer, Transformation Developer, Mapplet and Mapping Designer), Workflow Manager, Workflow Monitor, Repository Manager and Webservice Hub.
  • Extensive experience in RDBMS technologies using Oracle 11g/10g/9i, SQL Server, Toad 9.7, SQL*Plus and database querying using SQL, PL/SQL.
  • Identified and Populated Fact and Dimension tables according to Business Rules implementing Type 1 and Type 2 Slowly Changing Dimensions. Used different dimensional modeling techniques like Star and Snowflake Schema Modeling.
  • Experience working with Agile methodologies.
  • Experience in data validations and Unit Testing, Integration Testing, and UAT.
  • Extensive experience in debugging Informatica mappings using the Debugger.
  • Proficient in the UNIX platform with respect to the Informatica ETL environment. Wrote UNIX shell scripts to enhance functionality for fetching, loading, and scheduling source files from the source system; external and internal audits by generating email notifications; importing and exporting data from DB2; validation of input and output files; success/failure of Informatica jobs; and wrapper scripts for Informatica jobs.
  • Excellent skills in fine-tuning the ETL mappings in Informatica.
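
The machine learning bullet above corresponds to a workflow along the following lines. This is only an illustrative sketch with hypothetical data, feature names, and file path; it shows feature normalization, model training and evaluation with scikit-learn, and pickle-based model persistence, not code from any specific project.

```python
# Illustrative sketch only: hypothetical data and paths, showing feature normalization,
# model training/evaluation with scikit-learn, and model persistence with pickle.
import pickle

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset with two numeric features and a binary label.
df = pd.DataFrame({
    "feature_a": np.random.rand(200),
    "feature_b": np.random.rand(200),
    "churned": np.random.randint(0, 2, 200),
})

X = df[["feature_a", "feature_b"]]
X_train, X_test, y_train, y_test = train_test_split(
    X, df["churned"], test_size=0.3, random_state=42
)

# Feature normalization.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Logistic regression for a binary target; sklearn.linear_model.LinearRegression
# would be fit the same way against a continuous target.
model = LogisticRegression().fit(X_train_s, y_train)
print("test accuracy:", model.score(X_test_s, y_test))

# Model persistence with pickle.
with open("model.pkl", "wb") as f:
    pickle.dump({"scaler": scaler, "model": model}, f)
```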

TECHNICAL SKILLS

DATA QUALITY TOOLS: Owl DQ Analytics, IDQ

ETL TOOL: Informatica Power Center 9.x/8.x

ETL SOURCES: Oracle, DB2, Flat Files, Unstructured Data, XML, Webservice, MqSeries

DB TOOLS: SQL*Plus, TOAD 9.7

TESTING TOOL: Rubymine 7.1.4

DEVELOPMENT LANGUAGES: Python, SQL, PL/SQL, Shell Scripting

DATABASES: Oracle 11g/10g/9i, DB2, Teradata

OPERATING SYSTEMS: Windows 98/2000/NT/XP/7/8, UNIX, Linux

DATA SCIENCE TOOLS: Jupyter Notebook, PyCharm IDE

DATA SCIENCE TEMPLATE: Cookiecutter

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Designing Data Quality Framework.
  • Building Data Quality Ingestion Pipelines.
  • Identifying Data Quality issues in Enterprise Data.
  • Working with the Owl Analytics tool, which creates Spark jobs.
  • Creating profiles for Hive tables in the Hadoop cluster (a simple profiling sketch follows this list).
  • Identifying anomalies in the datasets.
  • Creating and applying data quality rules.
  • Generating trends and scorecards for each dataset in the EDP platform.
  • Monitoring the data quality trends and creating an alerting system.
  • Fine-tuning profiling jobs (Spark jobs) for huge datasets.
  • Shell scripting to create wrapper scripts for scheduling in Tidal and to carry out automated assessments of Hive tables.
  • Planning and conducting a wide range of quality control tests and analyses to ensure all application products and services meet organizational standards and end user requirements.
  • Working in Agile Methodologies.
  • Used Pandas data frames and NumPy for exploring the data and creating train and test data.
  • Used Linear Regression and Logistic Regression models for training on the data.
  • Used the Scikit-Learn, Pickle, and Flask libraries from Python for feature normalization, model persistence, and fine-tuning the predictive models.
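
The profiling step called out above can be pictured with a small PySpark sketch along the following lines; the Hive table name is a hypothetical placeholder, and this illustrates basic column-level profiling (row, null, and distinct counts) rather than the Owl Analytics implementation.

```python
# Illustrative column-profiling sketch in PySpark (not the Owl Analytics tool itself);
# the Hive table name below is a hypothetical placeholder.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hive_profile").enableHiveSupport().getOrCreate()

def profile_table(table_name):
    """Return per-column row count, null count, and distinct count for a Hive table."""
    df = spark.table(table_name)
    total = df.count()
    stats = []
    for col in df.columns:
        nulls = df.filter(F.col(col).isNull()).count()
        distinct = df.select(col).distinct().count()
        stats.append((col, total, nulls, distinct))
    return spark.createDataFrame(
        stats, ["column", "row_count", "null_count", "distinct_count"]
    )

profile_table("edp.customer_accounts").show()  # hypothetical EDP table name
```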

Environment: Jupyter Notebook, PyCharm IDE, Cookiecutter template, GitHub

Confidential

Spark Developer

Responsibilities:

  • Used Spark-SQL to load JSON data, created a schema RDD, and loaded it into HIVE tables (see the sketch after this list).
  • Developing Spark programs using PySpark for faster processing and testing of data.
  • Imported Data from different sources into Spark RDD.
  • Developed Spark jobs to parse the JSON or XML data.
  • Creating an extract (XML) of Producers information using PySpark.
  • Analyzing user requirements and defining testing specifications.
  • Interacting with Business Managers to collect the business requirements.
  • Planning and conducting a wide range of quality control tests and analyses to ensure all application products and services meet organizational standards and end user requirements.
  • Working in Agile Methodologies.
  • Validating Production Data. Creating Production Defects and tracking them in Quality Center.
  • Formulating and defining system scopes and objectives through research and fact finding, combined with an understanding of the applicable data warehouses and business systems.
  • Mentored the team, guiding them through development challenges, testing, and production issues.
  • Handling the monthly release activities and providing confirmation to the business regarding the quality of the data loaded.
  • Provided strategic support in development of detailed project plans, work assignments, target dates etc.
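
A minimal PySpark sketch of the JSON-to-Hive loading pattern described in this list is shown below; the file path, column names, and target table name are hypothetical placeholders used only for illustration.

```python
# Minimal PySpark sketch of loading JSON through Spark-SQL into a Hive table;
# the path, columns, and table name are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("producers_json_to_hive")
         .enableHiveSupport()
         .getOrCreate())

# Read the JSON source; Spark infers the schema from the data.
producers = spark.read.json("/data/raw/producers.json")
producers.createOrReplaceTempView("producers_stg")

# Apply a simple Spark-SQL transformation before loading into Hive.
cleaned = spark.sql("""
    SELECT producer_id,
           producer_name,
           upper(state) AS state
    FROM producers_stg
    WHERE producer_id IS NOT NULL
""")

cleaned.write.mode("overwrite").saveAsTable("staging.producers")
```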

Environment: PySpark, HIVE, Sqoop, DB2, SQL, Unix (Shell Scripting)

Confidential

Technology Lead

Responsibilities:

  • Interacted with Business Analysts to collect the business requirements and understand the usage of the project.
  • Actively involved in creating requirement analysis documents.
  • Actively involved in creation of low-level design documents.
  • Developed Mappings/Workflows.
  • Worked on Message Queues and Webservices.
  • Worked on the Informatica Web Service Hub.
  • Worked on MqSeries and Webservices sources and targets.
  • Used most of the transformations, such as Source Qualifier, XML Parser, XML Generator, Router, Filter, Sequence Generator, Expression, SQL, Union, Normalizer, Joiner, Transaction Control, Lookup, etc., as per the business requirement.
  • Used IDQ plans for standardization and matching customer data (name and addresses).
  • Used mapping parameters and variables.
  • Developed UNIX scripts for creating request files for Informatica jobs, wrapper scripts, email notifications, and validation of target files.
  • Implemented performance tuning and optimization techniques.
  • Developed Coding Standards and Best Practices, Technical Specification Templates.
  • Actively involved in deploying the Informatica code to QA and production.
  • Provided a central point of contact for questions and responses between technical resources and the business area.
  • Implemented join conditions between tables at the database level.
  • Created ETL (Extract, Transform and Load) specification documents based on the business requirements.
  • Mentored the team, guiding them through development challenges, testing, and production issues.
  • Handled the monthly release activities on time to keep up with the workload.
  • Developed source-to-target data mapping documents, inclusive of all the transformation and business rules, logical and physical column names, data types, and data flow diagrams used for ETL design and development.
  • Created source-to-target (ETL) mapping documents which included the fields, data types, and definitions from both source and target systems. A business/transformation rule was also documented based on the business requirement or formatting needs of the data.
  • Created data flow diagrams which depict the flow of data from the source system to the target system.
  • Provided strategic support in development of detailed project plans, work assignments, target dates etc.

Environment: Informatica, DB2, SQL, Toad, Unix(Shell Scripting)

Confidential

Sr ETL Developer

Responsibilities:

  • Developed Informatica Mappings to transform and load into Oracle.
  • Worked closely with the ETL Lead, Data Modeler, and Business Analysts to understand business requirements, providing expert knowledge and solutions on data warehousing.
  • Developed source-target mappings and documented the same.
  • Used most of the transformations, such as Source Qualifier, Router, Filter, Sequence Generator, Expression, Union, Joiner, Dynamic Lookup, etc., as per the business requirement.
  • Used Dynamic Lookup to create golden records after the merge.
  • Workflow development based on the order of data to be loaded.
  • Implementing performance tuning and optimization techniques.
  • Involved as a key player in designing and delivering the above project.
  • Interacted with Business Analysts to collect the business requirements and understand the usage.
  • Involved in all phases of ETL, i.e., from source to target.
  • Developed Coding Standards and Best Practices, Technical Specification Templates, Source to Target Mapping Templates, Initial and Incremental load strategies.
  • Involved in designing the data mart as per the reporting requirements, with Type 2 and junk dimension tables along with fact tables.
  • Acted as a mentor for the team and participated in code reviews and task assignments.
  • Took care of quality procedures for analysis, program specifications, exhaustive test plans, defect tracking, change procedures etc.
  • Managed and also actively worked on QA, UAT, and bug fixing for the entire set of ETL jobs.
  • Managed and also actively worked on day to day Production support activities.

Environment: Informatica, Oracle, SQL, Toad, Unix (Shell Scripting)

Confidential

Sr. ETL Developer

Responsibilities:

  • Worked closely with the ETL Lead, Data Modeler, and Business Analysts to understand business requirements, providing expert knowledge and solutions on data warehousing.
  • Developed source-target mappings and documented the same.
  • Used most of the transformations, such as Source Qualifier, Router, Filter, Sequence Generator, Expression, Union, Joiner, Dynamic Lookup, etc., as per the business requirement.
  • Used Dynamic Lookup to create golden records after the merge.
  • Workflow development based on the order of data to be loaded.
  • Implementing performance tuning and optimization techniques.
  • Involved as a key player in designing and delivering the above project.
  • Interacted with Business Analysts to collect the business requirements and understand the usage.
  • Involved in all phases of ETL, i.e., from source to target.
  • Developed Coding Standards and Best Practices, Technical Specification Templates, Source to Target Mapping Templates, Initial and Incremental load strategies.
  • Involved in designing the data mart as per the reporting requirements, with Type 2 and junk dimension tables along with fact tables.
  • Developed UNIX scripts for sorting and splitting the target files.
  • Acted as a mentor for teh team and participated in code reviews and task assignments.
  • Took care of quality procedures for analysis, program specifications, exhaustive test plans, defect tracking, change procedures etc.
  • Managed and also actively worked on QA, UAT, and bug fixing for the entire set of ETL jobs.
  • Managed and also actively worked on day to day Production support activities.

Environment: Informatica, Oracle, SQL, Toad, Unix (Shell Scripting)

Confidential

ETL Developer

Responsibilities:

  • Monitoring activities for all the production-related jobs.
  • Abend resolution and tracking.
  • Resolving issues on an ad hoc basis by running the workflows through the break-fix area in case of failures.
  • Handling the daily and weekly status calls for the project.
  • Handling meetings related to production handover as well as internal meetings.
  • Monitoring by checking logs and load details.
  • Resolving issues related to long-running jobs.
  • Implementing performance tuning techniques in the case of long-running jobs.
  • Handling the weekly and monthly release activities.
  • Worked extensively on triggers, stored procedures, joins, and sequences in SQL/PL/SQL.
  • Worked with the DB team to modify tables and worked in parallel on the related ETL changes.
  • Coordinating with different source teams during the release schedules for a smooth run.
  • Analysis and Resolution of abends.
  • Implementing the standard SLA procedures whenever required in the case of any failures.
  • Ongoing loads support - analyzing aborts and resolving them in a timely manner.
  • Ad hoc job schedule analysis requests, ad hoc issue analysis requests, and load status reporting.
  • Ad hoc meetings with the Development team and the Lights On team to resolve unexpected scheduling or data issues.

Environment: Informatica, Teradata, Toad, Mainframes
