
Data Engineer Resume


SUMMARY

  • Around 8 years of Software Life Cycle experience in System Analysis, Design, Development, Implementation, Maintenance, and Production support of Data Warehouse Applications.
  • Over 5 years of experience in end-to-end design and deployment using Talend Studio (5.x/6.x/7.x) for Data Acquisition
  • Strong understanding of project life cycle and SDLC methodologies including Waterfall and Agile.
  • Expertise in understanding and supporting the client with project planning, project definition, requirements definition, analysis, design, testing, system documentation and user training.
  • Broad design, development and testing experience with Talend Open Studio, Talend Data Quality and Talend ESB, and knowledge of Performance Tuning of mappings
  • Solid understanding of the Hadoop ecosystem, including HDFS, Hive, HBase, Impala, Sqoop, Oozie and columnar storage (Parquet files)
  • Strong knowledge of Entity-Relationship concepts, fact and dimension tables, slowly changing dimensions (SCD) and Dimensional Modeling (Kimball/Inmon), Star Schema and Snowflake schema
  • Experience in the Google Cloud ecosystem, including BigQuery, Bigtable, Dataproc, Dialogflow, Cloud Storage and IAM policies.
  • Good understanding of the Amazon Web Services (AWS) ecosystem, including Redshift and S3
  • Experience in the integration of various data sources such as Oracle, MySQL, PostgreSQL, SQL Server, Salesforce Cloud, Teradata, JSON, XML files, flat files and API integrations.
  • Vast experience in designing and developing complex mappings with varied transformation logic such as Unconnected and Connected Lookups, Source Qualifier, Router, Filter, Expression, Aggregator, Joiner and Update Strategy.
  • Experience in UNIX shell scripting, CRON, FTP and file management in various UNIX environments.
  • Knowledge in designing Dimensional models for Data Mart and Staging database.
  • Extensive experience in creating complex mappings in Talend using transformation and big data components.
  • Experience working on Talend administration activities and the Talend Data Integration ETL tool
  • Hands-on experience in tuning mappings and in identifying and resolving performance bottlenecks at various levels such as sources, targets, mappings, and sessions.
  • Expertise in defining and documenting ETL Process Flow, Job Execution Sequence, Job Scheduling and Alerting Mechanisms using command line utilities.
  • Extensive experience in implementing Error Handling, Auditing and Reconciliation and Balancing Mechanisms in ETL process.
  • Skilled in developing Test Plans, Creating and Executing Test Cases.
  • Excellent Analytical, Written and Communication skills.

TECHNICAL SKILLS

ETL Tools: Talend 6.3/6.4/7.1, Informatica 8.x

Databases (SQL/NoSQL): Oracle, PostgreSQL, MS SQL Server, Neo4j, HBase, BigQuery, Hive, LUDP, MongoDB, Cassandra

Data Modeling: ERwin 4.5, Star Schema Modeling, Snowflake Modeling

Programming: Python, Core Java

Big Data Tools: Google Cloud, AWS, Impala, Sqoop, Apache Spark, Apache Kafka

Scheduling Tools: TAC, Airflow, Oozie

Other Tools: PuTTY, WinSCP, TOAD, TSA, Postman, Git, Swagger, Jira, Tableau

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Developing data pipelines as part of advanced analytics to create a data lake product using Oozie and Impala
  • Working with cross-functional partners to create POCs and relevant tech stack requirements
  • Converting existing data warehouse logic into flattened Parquet tables in Impala
  • Designed the ETL process for slowly changing dimensions to create a data warehouse environment in a Postgres DB using Talend (see the SCD sketch after this list)
  • Broad design, development and testing experience with Talend Integration Suite and knowledge of Performance Tuning of mappings.
  • Developed jobs in Talend Enterprise edition from stage to source, intermediate, conversion and target.
  • Involved in writing SQL Queries and used Joins to access data from Oracle, and Postgres SQL.
  • Solid experience in implementing complex business rules by creating re-usable transformations and robust mappings using Talend transformations like tConvertType, tSortRow, tReplace, tAggregateRow, tUnite etc.
  • Developed a Talend log process for tracking tables and subjobs for each category
  • Used tStatsCatcher, tDie and tLogRow to create a generic joblet that stores processing stats into a database table to record job history. Integrated Java code inside Talend Studio using components like tJavaRow, tJava, tJavaFlex and Routines.
  • Created an AWS environment to test S3, EMR, CloudWatch, EC2 and Lambda for one business group and demonstrated various issues to the product management team
  • Used the tRunJob component to run a child job from a parent job and to pass parameters from parent to child job.
  • Experienced in writing expressions within tMap as per the business need.
  • Handled insert and update strategy using tMap. Used ETL methodologies and best practices to create Talend ETL jobs.
  • Extracted data from flat files and databases, applied business logic, and loaded the data into the staging database as well as flat files.
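For illustration only, a minimal SCD Type 2 sketch of the kind of insert/update logic the Talend jobs above implemented with tMap, assuming a hypothetical dim_customer/stg_customer table pair in Postgres and the psycopg2 driver; table and column names are placeholders, not the actual project's:

```python
# Minimal SCD Type 2 sketch for a Postgres dimension (illustrative only).
# dim_customer / stg_customer and their columns are hypothetical; the
# actual implementation used Talend tMap insert/update strategies.
import psycopg2

SCD2_SQL = """
-- Close out the current row when a tracked attribute changed
UPDATE dim_customer d
SET end_date = CURRENT_DATE, is_current = FALSE
FROM stg_customer s
WHERE d.customer_id = s.customer_id
  AND d.is_current
  AND d.address IS DISTINCT FROM s.address;

-- Insert a new current row for new or changed customers
INSERT INTO dim_customer (customer_id, address, effective_date, end_date, is_current)
SELECT s.customer_id, s.address, CURRENT_DATE, DATE '9999-12-31', TRUE
FROM stg_customer s
LEFT JOIN dim_customer d
  ON d.customer_id = s.customer_id AND d.is_current
WHERE d.customer_id IS NULL;
"""

def apply_scd2(dsn: str) -> None:
    """Run the SCD Type 2 close-out and insert steps in one transaction."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(SCD2_SQL)
```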

Environment: Talend DI 7.1, Oracle, Postgres, HDFS, Hive, Impala, Java, JIRA, Agile Methodology.

Confidential

Data Engineer/BI Developer

Responsibilities:

  • Worked with ecommerce, Marketing, Salesforce, Sales & Ops, Manufacturing and customer support teams to implement projects such as data warehousing, data engineering, data integration automation, process design, API enablement, analytics and data quality
  • Solid experience in implementing complex business rules by creating re-usable transformations and robust mappings using Talend transformations like tConvertType, tSortRow, tReplace, tAggregateRow, tUnite etc.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action.
  • Worked on Hive and BigQuery (BQ) for exporting data for further analysis and for transforming files from different analytical formats to text files.
  • Experience in Extraction, Transformation and Loading of data from different heterogeneous source systems such as complex JSON, XML, flat files, Excel, Oracle, MySQL, SQL Server, Salesforce Cloud and API endpoints.
  • Handled importing of data in Talend from various data sources, performed transformations using Hive, Pig and Spark and loaded data into LUDP.
  • Imported data from BigQuery into LUDP Hive tables
  • Created a POC for operational logs for GCP, AWS and LUDP using Cassandra
  • Worked with the ecommerce DevOps engineer to create an automation process for log filtering using AWS S3, gsutil and Splunk
  • Worked with the mobile ecommerce team to build a data mart on the AWS platform in Redshift
  • Worked on a POC for Business Intelligence chat bot data modeling using Neo4j, MongoDB and Bigtable.
  • Created Python scripts to clean the data from multiple tables
  • Created Hive external tables, loaded the data into them and queried the data using HQL
  • Developed RESTful APIs for the customer care system using Talend ESB, which can be used to easily access customer and product data
  • Responsible for data modeling and development of an internal Business Intelligence chat bot that provides real-time access to business KPIs using Python Flask and Google Cloud
  • Improved daily job performance using data cleaning, query optimization and table partitioning
  • Created an automation process for the distribution group that receives inventory and sales data and sends an activation report, using Talend and BigQuery
  • Designed a data ingestion process using BigQuery & GCS to process 200 GB of daily manufacturing test data (see the ingestion sketch after this list)
  • Created a mechanism to import third-party vendor orders and distributor information data using API endpoint extraction
  • Created a process to extract email attachments and send the required information from BigQuery
  • Mapped source-to-target data and converted JSON to XML (ACORD format) using Talend Data Mapper and transformed it with the tXMLMap component.
  • Created execution plans in TAC
  • Created Talend quality check joblets based on business requirements
  • Created Talend Mappings to populate the data into dimensions and fact tables
  • Developed jobs in Talend to move inbound files to the vendor server location on monthly, weekly and daily frequencies.
  • Developed a process to analyze customer data to run a push notification promotion campaign, which increased adoption rates by 28%, using Tableau
  • Identified deeply defective manufacturing stations and test processes based on the smart factory method and suggested process changes, which reduced cost by 31%
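For illustration only, a minimal sketch of a GCS-to-BigQuery ingestion like the one described above, using the google-cloud-bigquery Python client; the bucket, dataset and table names are placeholders, and the production pipeline also handled validation and scheduling not shown here:

```python
# Minimal GCS -> BigQuery ingestion sketch (illustrative only).
# Bucket, dataset and table names are placeholders.
from google.cloud import bigquery

def load_daily_test_data(gcs_uri: str, table_id: str) -> None:
    """Load newline-delimited JSON test data from GCS into a date-partitioned table."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        time_partitioning=bigquery.TimePartitioning(field="test_date"),
    )
    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    load_job.result()  # Wait for completion; raises on failure.

# Example usage with placeholder names:
# load_daily_test_data(
#     "gs://example-mfg-bucket/daily/2020-01-01/*.json",
#     "example-project.manufacturing.daily_test_results",
# )
```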

Environment: Talend Big Data Integration 6.3, 6.4, 7.1, Data Quality, Talend ESB, Talend Big Data Spark, Oracle 11g, Hive 0.13, HDFS, SQL Navigator, XML files, Flat files, JSON, Hadoop 2.4.1, JIRA, Postman, Agile Methodology.

Confidential, SFO, CA

Data Analyst

Responsibilities:

  • Responsible for the design, development, and administration of complex T-SQL queries (DDL/DML), stored procedures, views & functions for transactional and analytical data structures
  • Identify and interpret trends and patterns in large and complex datasets and analyze trends in key metrics (see the query sketch after this list)
  • Collaborate with team to identify data quality, metadata, and data profiling issues
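For illustration only, a minimal sketch of the kind of trend query used for key-metric analysis, assuming SQL Server via pyodbc and a hypothetical sales.daily_revenue table; names and columns are placeholders:

```python
# Minimal month-over-month trend query sketch (illustrative only).
# sales.daily_revenue and its columns are hypothetical.
import pyodbc

TREND_SQL = """
SELECT
    metric_month,
    revenue,
    revenue - LAG(revenue) OVER (ORDER BY metric_month) AS month_over_month_change
FROM (
    SELECT DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1) AS metric_month,
           SUM(amount) AS revenue
    FROM sales.daily_revenue
    GROUP BY DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1)
) AS monthly
ORDER BY metric_month;
"""

def monthly_revenue_trend(conn_str: str):
    """Return month-over-month revenue changes as a list of rows."""
    with pyodbc.connect(conn_str) as conn:
        return conn.cursor().execute(TREND_SQL).fetchall()
```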

Confidential

ETL Developer

Responsibilities:

  • Designed and implemented ETL for data loads from source to target databases and for Fact and Slowly Changing Dimension (SCD) Type 1, Type 2 and Type 3 tables to capture the changes.
  • Involved in writing SQL Queries and used Joins to access data from Oracle, and MySQL.
  • Participated in all phases of development life-cycle with extensive involvement in the definition and design meetings, functional and technical walkthroughs.
  • Designed, developed and deployed end-to-end data integration solutions.
  • Implemented custom error handling in Talend jobs and worked on different methods of logging.
  • Developed ETL mappings for XML, .CSV and .TXT sources and loaded the data from these sources into relational tables with Talend ETL. Developed joblets for reusability and to improve performance.
  • Created a UNIX script to automate status reporting for long-running and failed jobs (see the reporting sketch after this list).
  • Developed a high-level data dictionary of ETL data mappings and transformations from a series of complex Talend data integration jobs.
  • Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions, and incremental loads, and unit tested the mappings
  • Expertise in interaction with end-users and functional analysts to identify and develop Business Requirement Documents (BRD) and Functional Specification documents (FSD).
  • Prepared ETL mapping Documents for every mapping and Data Migration document for smooth transfer of project from development to testing environment and then to production environment.
  • Created context variables and groups to run Talend jobs against different environments.
  • Used Talend components tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator.
  • Created the Data model, Data entities and views for Master Data Management
  • Involved in creating roles and access control
  • Created event management that listens continuously for events in the MDM hub
  • Used triggers to launch the process with a given set of conditions
  • Worked on creating the Data model and Campaigns for Talend Data Stewardship
  • Created Data entities in the Data model and defined roles at the entity level
  • Deployed the Data model, Data entities and View for the Talend MDM
  • Performed requirement gathering, Data Analysis using Data Profiling scripts, Data Quality (DQ) scripts and unit testing of ETL jobs.
  • Created triggers for a Talend job to run automatically on server.
  • Installed Talend Enterprise Studio (Windows, UNIX) and configured it along with Java.
  • Set up and managed transaction log shipping, SQL Server mirroring, failover clustering and replication.
  • Worked on AMC tables (Error Logging tables)
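For illustration only, a minimal Python sketch of the failed/long-running job reporting mentioned above, reading from an AMC-style log table; the table and column names are assumptions, the query assumes an Oracle connection (e.g. cx_Oracle), and the original automation was a UNIX shell script:

```python
# Minimal sketch of failed / long-running job reporting from an AMC-style
# log table (illustrative only). amc_job_log and its columns are assumptions.
from datetime import date

REPORT_SQL = """
SELECT job_name, start_time, end_time, status
FROM amc_job_log
WHERE trunc(start_time) = trunc(:run_date)
  AND (status = 'FAILED'
       OR (end_time - start_time) * 24 * 60 > :max_minutes)
ORDER BY start_time
"""

def build_status_report(conn, run_date=None, max_minutes: int = 60) -> str:
    """Return a plain-text report of failed or long-running jobs for one day."""
    run_date = run_date or date.today()
    cur = conn.cursor()
    cur.execute(REPORT_SQL, {"run_date": run_date, "max_minutes": max_minutes})
    rows = cur.fetchall()
    if not rows:
        return "All jobs completed within the expected window."
    lines = ["Jobs needing attention:"]
    for job_name, start_time, end_time, status in rows:
        lines.append(f"  {job_name}: {status} ({start_time} -> {end_time})")
    return "\n".join(lines)
```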

Environment: Talend Platform for Data Management 5.6, UNIX Scripting, TOAD, Oracle 10g, SQL Server
