
Data Engineer Resume


SUMMARY:

  • Sr. Data Engineer with more than 7 years of experience in the design and development of Analytics/Big Data applications using industry tools while working with Fortune firms. Well-rounded experience in ETL, Hadoop, Spark, Data Modelling & Data Visualization.
  • Well-versed in Big Data concepts like Hadoop, MapReduce, YARN, Spark, RDDs, Hive, Pig, BigQuery, DataFrames, Datasets, and Streaming, with a thorough understanding of the Big Data ecosystem.
  • Adept in statistical programming languages (Python, R, Apache Spark, MATLAB) and well-versed in scripting languages (Python, Java).
  • Adroit in writing PySpark jobs and HiveQL and writing in-house UNIX shell scripts for Hadoop & Big Data development. Skillful in developing & processing BI application designs on Confidential, Tableau, and Power BI.
  • Proficient in Hive, Oracle, SQL Server, SQL, PL/SQL, T-SQL, and managing significantly large databases.
  • Deft in performance tuning of data pipelines, distributed datasets, databases, and SQL queries.
  • Hands-on experience with AWS Cloud Services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, Lambda, Kinesis, SNS, SQS).
  • Dexterous in working with the Informatica PowerCenter tool: Source Analyzer, Data Warehousing Designer, Mapping & Mapplet Designer, and Transformation Designer.
  • Good knowledge of NoSQL databases like DynamoDB. 3+ years of experience in project management with Agile and Scrum. 5+ years of experience managing migrations to DEV/UAT/PROD.
  • Well-versed in Data Modelling and in developing complex data models using Unified Modelling Language (UML), ER diagrams, conceptual/physical diagrams, etc.
  • Efficient in Data Lake Building and OLAP Services.
  • Take responsibility for developing the data model(s) for the data warehouse and providing recommendations for all data strategies & scope processes.
  • Have a thorough understanding of common Machine Learning concepts (performed prior academic research in ML).
  • Keep track of SDLC met

PROFESSIONAL EXPERIENCE:

Confidential

DATA ENGINEER

Environment: AWS, Spark, Tableau, SQL, Excel, VBA, Oracle, SQL Server 2012, DB2, Kafka, Python.

Responsibilities:

  • Developing Hadoop ETL solutions to transfer data to the data lake using big data tools like Sqoop, Hive, Spark, HDFS, Talend, etc.
  • Designing & developing Spark code using Python and Spark SQL for high-speed data processing to meet critical business requirements.
  • Working with Kafka, Spark, ECS, and Docker.
  • Well-versed in using AWS Cloud Services: EC2, EMR, RDS, Redshift, S3.
  • Designing and developing ETL processes in AWS Glue to migrate campaign data from external sources like S3 (ORC/Parquet/Text files) into AWS Redshift.
  • Configuring ECS using the Docker image for Python libraries needed for a pipeline.
  • Implementing RDD/Dataset/DataFrame transformations in Scala through SparkContext and HiveContext, and importing Python libraries into the transformation logic to implement core functionality.
  • Writing Spark SQL and embedding the SQL in Python files to run in the Airflow DAG as part of the pipeline (see the sketch after this list).
  • Using the NoSQL database Amazon DynamoDB to store data for the reporting application.
  • Writing AWS Lambda code in Python to convert and consolidate nested JSON files from the Kinesis stream.
  • Incorporating strong database development skills in stored procedures, query languages, and performance optimization in RDBMS (DB2), Cassandra, and Hadoop.
  • Migrating objects from Oracle, Hive, and Redshift to Snowflake.
  • Designing and constructing AWS data pipelines using various resources, such as AWS API Gateway to receive responses from AWS Lambda, retrieving data from S3 using a Lambda function & converting the response into JSON format, with Snowflake, DynamoDB, AWS Lambda functions, and AWS S3.
  • Constructing a state-of-the-art data lake on AWS using EMR, Spark, Kafka, and Python.
  • Partnering with DBT on the delivery of data definitions and aligning with the Instance Data conversion team.
  • Developing algorithms & scripts in Hadoop to import data from the source system and incorporating it into HDFS (Hadoop Distributed File System) for staging purposes.
  • Prepared data quality scripts using SQL and Hive to validate successful data loads and data quality.
  • Framed various data visualizations using Python and Tableau.
  • Developed Unix shell scripts to perform Hadoop ETL functions like Sqoop, created external/internal Hive tables, and initiated HQL scripts.
  • Developed scripts in Hive to perform transformations on the data and loaded them to target systems used by data analysts for reporting.
  • Scheduled jobs through Apache Oozie by creating workflow & properties files and submitting jobs.
  • Designed workflows with many sessions with decision, assignment task, event-wait, and event-raise tasks; used the Informatica scheduler to schedule jobs.
  • Worked with Technical Program Management and Software Quality Assurance teams to develop means to measure and monitor overall project health.
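
The sketch below is a minimal, illustrative example of the pattern noted above (Spark SQL embedded in a Python file and triggered from an Airflow DAG), not the production pipeline itself; the DAG name, S3 paths, and column names are hypothetical placeholders.

    # Illustrative sketch only: a PySpark transformation with embedded Spark SQL,
    # wired into an Airflow DAG via a PythonOperator. All names are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def run_campaign_transform(**_context):
        # Import inside the task so the DAG file parses without PySpark installed.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("campaign_etl").getOrCreate()

        # Hypothetical raw input: Parquet campaign data staged on S3.
        spark.read.parquet("s3://example-bucket/raw/campaigns/") \
             .createOrReplaceTempView("raw_campaigns")

        # Spark SQL embedded in the Python file, as described above.
        transformed = spark.sql("""
            SELECT campaign_id,
                   to_date(event_ts)  AS event_date,
                   SUM(impressions)   AS impressions,
                   SUM(clicks)        AS clicks
            FROM   raw_campaigns
            GROUP  BY campaign_id, to_date(event_ts)
        """)

        # Write curated output back to S3 for downstream Redshift loads.
        transformed.write.mode("overwrite").parquet("s3://example-bucket/curated/campaigns/")
        spark.stop()


    with DAG(
        dag_id="campaign_etl_pipeline",      # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="spark_sql_transform",
            python_callable=run_campaign_transform,
        )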

Confidential

DATA ENGINEER

Environment: Informatica, Hadoop, Shell Scripting, AWS, Python, Sqoop, Hive, Oracle, Confidential, Tableau, PL/SQL, UNIX.

Responsibilities:

  • Undertook extensive data analysis & collaborated with the client teams to develop data models. Worked as BI SME for converting business requirements into technical requirements and documents.
  • Developed HQL scripts in Hive & Spark SQL to perform transformations on relational data and used Sqoop to export data back to DBs.
  • Took responsibility for data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark (see the sketch after this list).
  • Developed Unix shell scripts to perform ELT operations on big data using functions like Sqoop, created external/internal Hive tables, and initiated HQL scripts.
  • Developed the ETL/SQL code to load data from raw stages into relational DBs and ingested data into the Hadoop environment using Sqoop.
  • Developed & executed a migration strategy to move the data warehouse from an Oracle platform to AWS Redshift.
  • Optimized Spark code in Python by reengineering the DAG logic to use minimal resources and provide high throughput.
  • Used Informatica file watch events to poll the FTP sites for the external mainframe files.
  • Developed PySpark scripts to transform unstructured and semi-structured streaming data.
  • Developed the data flow architecture & physical data model with the Data Warehouse Architect.
  • Wrote unit scripts to automate data loads and performed data transformation operations. Tuned the Spark code for performance.
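
Below is a minimal sketch of how such an AWS Glue (PySpark) aggregation job might look, assuming JSON input staged on S3; the bucket paths, column names, and aggregation logic are illustrative placeholders rather than the actual job.

    # Illustrative sketch only: an AWS Glue PySpark script that aggregates
    # Adobe clickstream-style data from S3 and writes Parquet back to S3.
    import sys

    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the raw Adobe export (assumed to be JSON files staged on S3).
    raw = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-bucket/raw/adobe/"]},
        format="json",
    )

    # Consolidate: daily page-level hit and unique-visitor counts.
    daily = (
        raw.toDF()
        .withColumn("visit_date", F.to_date("hit_timestamp"))
        .groupBy("visit_date", "page_name")
        .agg(
            F.count("*").alias("hits"),
            F.countDistinct("visitor_id").alias("unique_visitors"),
        )
    )

    # Write the aggregated output as Parquet for downstream consumers.
    glue_context.write_dynamic_frame.from_options(
        frame=DynamicFrame.fromDF(daily, glue_context, "daily_adobe_agg"),
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/adobe_daily/"},
        format="parquet",
    )
    job.commit()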

Confidential

Responsibilities:

  • Implemented a generic ETL framework with high availability for bringing related data for Oracle and Redshift from various sources using Spark.
  • Discussed business process and data requirements with users to develop a variety of conceptual, logical, and physical data models.
  • Attained expertise in Business Intelligence and Data Visualization tools: Tableau, Confidential.
  • Developed Spark and Python code for a regular expression (regex) project in the AWS environment with Linux/Windows for big data resources.
  • Extracted, transformed, and loaded data sources to generate JSON, CSV, and Parquet data files with Python & SQL queries.
  • Stored & retrieved data from data warehouses using Amazon Redshift.
  • Worked on Teradata indexes, Teradata SQL queries, and utilities such as MLoad, TPump, FastLoad, and FastExport.
  • Worked on Lambda functions that aggregate the data from incoming events and store the results in Amazon DynamoDB (see the sketch after this list).
  • Designed DAGs using Airflow.
  • Used data warehousing concepts like the Ralph Kimball and Bill Inmon methodologies, OLAP, OLTP, star schema, snowflake schema, fact tables, and dimension tables.
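
The sketch below illustrates the general pattern of a Python Lambda function that aggregates incoming Kinesis event data and writes the result to DynamoDB; the table name, record layout, and key names are assumptions, not the original implementation.

    # Illustrative sketch only: an AWS Lambda handler that aggregates metric
    # values from incoming Kinesis records and stores totals in DynamoDB.
    import base64
    import json
    from collections import defaultdict

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("event_aggregates")  # hypothetical table name


    def lambda_handler(event, context):
        totals = defaultdict(int)

        # Kinesis delivers records base64-encoded; decode and parse each JSON payload.
        for record in event.get("Records", []):
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            totals[payload["event_type"]] += int(payload.get("count", 1))

        # Persist one aggregate item per event type.
        for event_type, count in totals.items():
            table.put_item(Item={"event_type": event_type, "event_count": count})

        return {"aggregated_types": len(totals)}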

Confidential

BUSINESS INTELLIGENCE DEVELOPER

Environment: Informatica, Hadoop, Shell Scripting, AWS, Python, Sqoop, Hive, Oracle, Confidential, Tableau, PL/SQL, UNIX.

Responsibilities:

  • Created roles & implemented security at the department level to allow/restrict users to view data depending on their departments.
  • Prepared ETL packages to extract and move existing data to SQL Server from various environments for SSAS cubes. Formed dimensions in PerformancePoint Server.
  • Incorporated data from OLTP databases into OLAP through SSIS as ETL.
  • Deployed SSRS reports to Microsoft Office SharePoint Server.
  • Rebuilt tables & indexes during performance tuning exercises.
  • Monitored SQL error logs, database activities, scheduled tasks, user counts, and connections, and eliminated deadlocks.
  • Scheduled more than 60 jobs in Tidal for refreshing cubes & varied cyclical jobs to move critical data for business end-users.
  • Used MDX queries for retrieving data from cubes via SSIS.
  • Wrote & optimized SQL statements to assist business intelligence practices.

Confidential

BUSINESS INTELLIGENCE DEVELOPER

Environment: MS SQL Server 2016, Power BI, SQL Profiler, VS, TFS 201, MS Office 2010, SSIS/SSRS/SSAS.

Responsibilities:

  • Designed, developed, and implemented fact tables and dimension tables in SSIS through the star schema model.
  • Used SQL Profiler to create traces and troubleshoot issues involving SSIS or stored procedures.
  • Worked on the Azure Power BI service & the on-premise Power BI Report Server.
  • Managed Control Flow tasks like For Each Loop Container, For Loop Container, and Sequence Container, and Data Flow tasks like Merge, Multicast, Lookup, Derived Column, Script Component, etc.
  • Used DML (Data Manipulation Language) to insert/update data while satisfying ACID properties & referential integrity constraints.
  • Used various heavy business-logic stored procedures that helped requests from the UI get processed much quicker.
  • Developed several ad hoc reports, parameterized reports, and drill-down reports.
  • Created entity-relationship diagrams and, in turn, the database schema.
  • Got involved in defining elicitation and technical specification procedures.

Confidential

Data Analyst

Environment: MS SQL Server 2016, Power BI, SQL Profiler, Oracle 11g, VS, TFS 2015.

Responsibilities:

  • Created indexes, databases, unique/check constraints, views, clustered/non-clustered indexes, stored procedures, rules, and triggers.
  • Prepared functions & developed procedures to perform application functionality on the database side during the performance improvement process.
  • Prepared DTS and SSIS packages.
  • Created action filters & parameters for preparing worksheets & dashboards using Tableau.
  • Developed ETL packages for data conversion through various transformation tasks.
  • Created packages & scheduled them in SQL Agent jobs for getting data from OLTP.
  • Developed perspectives in SSAS & configured security levels for cubes.
  • Performed query optimization & performance tuning.

Confidential

Data Analyst

Environment: SQL Server 2008 R2/12, SSDT, SSIS, SSRS, TFS, Visual Studio 2012, MS Office, MS Excel, SQL Scheduler

Responsibilities:

  • Achieved performance tuning by identifying bottlenecks in slow-performing & resource-heavy queries.
  • Created SSIS packages & automated them through SQL Agent jobs for loading data from flat files, XML files, and CSV files into SQL tables.
  • Developed clustered/non-clustered indexes on objects based on the data elements most used by end-users.
  • Streamlined complex logic in SQL queries by forming views/functions and used them in SSIS to read data from the OLTP system & load it into the ODS.
  • Created SQL Server reports and generated queries for forming drill-down & drill-through reports through SSRS.
  • Worked as part of a team & helped end-users with various ad hoc queries for reporting purposes.
