
Data Engineer Resume


MN

SUMMARY

  • Nearly 5 years of professional experience in IT, including work in the Big Data/Hadoop ecosystem for data processing, Data Warehousing, and data pipeline design and implementation.
  • Expert knowledge of the SDLC (Software Development Life Cycle), with involvement in all project phases.
  • Expertise in using cloud-based managed services for data warehousing in Azure (Azure Data Lake Storage, Azure Data Factory).
  • Strong experience with big data processing using Hadoop ecosystem technologies: MapReduce, Apache Spark, Apache Hive, and Pig.
  • Good understanding of cloud configuration in Amazon Web Services (AWS).
  • Experience in Dimensional Modeling using Star and Snowflake schema methodologies for Data Warehouse and Integration projects.
  • Worked with Apache Airflow to schedule and run complex data pipelines, ensuring each task executes in the correct order.
  • Excellent proficiency in Agile/Scrum and waterfall methodologies.
  • Extensive experience in using ER modeling tools such as Erwin and ER/Studio.
  • Experience in integrating various data sources with multiple relational databases such as SQL Server, Teradata, and Oracle.
  • Proficient in data governance, with experience in Data Ingestion projects that ingest data into a Data Lake from multiple source systems using Talend Big Data.
  • Excellent knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch, data quality, metadata management, master data management.
  • Experience in creating ETL specification documents, flowcharts, process workflows, and data flow diagrams.
  • Experience in executing batch jobs and feeding data streams into Spark Streaming.
  • Good knowledge of streaming applications using Apache Kafka (a brief sketch follows this summary).
  • Hands on experience in working with Tableau Desktop, Tableau Server and Tableau Reader in various versions.
  • Extended Hive and Pig core functionality using custom UDFs.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Expertise in SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS) and SQL Server Integration Services.
  • In - depth knowledge of T-SQL, SSAS, SSRS, SSIS, OLAP, OLTP, BI suite, Reporting and Analytics.
  • Strong experience in using MS Excel and MS Access to load and analyze data based on business needs.
  • Good communication skills, work ethics and the ability to work in a team efficiently with good leadership skills.
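
Illustrative sketch of the Kafka-based streaming work noted above: a minimal PySpark Structured Streaming consumer that lands raw Kafka events on HDFS. The broker address, topic name, and paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("kafka-ingest-sketch")
             .getOrCreate())

    # Read the raw event stream from a (placeholder) Kafka topic
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "events_topic")
              .option("startingOffsets", "latest")
              .load()
              .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))

    # Persist events to HDFS as Parquet, with checkpointing for fault tolerance
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .start())

    query.awaitTermination()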

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential, MN

Responsibilities:

  • Developed complete end-to-end big data processing pipelines in the Hadoop ecosystem.
  • Provided application support during the build and test phases of the SDLC for their product.
  • Used Oozie for automating end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
  • Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, Databricks, SQL Database, and SQL Data Warehouse environment.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Oracle.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Created dimensional model for the reporting system by identifying required dimensions and facts using Erwin.
  • Developed and implemented a data pipeline using Kafka and Storm to store data into HDFS.
  • Created automated python scripts to convert the data from different sources and to generate the ETL pipelines.
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization, and assessed the current production state of the application.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL Data Warehouse) and processed the data with Azure Data Lake Analytics.
  • Developed Spark applications using PySpark and Spark-SQL to extract, transform, and aggregate data from multiple file formats for analysis (a short sketch follows this list).
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Worked with Snowflake SaaS for cost effective data warehouse implementation on cloud.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
  • Developed customer cleanse functions, cleanse lists and mappings for MDM Hub.
  • Worked extensively on Oracle PL/SQL, and SQL Performance Tuning.
  • Worked extensively on Python OpenStack APIs.
  • Involved in modeling (Star Schema methodologies) in building and designing the logical data model into Dimensional Models.
  • Created both clustered and non-clustered indexes in T-SQL to maximize query performance.
  • Created Hive External tables and loaded the data into tables and query data using HQL.
  • Generated multiple enterprise reports using SSRS and Crystal Reports, and worked on Tableau.
  • Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, and integrated them with other Azure services.
  • Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources, writing the output back to HDFS.
  • Used Sqoop to efficiently transfer data between databases and HDFS.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
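
A minimal sketch of the kind of PySpark extraction and aggregation described in the list above; the file paths and column names are hypothetical placeholders.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("multi-format-aggregation").getOrCreate()

    # Read two hypothetical source feeds in different file formats
    orders = spark.read.option("header", True).csv("hdfs:///landing/orders.csv")
    customers = spark.read.parquet("hdfs:///landing/customers.parquet")

    # Join and aggregate into a daily revenue summary
    daily_revenue = (orders.join(customers, "customer_id")
                     .groupBy("order_date", "region")
                     .agg(F.sum(F.col("amount").cast("double")).alias("total_amount"),
                          F.count("order_id").alias("order_count")))

    # Expose the result to Spark-SQL and persist it for downstream reporting
    daily_revenue.createOrReplaceTempView("daily_revenue")
    daily_revenue.write.mode("overwrite").parquet("hdfs:///curated/daily_revenue")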

Environment: Erwin, SQL, Oracle 12c, PL/SQL, Big Data, Hadoop, Azure Data Lake, Spark, Scala, APIs, Pig, Python, Kafka, HDFS, ETL, MDM, OLAP, OLTP, SSAS, T-SQL, Hive, SSRS, Tableau, MapReduce, Sqoop, HBase, SSIS.

Data Engineer

Confidential, Evansville, IN

Responsibilities:

  • Developed Spark Streaming application in the Agile methodology in multiple sprints.
  • Developed Consumer application to consume the data from Kafka to Hive External tables using Scala, Spark.
  • Implemented asynchronous commits in the consumer application to commit Kafka offsets.
  • Designed and developed horizontally scalable APIs using Spark.
  • Performed data validations using Scala/Spark scripts by capturing the raw data on HDFS and comparing it with data in the final tables (see the sketch after this list).
  • Developed Hive DDL updates for the external tables and carefully deployed them to production in a multi-phase production update for a large-scale streaming ingestion application.
  • Used HBase to capture streaming metrics, such as Kafka offsets, gathered via the Kafka listener API.
  • Worked with Data ingestion, querying, processing, and analysis of big data.
  • Tuned and optimized various complex SQL queries.
  • Developed normalized Logical and Physical database models to design OLTP system.
  • Extensively involved in creating PL/SQL objects i.e., Procedures, Functions, and Packages.
  • Performed bug verification, release testing and provided support for Oracle based applications.
  • Used the Model Mart of Erwin for effective model management: sharing, dividing, and reusing model information and designs for productivity improvement.
  • Extensively used Hive optimization techniques like partitioning, bucketing, Map Join and parallel execution.
  • Worked with Real-time Streaming using Kafka and HDFS.
  • Worked with Alteryx, a data analytics tool, to develop workflows for ETL jobs.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Wrote, tested, and implemented Teradata FastLoad, MultiLoad, DML, and DDL.
  • Used various OLAP operations like slice / dice, drill down and roll up as per business requirements.
  • Wrote SQL queries, stored procedures, views, triggers, T-SQL and DTS/SSIS.
  • Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
  • Designed SSRS reports with sub reports, dynamic sorting, defining data source and subtotals for the report.
  • Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
  • Worked with Sqoop commands to import the data from different databases.
  • Gathered SSRS report requirements and created the reports in Tableau.
  • Designed and developed MapReduce jobs to process data coming in different file formats such as XML.
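
Sketch of the raw-vs-final data validation referenced above. The original scripts were written in Scala/Spark; this PySpark version illustrates the same idea, with hypothetical paths, table names, and key columns.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("raw-vs-final-validation")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.json("hdfs:///raw/events/2020-01-01")   # raw landing data
    final = spark.table("analytics.events")                  # curated Hive table

    # 1. Row-count reconciliation between the raw feed and the final table
    raw_count, final_count = raw.count(), final.count()
    print(f"raw={raw_count}, final={final_count}, diff={raw_count - final_count}")

    # 2. Keys present in the raw feed but missing from the final table
    missing = raw.select("event_id").subtract(final.select("event_id"))
    missing.show(20, truncate=False)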

Environment: Erwin, SQL, PL/SQL, Kafka 1.1, AWS, APIs, Agile, ETL, HDFS, OLAP, T-SQL, SSIS, Teradata, Hive, SSRS, Sqoop, Tableau, MapReduce, XML.

Data Engineer

Confidential

Responsibilities:

  • Spearheaded analysis, design, and development of a common architecture and data repository for Apparel and Footwear sourcing data across geographies for data management and reporting needs.
  • Work cross functionally with analysts, business insights, report developers, and stakeholders to ensure a tight Development/QA/UAT process and quality deployments.
  • Evaluate, cleanse, and extract/transform data for analytic purposes within the context of a Big Data environment, and structure large data sets by applying standard data modelling methods.
  • Design and develop efficient PySpark programs using cloud-based data platforms (EMR) to extract/transform/load data in between various data warehouse applications.
  • Worked on cloud platforms such as AWS, with good knowledge of the different instance types for optimal cluster usage based on requirements.
  • Engineered a solution to optimize the Alteryx-to-Snowflake ETL ingestion process.
  • Designed Hive tables over the parquet and csv files for loading and analyzing data.
  • Involved in developing DAGs using the Airflow orchestration tool and monitored the weekly processes (a minimal DAG sketch follows this list).
  • Identify gaps in data processes and drive improvements via continuous improvement loop by ensuring good data flow between databases and backend systems.
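
A minimal Airflow DAG sketch for the weekly orchestration mentioned above. The DAG id, tasks, and script paths are hypothetical placeholders; the syntax follows Airflow 2.x.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="weekly_snowflake_ingest",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@weekly",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract_from_source",
            bash_command="python /opt/jobs/extract.py",         # placeholder script
        )
        load = BashOperator(
            task_id="load_to_snowflake",
            bash_command="python /opt/jobs/load_snowflake.py",  # placeholder script
        )
        extract >> load  # enforce task order: extract, then load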

Environment: Python, Alteryx, AWS S3, AWS EMR, Spark, Hive, Airflow, Sqoop, Teradata SQL Assistant, Snowflake, MS SQL Server 2016, Tableau 2018.3, R, SharePoint, SSIS

Data Analyst/Data Modeler

Confidential

Responsibilities:

  • As a Data Analyst/Data Modeler, I was responsible for all data-related aspects of the project.
  • Created reports in a cloud-based environment using Amazon Redshift and published them on Tableau.
  • Developed Python APIs to dump the array structures in the processor at the failure point for debugging (a sketch follows this list).
  • Worked extensively with ER/Studio on several projects covering both OLAP and OLTP applications.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle.
  • Developed the required data warehouse model using a Star schema for the generalized model.
  • Implemented Visualized BI Reports with Tableau.
  • Worked on stored procedures for processing business logic in the database.
  • Extensively worked on Viewpoint for Teradata for performance monitoring and tuning.
  • Performed Extract, Transform and Load (ETL) solutions to move legacy and ERP data into Oracle data warehouse.
  • Developed and maintained data dictionary to create metadata reports for technical and business purpose.
  • Managed database design and implemented a comprehensive Snowflake-Schema with shared dimensions.
  • Worked with normalization and denormalization concepts and design methodologies.
  • Worked on the reporting requirements for the data warehouse.
  • Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
  • Developed complex T-SQL code such as stored procedures, functions, triggers, indexes, and views for the business application.
  • Wrote a complex SQL, PL/SQL, Procedures, Functions, and Packages to validate data and testing process.
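
Illustrative sketch of the debugging-dump helper described above ("dump the array structures at the failure point"); the function name, dump location, and example arrays are hypothetical.

    import json
    import os
    import traceback
    from datetime import datetime

    def dump_on_failure(label, arrays, dump_dir="/tmp/debug_dumps"):
        """Write the given array structures plus the active traceback to a JSON file."""
        os.makedirs(dump_dir, exist_ok=True)
        path = f"{dump_dir}/{label}_{datetime.now():%Y%m%d_%H%M%S}.json"
        with open(path, "w") as fh:
            json.dump({"label": label,
                       "traceback": traceback.format_exc(),
                       "arrays": {k: list(v) for k, v in arrays.items()}},
                      fh, indent=2)
        return path

    # Usage: call from an except block so the failing state is captured
    values = [1, 2, "bad"]
    try:
        total = sum(values)  # raises TypeError on the string element
    except Exception:
        dump_on_failure("sum_failure", {"values": values})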

Environment: ER/Studio, SQL, Python, APIs, OLAP, OLTP, PL/SQL, Oracle, Teradata, BI, Tableau, ETL, SSIS, SSAS, SSRS, T-SQL, Redshift.

Data Analyst | Engineer

Confidential

Responsibilities:

  • Used SQL and Spark to perform ETL processes on scalable databases.
  • Worked with clients to gather business requirements and used Tableau to translate the requirements into actionable reports, saving 20 hours of manual work each sprint.
  • Reviewed existing PL/SQL scripts to improve the performance.
  • Developed re-usable Python scripts to perform data wrangling across various data ingestion processes, enabling fast debugging (a minimal sketch follows this list).
  • Extensively wrote SQL scripts to provide data to various analytics systems.
  • Developed an application to consume text files from a blob store, then transform and format the data to generate meaningful reports in PDF format.
  • Added exception handling logic to existing processes to enable fast and efficient debugging in case of any production issues.
  • Collaborated in the database design phase of various projects.
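
A minimal sketch of the kind of reusable data-wrangling helper mentioned above; the column names, cleaning rules, and input file are hypothetical.

    import pandas as pd

    def wrangle(df: pd.DataFrame, date_cols=(), dedupe_on=None) -> pd.DataFrame:
        """Standard cleanup applied to every ingestion feed."""
        out = df.copy()
        # Normalize column names to snake_case
        out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
        # Parse date columns, coercing bad values to NaT for later inspection
        for col in date_cols:
            out[col] = pd.to_datetime(out[col], errors="coerce")
        # Optionally drop duplicate records on a business key
        if dedupe_on:
            out = out.drop_duplicates(subset=dedupe_on)
        return out.dropna(how="all")

    # Example usage with a hypothetical feed
    raw = pd.read_csv("customer_feed.csv")
    clean = wrangle(raw, date_cols=["signup_date"], dedupe_on=["customer_id"])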
