Data Engineer Resume
MN
SUMMARY
- 5+ years of professional experience in IT, including work in the Big Data/Hadoop ecosystem for data processing, Data Warehousing, and Data Pipeline design and implementation.
- Expert knowledge of the SDLC (Software Development Life Cycle), with involvement in all project phases.
- Expertise in using cloud-based managed services for data warehousing in Azure (Azure Data Lake Storage, Azure Data Factory).
- Strong experience with big data processing using Hadoop-ecosystem technologies: MapReduce, Apache Spark, Apache Hive, and Pig.
- Good understanding of cloud configuration in Amazon Web Services (AWS).
- Experience in Dimensional Modeling using Star and Snowflake schema methodologies for Data Warehouse and Integration projects.
- Worked with Apache Airflow to schedule and run complex data pipelines, ensuring each task executes in the correct order.
- Excellent proficiency in Agile/Scrum and waterfall methodologies.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio.
- Experience in integrating various data sources with multiple relational databases such as SQL Server, Teradata, and Oracle.
- Proficient in data governance; experienced in Data Ingestion projects that load data into a Data Lake from multiple source systems using Talend Big Data.
- Excellent knowledge of Data Analysis, Data Validation, Data Cleansing, Data Verification, identifying data mismatches, data quality, metadata management, and master data management.
- Experience in creating ETL specification documents, flowcharts, process workflows, and data flow diagrams.
- Experience in executing batch jobs and delivering data streams to Spark Streaming.
- Good knowledge of streaming applications using Apache Kafka.
- Hands-on experience working with Tableau Desktop, Tableau Server, and Tableau Reader across various versions.
- Extended Hive and Pig core functionality using custom UDFs.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Expertise in SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS) and SQL Server Integration Services.
- In-depth knowledge of T-SQL, SSAS, SSRS, SSIS, OLAP, OLTP, BI suite, Reporting and Analytics.
- Strong experience using MS Excel and MS Access to load and analyze data based on business needs.
- Good communication skills, a strong work ethic, and the ability to work efficiently in a team, with good leadership skills.
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential, MN
Responsibilities:
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Provided application support during the build and test phases of the SDLC for their product.
- Used Oozie for automating the end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
- Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, Databricks, SQL Database, and SQL Data Warehouse environment.
- Performed data profiling and transformation on the raw data using Pig, Python, and Oracle.
- Developed predictive analytics using Apache Spark Scala APIs.
- Created dimensional model for the reporting system by identifying required dimensions and facts using Erwin.
- Developed and implemented a data pipeline using Kafka and Storm to store data in HDFS.
- Created automated Python scripts to convert data from different sources and to generate the ETL pipelines.
- Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization, and assessed the current production state of the application.
- Performed data ingestion into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure, including Data Lake Analytics.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats in order to analyze and transform the data (a minimal PySpark sketch of this pattern follows this role's Environment line).
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
- Worked with Snowflake SaaS for cost effective data warehouse implementation on cloud.
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
- Developed customer cleanse functions, cleanse lists and mappings for MDM Hub.
- Worked extensively on Oracle PL/SQL, and SQL Performance Tuning.
- Worked on Python OpenStack APIs.
- Involved in modeling (Star Schema methodology), building and designing the logical data model into dimensional models.
- Created both clustered and non-clustered indexes to maximize query performance in T-SQL.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Generated multiple enterprise reports using SSRS and Crystal Reports, and worked on Tableau.
- Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services.
- Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources; the output was written back to HDFS.
- Used Sqoop to efficiently transfer data between databases and HDFS.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
Environment: Erwin, SQL, Oracle 12c, PL/SQL, Big Data, Hadoop, Azure Data Lake, Spark, Scala, APIs, Pig, Python, Kafka, HDFS, ETL, MDM, OLAP, OLTP, SSAS, T-SQL, Hive, SSRS, Tableau, MapReduce, Sqoop, HBase, SSIS.
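The following is a minimal, illustrative PySpark sketch of the multi-format extract/transform/aggregate pattern referenced in the PySpark bullet above; the paths, column names, and aggregation are hypothetical, not the actual production logic.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical paths and column names, for illustration only.
spark = SparkSession.builder.appName("multi-format-etl-sketch").getOrCreate()

# Extract: read the same logical feed from different file formats.
orders_csv = spark.read.option("header", "true").csv("/data/raw/orders_csv/")
orders_json = spark.read.json("/data/raw/orders_json/")
orders_parquet = spark.read.parquet("/data/raw/orders_parquet/")

# Transform: align the schemas and union the sources.
cols = ["order_id", "customer_id", "amount", "order_date"]
orders = (
    orders_csv.select(cols)
    .unionByName(orders_json.select(cols))
    .unionByName(orders_parquet.select(cols))
    .withColumn("amount", F.col("amount").cast("double"))
)

# Aggregate with Spark SQL and write the curated result back out.
orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")
daily_totals.write.mode("overwrite").parquet("/data/curated/daily_totals/")
```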
Data Engineer
Confidential, Evansville, IN
Responsibilities:
- Developed Spark Streaming application in the Agile methodology in multiple sprints.
- Developed a consumer application to consume data from Kafka into Hive external tables using Scala and Spark.
- Implemented asynchronous commits in the consumer application to commit Kafka offsets (a minimal Python sketch of this pattern follows this role's Environment line).
- Designed and developed horizontally scalable APIs using Spark.
- Performed data validations using Scala Spark scripts by capturing the raw data on HDFS and comparing it with the data in final tables.
- Developed Hive DDL updates for the external tables and carefully deployed them to production in a multi-phase production update for a large-scale streaming ingestion application.
- Used HBase to capture streaming metrics, such as Kafka offsets, via the Kafka listener API.
- Worked with Data ingestion, querying, processing, and analysis of big data.
- Tuned and optimized various complex SQL queries.
- Developed normalized logical and physical database models to design the OLTP system.
- Extensively involved in creating PL/SQL objects i.e., Procedures, Functions, and Packages.
- Performed bug verification, release testing and provided support for Oracle based applications.
- Used Erwin's Model Mart for effective model management: sharing, dividing, and reusing model information and designs to improve productivity.
- Extensively used Hive optimization techniques like partitioning, bucketing, Map Join and parallel execution.
- Worked with Real-time Streaming using Kafka and HDFS.
- Worked with Alteryx, a data analytics tool, to develop workflows for the ETL jobs.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Wrote, tested, and implemented Teradata FastLoad, MultiLoad, DML, and DDL.
- Used various OLAP operations like slice / dice, drill down and roll up as per business requirements.
- Wrote SQL queries, stored procedures, views, triggers, T-SQL and DTS/SSIS.
- Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
- Designed SSRS reports with sub-reports, dynamic sorting, defined data sources, and subtotals for the report.
- Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
- Worked with Sqoop commands to import the data from different databases.
- Gathered SSRS report requirements and created the reports in Tableau.
- Designed and developed MapReduce jobs to process data arriving in different file formats, such as XML.
Environment: Erwin, SQL, PL/SQL, Kafka 1.1, AWS, APIs, Agile, ETL, HDFS, OLAP, T-SQL, SSIS, Teradata, Hive, SSRS, Sqoop, Tableau, MapReduce, XML.
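As a minimal illustration of the asynchronous offset-commit pattern mentioned in the Kafka bullet above, the sketch below uses the kafka-python client rather than the Scala/Spark code actually used on the project; the topic, consumer group, and broker address are assumptions.

```python
from kafka import KafkaConsumer  # kafka-python client, used here only for illustration

# Hypothetical topic, consumer group, and broker.
consumer = KafkaConsumer(
    "ingest-topic",
    bootstrap_servers=["localhost:9092"],
    group_id="hive-loader",
    enable_auto_commit=False,      # offsets are committed manually
    auto_offset_reset="earliest",
)

def process(record):
    # Placeholder for validating the record and writing it to the
    # Hive external table location.
    print(record.topic, record.partition, record.offset, record.value)

try:
    for record in consumer:
        process(record)
        # Asynchronous commit: does not block the poll loop; a callback
        # can be supplied to log commit failures.
        consumer.commit_async()
finally:
    # Synchronous commit on shutdown so processed offsets are not lost.
    consumer.commit()
    consumer.close()
```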
Data Engineer
Confidential
Responsibilities:
- Spearheaded analysis, design, and development of a common architecture and data repository for Apparel and Footwear sourcing data across geographies for data management and reporting needs.
- Worked cross-functionally with analysts, business insights teams, report developers, and stakeholders to ensure a tight Development/QA/UAT process and quality deployments.
- Evaluated, cleansed, and extracted/transformed data for analytic purposes within a big data environment, and structured large data sets by applying standard data modeling methods.
- Designed and developed efficient PySpark programs on cloud-based data platforms (EMR) to extract, transform, and load data between various data warehouse applications.
- Worked on cloud platforms such as AWS, with good knowledge of the different instance types for optimal cluster usage based on requirements.
- Engineered a solution to optimize the Alteryx-to-Snowflake ETL ingestion process.
- Designed Hive tables over Parquet and CSV files for loading and analyzing data.
- Involved in developing DAGs using the Airflow orchestration tool and monitored the weekly processes (a minimal DAG sketch follows this role's Environment line).
- Identified gaps in data processes and drove improvements via a continuous improvement loop, ensuring good data flow between databases and backend systems.
Environment: Python, Alteryx, AWS S3, AWS EMR, Spark, Hive, Airflow, Sqoop, Teradata SQL Assistant, Snowflake, MS SQL Server 2016, Tableau 2018.3, R, SharePoint, SSIS
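Below is a minimal sketch of an Airflow DAG of the kind described in the DAG bullet above, scheduled weekly; the DAG id, task names, and bash commands are placeholders, and the import path assumes Airflow 2.x.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

# Hypothetical weekly ingestion DAG; the task commands are placeholders.
with DAG(
    dag_id="weekly_ingestion_sketch",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
    default_args=default_args,
) as dag:

    extract = BashOperator(
        task_id="extract_from_source",
        bash_command="echo 'run extraction job'",
    )

    transform = BashOperator(
        task_id="transform_with_spark",
        bash_command="echo 'spark-submit transform_job.py'",
    )

    load = BashOperator(
        task_id="load_to_snowflake",
        bash_command="echo 'run load step'",
    )

    # Dependencies ensure tasks run in order: extract -> transform -> load.
    extract >> transform >> load
```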
Data Analyst/Data Modeler
Confidential
Responsibilities:
- As a Data Analyst/Data Modeler, I was responsible for all data-related aspects of the project.
- Created reports in a cloud-based environment using Amazon Redshift and published them on Tableau.
- Developed Python APIs to dump the array structures in the processor at the failure point for debugging (a minimal sketch of this pattern follows this role's Environment line).
- Worked extensively on ER/ Studio in several projects in both OLAP and OLTP applications.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
- Performed Data Analysis and data profiling using complex SQL on various source systems, including Oracle.
- Developed the required data warehouse model using a Star schema for the generalized model.
- Implemented visual BI reports with Tableau.
- Worked on stored procedures for processing business logic in the database.
- Worked extensively with Viewpoint for Teradata for performance monitoring and performance tuning.
- Performed Extract, Transform, and Load (ETL) to move legacy and ERP data into the Oracle data warehouse.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Managed database design and implemented a comprehensive Snowflake-Schema with shared dimensions.
- Worked with normalization and denormalization concepts and design methodologies.
- Worked on the reporting requirements for the data warehouse.
- Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
- Developed complex T-SQL code such as stored procedures, functions, triggers, indexes, and views for the business application.
- Wrote complex SQL, PL/SQL, procedures, functions, and packages to validate data and support the testing process.
Environment: ER/Studio, SQL, Python, APIs, OLAP, OLTP, PL/SQL, Oracle, Teradata, BI, Tableau, ETL, SSIS, SSAS, SSRS, T-SQL, Redshift.
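The sketch below illustrates the "dump the array structures at the failure point for debugging" idea from the Python bullet above; the function names, dump directory, and record shape are hypothetical.

```python
import json
import logging
import os
import traceback
from datetime import datetime

logger = logging.getLogger(__name__)

def dump_on_failure(state, dump_dir="/tmp/debug_dumps"):
    """Write the in-memory structures to a timestamped JSON file for debugging."""
    os.makedirs(dump_dir, exist_ok=True)
    path = f"{dump_dir}/failure_{datetime.now():%Y%m%d_%H%M%S}.json"
    with open(path, "w") as fh:
        json.dump(state, fh, indent=2, default=str)
    return path

def process_batch(records):
    # Hypothetical processing step that may raise on malformed data.
    return [r["value"] * 2 for r in records]

def run(records):
    try:
        return process_batch(records)
    except Exception:
        # Capture the state at the failure point, then re-raise.
        dump_path = dump_on_failure({"records": records})
        logger.error("Batch failed; state dumped to %s\n%s",
                     dump_path, traceback.format_exc())
        raise
```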
Data Analyst | Engineer
Confidential
Responsibilities:
- Used SQL and Spark to perform ETL processes on scalable databases.
- Worked with clients to gather business requirements and used Tableau to translate the requirements into actionable reports, saving 20 hours of manual work each sprint.
- Reviewed existing PL/SQL scripts to improve the performance.
- Developed reusable Python scripts to perform data wrangling on various data ingestion processes, thereby enabling fast debugging (a minimal pandas sketch follows at the end of this section).
- Extensively wrote SQL scripts to provide data to various analytics systems.
- Developed an application to consume text files from a blob store, then transform and format the data to generate meaningful reports in PDF format.
- Added exception handling logic to existing processes to enable fast and efficient debugging in case of any production issues.
- Collaborated in the database design phase of various projects.
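A minimal sketch of the kind of reusable data-wrangling helper described above, using pandas; the delimiter, column names, and cleaning rules are illustrative assumptions.

```python
import pandas as pd

def wrangle_feed(path, date_cols=("ingest_date",), required_cols=("id",)):
    """Load a delimited text feed and apply common cleaning steps."""
    df = pd.read_csv(path, sep="|", dtype=str)

    # Normalize column names and trim whitespace in string fields.
    df.columns = [c.strip().lower() for c in df.columns]
    df = df.apply(lambda col: col.str.strip() if col.dtype == "object" else col)

    # Drop rows missing required keys (if present) and remove exact duplicates.
    keys = [c for c in required_cols if c in df.columns]
    df = df.dropna(subset=keys).drop_duplicates()

    # Parse date columns, coercing bad values to NaT for later inspection.
    for col in date_cols:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], errors="coerce")

    return df

# Example usage with a hypothetical pipe-delimited extract:
# cleaned = wrangle_feed("customer_feed.txt")
# cleaned.to_parquet("customer_feed_clean.parquet")
```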