Data Engineer Resume
MN
SUMMARY
- 5+ years of professional experience in IT, including work in the Big Data/Hadoop ecosystem for data processing, Data Warehousing, and Data Pipeline design and implementation.
- Expert knowledge of the SDLC (Software Development Life Cycle), with involvement in all project phases.
- Expertise in using cloud-based managed services for data warehousing in Azure (Azure Data Lake Storage, Azure Data Factory).
- Strong experience with big data processing using Hadoop-ecosystem technologies: MapReduce, Apache Spark, Apache Hive, and Pig.
- Good understanding of cloud configuration in Amazon Web Services (AWS).
- Experience in Dimensional Modeling using Star and Snowflake schema methodologies for Data Warehouse and Integration projects.
- Worked with Apache Airflow to schedule and run complex data pipelines, ensuring each task executes in the correct order.
- Excellent proficiency in Agile/Scrum and waterfall methodologies.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio.
- Experience in integrating various data sources with multiple relational databases such as SQL Server, Teradata, and Oracle.
- Proficient in data governance; experienced in Data Ingestion projects that load data into a Data Lake from multiple source systems using Talend Big Data.
- Excellent knowledge of Data Analysis, Data Validation, Data Cleansing, Data Verification, identifying data mismatches, data quality, metadata management, and master data management.
- Experience in creating ETL specification documents, flowcharts, process workflows, and data flow diagrams.
- Experience in executing batch jobs and delivering data streams to Spark Streaming.
- Good knowledge of streaming applications using Apache Kafka.
- Hands-on experience working with Tableau Desktop, Tableau Server, and Tableau Reader across various versions.
- Extended Hive and Pig core functionality using custom UDFs.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Expertise in SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS) and SQL Server Integration Services.
- In-depth knowledge of T-SQL, SSAS, SSRS, SSIS, OLAP, OLTP, BI suite, Reporting and Analytics.
- Strong experience using MS Excel and MS Access to load and analyze data based on business needs.
- Good communication skills, a strong work ethic, and the ability to work efficiently in a team, with good leadership skills.
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential, MN
Responsibilities:
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Provided application support during the build and test phases of the SDLC for their product.
- Used Oozie for automating the end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
- Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, Databricks, SQL Database, and SQL Data Warehouse environment.
- Performed data profiling and transformation on the raw data using Pig, Python, and Oracle.
- Developed predictive analytics using Apache Spark Scala APIs.
- Created dimensional model for the reporting system by identifying required dimensions and facts using Erwin.
- Developed and implemented a data pipeline using Kafka and Storm to store data in HDFS.
- Created automated Python scripts to convert data from different sources and to generate the ETL pipelines.
- Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization, and assessed the current production state of the application.
- Performed data ingestion into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure, including Data Lake Analytics.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats in order to analyze and transform the data (a minimal PySpark sketch of this pattern follows this role's Environment line).
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
- Worked with Snowflake SaaS for cost effective data warehouse implementation on cloud.
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
- Developed customer cleanse functions, cleanse lists and mappings for MDM Hub.
- Worked extensively on Oracle PL/SQL, and SQL Performance Tuning.
- Worked on Python OpenStack APIs.
- Involved in modeling (Star Schema methodology), building and designing the logical data model into dimensional models.
- Created both clustered and non-clustered indexes to maximize query performance in T-SQL.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Generated multiple enterprise reports using SSRS and Crystal Reports, and worked on Tableau.
- Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services.
- Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources; the output was written back to HDFS.
- Used Sqoop to efficiently transfer data between databases and HDFS.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
Environment: Erwin, SQL, Oracle 12c, PL/SQL, Big Data, Hadoop, Azure Data Lake, Spark, Scala, APIs, Pig, Python, Kafka, HDFS, ETL, MDM, OLAP, OLTP, SSAS, T-SQL, Hive, SSRS, Tableau, MapReduce, Sqoop, HBase, SSIS.
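The following is a minimal, illustrative PySpark sketch of the multi-format extract/transform/aggregate pattern referenced in the PySpark bullet above; the paths, column names, and aggregation are hypothetical, not the actual production logic.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical paths and column names, for illustration only.
spark = SparkSession.builder.appName("multi-format-etl-sketch").getOrCreate()

# Extract: read the same logical feed from different file formats.
orders_csv = spark.read.option("header", "true").csv("/data/raw/orders_csv/")
orders_json = spark.read.json("/data/raw/orders_json/")
orders_parquet = spark.read.parquet("/data/raw/orders_parquet/")

# Transform: align the schemas and union the sources.
cols = ["order_id", "customer_id", "amount", "order_date"]
orders = (
    orders_csv.select(cols)
    .unionByName(orders_json.select(cols))
    .unionByName(orders_parquet.select(cols))
    .withColumn("amount", F.col("amount").cast("double"))
)

# Aggregate with Spark SQL and write the curated result back out.
orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")
daily_totals.write.mode("overwrite").parquet("/data/curated/daily_totals/")
```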
Data Engineer
Confidential, Evansville, IN
Responsibilities:
- Developed Spark Streaming application in the Agile methodology in multiple sprints.
- Developed a consumer application to consume data from Kafka into Hive external tables using Scala and Spark.
- Implemented asynchronous commits in the consumer application to commit Kafka offsets (a minimal Python sketch of this pattern follows this role's Environment line).
- Designed and developed horizontally scalable APIs using Spark.
- Performed data validations using Scala Spark scripts by capturing the raw data on HDFS and comparing it with the data in final tables.
- Developed Hive DDL updates for the external tables and carefully deployed them to production in a multi-phase production update for a large-scale streaming ingestion application.
- Used HBase to capture streaming metrics, such as Kafka offsets, via the Kafka listener API.
- Worked with Data ingestion, querying, processing, and analysis of big data.
- Tuned and optimized various complex SQL queries.
- Developed normalized logical and physical database models to design the OLTP system.
- Extensively involved in creating PL/SQL objects i.e., Procedures, Functions, and Packages.
- Performed bug verification, release testing and provided support for Oracle based applications.
- Used Erwin's Model Mart for effective model management: sharing, dividing, and reusing model information and designs to improve productivity.
- Extensively used Hive optimization techniques like partitioning, bucketing, Map Join and parallel execution.
- Worked with Real-time Streaming using Kafka and HDFS.
- Worked with Alteryx, a data analytics tool, to develop workflows for the ETL jobs.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Wrote, tested, and implemented Teradata FastLoad, MultiLoad, DML, and DDL.
- Used various OLAP operations like slice / dice, drill down and roll up as per business requirements.
- Wrote SQL queries, stored procedures, views, triggers, T-SQL and DTS/SSIS.
- Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
- Designed SSRS reports with sub-reports, dynamic sorting, defined data sources, and subtotals for the report.
- Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
- Worked with Sqoop commands to import the data from different databases.
- Gathered SSRS report requirements and created the reports in Tableau.
- Designed and developed MapReduce jobs to process data arriving in different file formats, such as XML.
Environment: Erwin, SQL, PL/SQL, Kafka 1.1, AWS, APIs, Agile, ETL, HDFS, OLAP, T-SQL, SSIS, Teradata, Hive, SSRS, Sqoop, Tableau, MapReduce, XML.
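As a minimal illustration of the asynchronous offset-commit pattern mentioned in the Kafka bullet above, the sketch below uses the kafka-python client rather than the Scala/Spark code actually used on the project; the topic, consumer group, and broker address are assumptions.

```python
from kafka import KafkaConsumer  # kafka-python client, used here only for illustration

# Hypothetical topic, consumer group, and broker.
consumer = KafkaConsumer(
    "ingest-topic",
    bootstrap_servers=["localhost:9092"],
    group_id="hive-loader",
    enable_auto_commit=False,      # offsets are committed manually
    auto_offset_reset="earliest",
)

def process(record):
    # Placeholder for validating the record and writing it to the
    # Hive external table location.
    print(record.topic, record.partition, record.offset, record.value)

try:
    for record in consumer:
        process(record)
        # Asynchronous commit: does not block the poll loop; a callback
        # can be supplied to log commit failures.
        consumer.commit_async()
finally:
    # Synchronous commit on shutdown so processed offsets are not lost.
    consumer.commit()
    consumer.close()
```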
Data Engineer
Confidential
Responsibilities:
- Spearheaded analysis, design, and development of a common architecture and data repository for Apparel and Footwear sourcing data across geographies for data management and reporting needs.
- Worked cross-functionally with analysts, business insights teams, report developers, and stakeholders to ensure a tight Development/QA/UAT process and quality deployments.
- Evaluated, cleansed, and extracted/transformed data for analytic purposes within a big data environment, and structured large data sets by applying standard data modeling methods.
- Designed and developed efficient PySpark programs on cloud-based data platforms (EMR) to extract, transform, and load data between various data warehouse applications.
- Worked on cloud platforms such as AWS, with good knowledge of the different instance types for optimal cluster usage based on requirements.
- Engineered a solution to optimize the Alteryx-to-Snowflake ETL ingestion process.
- Designed Hive tables over Parquet and CSV files for loading and analyzing data.
- Involved in developing DAGs using the Airflow orchestration tool and monitored the weekly processes (a minimal DAG sketch follows this role's Environment line).
- Identified gaps in data processes and drove improvements via a continuous improvement loop, ensuring good data flow between databases and backend systems.
Environment: Python, Alteryx, AWS S3, AWS EMR, Spark, Hive, Airflow, Sqoop, Teradata SQL Assistant, Snowflake, MS SQL Server 2016, Tableau 2018.3, R, SharePoint, SSIS
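Below is a minimal sketch of an Airflow DAG of the kind described in the DAG bullet above, scheduled weekly; the DAG id, task names, and bash commands are placeholders, and the import path assumes Airflow 2.x.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

# Hypothetical weekly ingestion DAG; the task commands are placeholders.
with DAG(
    dag_id="weekly_ingestion_sketch",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
    default_args=default_args,
) as dag:

    extract = BashOperator(
        task_id="extract_from_source",
        bash_command="echo 'run extraction job'",
    )

    transform = BashOperator(
        task_id="transform_with_spark",
        bash_command="echo 'spark-submit transform_job.py'",
    )

    load = BashOperator(
        task_id="load_to_snowflake",
        bash_command="echo 'run load step'",
    )

    # Dependencies ensure tasks run in order: extract -> transform -> load.
    extract >> transform >> load
```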
Data Analyst/Data Modeler
Confidential
Responsibilities:
- As a Data Analyst/Data Modeler, I was responsible for all data-related aspects of the project.
- Created reports in a cloud-based environment using Amazon Redshift and published them on Tableau.
- Developed Python APIs to dump the array structures in the processor at the failure point for debugging (a minimal sketch of this pattern follows this role's Environment line).
- Worked extensively on ER/ Studio in several projects in both OLAP and OLTP applications.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
- Performed Data Analysis and data profiling using complex SQL on various source systems, including Oracle.
- Developed the required data warehouse model using a Star schema for the generalized model.
- Implemented visual BI reports with Tableau.
- Worked on stored procedures for processing business logic in the database.
- Worked extensively with Viewpoint for Teradata for performance monitoring and performance tuning.
- Performed Extract, Transform, and Load (ETL) to move legacy and ERP data into the Oracle data warehouse.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Managed database design and implemented a comprehensive Snowflake-Schema with shared dimensions.
- Worked with normalization and denormalization concepts and design methodologies.
- Worked on the reporting requirements for the data warehouse.
- Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
- Developed complex T-SQL code such as stored procedures, functions, triggers, indexes, and views for the business application.
- Wrote complex SQL, PL/SQL, procedures, functions, and packages to validate data and support the testing process.
Environment: ER/Studio, SQL, Python, APIs, OLAP, OLTP, PL/SQL, Oracle, Teradata, BI, Tableau, ETL, SSIS, SSAS, SSRS, T-SQL, Redshift.
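The sketch below illustrates the "dump the array structures at the failure point for debugging" idea from the Python bullet above; the function names, dump directory, and record shape are hypothetical.

```python
import json
import logging
import os
import traceback
from datetime import datetime

logger = logging.getLogger(__name__)

def dump_on_failure(state, dump_dir="/tmp/debug_dumps"):
    """Write the in-memory structures to a timestamped JSON file for debugging."""
    os.makedirs(dump_dir, exist_ok=True)
    path = f"{dump_dir}/failure_{datetime.now():%Y%m%d_%H%M%S}.json"
    with open(path, "w") as fh:
        json.dump(state, fh, indent=2, default=str)
    return path

def process_batch(records):
    # Hypothetical processing step that may raise on malformed data.
    return [r["value"] * 2 for r in records]

def run(records):
    try:
        return process_batch(records)
    except Exception:
        # Capture the state at the failure point, then re-raise.
        dump_path = dump_on_failure({"records": records})
        logger.error("Batch failed; state dumped to %s\n%s",
                     dump_path, traceback.format_exc())
        raise
```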
Data Analyst | Engineer
Confidential
Responsibilities:
- Used SQL and Spark to perform ETL processes on scalable databases.
- Worked with clients to gather business requirements and used Tableau to translate the requirements into actionable reports, saving 20 hours of manual work each sprint.
- Reviewed existing PL/SQL scripts to improve the performance.
- Developed reusable Python scripts to perform data wrangling on various data ingestion processes, thereby enabling fast debugging (a minimal pandas sketch follows at the end of this section).
- Extensively wrote SQL scripts to provide data to various analytics systems.
- Developed an application to consume text files from a blob store, then transform and format the data to generate meaningful reports in PDF format.
- Added exception handling logic to existing processes to enable fast and efficient debugging in case of any production issues.
- Collaborated in the database design phase of various projects.
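A minimal sketch of the kind of reusable data-wrangling helper described above, using pandas; the delimiter, column names, and cleaning rules are illustrative assumptions.

```python
import pandas as pd

def wrangle_feed(path, date_cols=("ingest_date",), required_cols=("id",)):
    """Load a delimited text feed and apply common cleaning steps."""
    df = pd.read_csv(path, sep="|", dtype=str)

    # Normalize column names and trim whitespace in string fields.
    df.columns = [c.strip().lower() for c in df.columns]
    df = df.apply(lambda col: col.str.strip() if col.dtype == "object" else col)

    # Drop rows missing required keys (if present) and remove exact duplicates.
    keys = [c for c in required_cols if c in df.columns]
    df = df.dropna(subset=keys).drop_duplicates()

    # Parse date columns, coercing bad values to NaT for later inspection.
    for col in date_cols:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], errors="coerce")

    return df

# Example usage with a hypothetical pipe-delimited extract:
# cleaned = wrangle_feed("customer_feed.txt")
# cleaned.to_parquet("customer_feed_clean.parquet")
```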