
Data Engineer Resume

Eden Prairie, MN

SUMMARY:

  • Over 7 years of professional IT experience, including work in Big Data and the Hadoop ecosystem for data processing, data warehousing, and data pipeline design and implementation.
  • Expert knowledge of the Software Development Life Cycle (SDLC), with involvement in all project phases.
  • Expertise in using cloud-based managed services for data warehousing in Microsoft Azure (Azure Data Lake Storage, Azure Data Factory).
  • Strong experience in big data processing using Hadoop-ecosystem technologies: MapReduce, Apache Spark, Apache Hive, and Pig.
  • Good understanding of cloud configuration in Amazon Web Services (AWS).
  • Experience in dimensional modeling using Star and Snowflake schema methodologies for data warehouse and integration projects.
  • Excellent proficiency in Agile/Scrum and Waterfall methodologies.
  • Extensive experience in using ER modeling tools such as Erwin and ER/Studio.
  • Experience in integrating various data sources with multiple relational databases such as SQL Server, Teradata, and Oracle.
  • Experience in data ingestion projects, loading data into a data lake from multiple source systems using Talend Big Data.
  • Excellent knowledge of data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
  • Proficient in data governance, data quality, metadata management, and master data management.
  • Experience in creating ETL specification documents, flowcharts, process workflows, and data flow diagrams.
  • Experience in executing batch jobs and processing data streams with Spark Streaming.
  • Good knowledge of streaming applications using Apache Kafka.
  • Hands-on experience with Tableau Desktop, Tableau Server, and Tableau Reader across various versions.
  • Experience extending Hive and Pig core functionality with custom UDFs.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Expertise in SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS), and SQL Server Integration Services (SSIS).
  • In-depth knowledge of T-SQL, SSAS, SSRS, SSIS, OLAP, OLTP, the BI suite, reporting, and analytics.
  • Strong experience in using MS Excel and MS Access to load and analyze data based on business needs.
  • Good communication skills, a strong work ethic, and the ability to work efficiently in a team, with good leadership skills.

TECHNICAL SKILLS:

Big Data Tools: HBase 1.2, Hive 2.3, Pig 0.17, HDFS, Sqoop 1.4, Kafka 1.0.1, Scala, Oozie 4.3, Hadoop 3.0, MapReduce, Spark

BI Tools: Tableau 10, Tableau Server 10, SAP BusinessObjects, MS BI

Methodologies: JAD, System Development Life Cycle (SDLC), Agile, Waterfall Model.

ETL Tools: Informatica 9.6/9.1, Tableau, Pentaho

Data Modeling Tools: Erwin Data Modeler 9.8, ER/Studio v17, and PowerDesigner 16.6

Databases: Oracle 12c, Teradata R15, MS SQL Server 2016, DB2, Snowflake SaaS

Cloud Architecture: Amazon AWS (EC2), basic MS Azure, Google Cloud Platform (GCP)

Programming Languages: SQL, PL/SQL, Python, UNIX shell Scripting

Operating System: Windows, Unix

WORK EXPERIENCE:

Confidential, Eden Prairie, MN

Data Engineer

Responsibilities:

  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Provided application support during the build and test phases of the SDLC for the product.
  • Used Oozie to automate end-to-end data pipelines and Oozie coordinators to schedule the workflows.
  • Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, Databricks, SQL Database, and SQL Data Warehouse environment.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Oracle.
  • Developed predictive analytics using the Apache Spark Scala APIs.
  • Created a dimensional model for the reporting system by identifying the required dimensions and facts using Erwin.
  • Developed and implemented a data pipeline using Kafka and Storm to store data into HDFS.
  • Created automated Python scripts to convert data from different sources and to generate ETL pipelines (a minimal sketch follows this list).
  • Worked with Snowflake SaaS for cost-effective data warehouse implementation in the cloud.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
  • Developed customer cleanse functions, cleanse lists, and mappings for the MDM Hub.
  • Worked extensively on Oracle PL/SQL and SQL performance tuning.
  • Worked extensively on Python OpenStack APIs.
  • Involved in Star schema modeling, building and designing the logical data model into dimensional models.
  • Created shared dimension tables, measures, hierarchies, levels, cubes, and aggregations on MS OLAP/OLTP/Analysis Server (SSAS).
  • Created both clustered and non-clustered indexes to maximize query performance in T-SQL.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Generated multiple enterprise reports with SSRS and Crystal Reports, and worked on Tableau.
  • Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, including integration with other Azure services.
  • Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources, with the output written back to HDFS.
  • Used Sqoop to efficiently transfer data between databases and HDFS.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Used Pig as an ETL tool to perform transformations, joins, and pre-aggregations before storing the data into HDFS.
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Worked on creating ad-hoc reports and database imports and exports using SSIS.
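A minimal sketch of the kind of automated Python conversion-and-load script referenced above. The file paths, column handling, and HDFS target directory are illustrative assumptions, not actual project values; the HDFS load relies only on the standard `hdfs dfs` CLI.

```python
# Illustrative sketch of an automated source-to-HDFS conversion step.
# Paths, column names, and the HDFS directory are hypothetical placeholders.
import subprocess
import pandas as pd

SOURCE_FILE = "/data/incoming/customers_raw.csv"   # assumed source extract
STAGED_FILE = "/data/staged/customers_clean.csv"   # local staging location
HDFS_TARGET_DIR = "/warehouse/staging/customers"   # assumed HDFS landing zone

def convert(source_path: str, staged_path: str) -> None:
    """Standardize the raw extract so downstream Hive/Pig jobs see a clean layout."""
    df = pd.read_csv(source_path)
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.drop_duplicates()
    df["load_ts"] = pd.Timestamp.utcnow().isoformat()  # audit column
    df.to_csv(staged_path, index=False)

def load_to_hdfs(staged_path: str, hdfs_dir: str) -> None:
    """Push the staged file into HDFS using the standard CLI."""
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", staged_path, hdfs_dir], check=True)

if __name__ == "__main__":
    convert(SOURCE_FILE, STAGED_FILE)
    load_to_hdfs(STAGED_FILE, HDFS_TARGET_DIR)
```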

Environment: Erwin 9.8, SQL, Oracle 12c, PL/SQL, Hadoop 3.0, Azure Data Lake, Spark, Scala, APIs, Pig 0.17, Python, Kafka 1.1, HDFS, ETL, MDM, OLAP, OLTP, SSAS, T-SQL, Hive 2.3, SSRS, Tableau, MapReduce, Sqoop 1.4, HBase 1.2, SSIS.

Confidential, Ashburn, VA

Data Analyst/Data Engineer

Responsibilities:

  • Followed test-driven development within the Agile methodology to produce high-quality software.
  • Designed and developed horizontally scalable APIs using Python Flask (a minimal sketch follows this list).
  • Conducted JAD sessions, wrote meeting minutes, and documented the requirements.
  • Worked with cloud providers and APIs for Amazon (AWS) EC2, S3, and VPC with GFS storage.
  • Worked with data ingestion, querying, processing, and analysis of big data.
  • Tuned and optimized various complex SQL queries.
  • Developed normalized logical and physical database models to design the OLTP system.
  • Extensively involved in creating PL/SQL objects, i.e., procedures, functions, and packages.
  • Performed bug verification and release testing and provided support for Oracle-based applications.
  • Used Erwin Model Mart for effective model management, sharing, dividing, and reusing model information and designs to improve productivity.
  • Extensively used Hive optimization techniques such as partitioning, bucketing, map joins, and parallel execution (see the sketch after this project's environment line).
  • Worked with real-time streaming using Kafka and HDFS.
  • Worked with Alteryx, a data analytics tool, to develop workflows for the ETL jobs.
  • Designed the data marts with dimensional data modeling using Star and Snowflake schemas.
  • Wrote, tested, and implemented Teradata FastLoad and MultiLoad scripts, DML, and DDL.
  • Used various OLAP operations such as slice/dice, drill-down, and roll-up as per business requirements.
  • Wrote SQL queries, stored procedures, views, triggers, T-SQL, and DTS/SSIS packages.
  • Handled the import of data from various data sources, performed data control checks using Spark, and loaded the data into HDFS.
  • Designed SSRS reports with sub-reports, dynamic sorting, data source definitions, and subtotals.
  • Designed and implemented data imports into HDFS using Sqoop from different RDBMS servers.
  • Worked with Sqoop commands to import data from different databases.
  • Gathered SSRS report requirements and created the reports in Tableau.
  • Designed and developed MapReduce jobs to process data arriving in different file formats such as XML.
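A minimal sketch of a horizontally scalable Flask API endpoint of the kind referenced above. The route, payload fields, and backing store are hypothetical; the point illustrated is that keeping the service stateless allows additional identical instances to be added behind a load balancer.

```python
# Illustrative Flask API sketch; route names and payload fields are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

# In a real deployment this would be an external store (database, cache, queue);
# a module-level dict is used here only to keep the sketch self-contained.
RECORDS = {}

@app.route("/records/<record_id>", methods=["GET"])
def get_record(record_id):
    record = RECORDS.get(record_id)
    if record is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(record), 200

@app.route("/records/<record_id>", methods=["PUT"])
def put_record(record_id):
    RECORDS[record_id] = request.get_json(force=True)
    return jsonify({"status": "stored", "id": record_id}), 201

if __name__ == "__main__":
    # Each instance is identical, so more can be started as traffic grows.
    app.run(host="0.0.0.0", port=5000)
```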

Environment: Erwin 9.8, SQL, PL/SQL, Kafka 1.1, AWS, APIs, Agile, ETL, HDFS, OLAP, T-SQL, SSIS, Teradata 15, Hive 2.3, SSRS, Sqoop 1.4, Tableau, MapReduce, XML.
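The Hive optimization bullet above (partitioning, bucketing, map joins, parallel execution) could be exercised from Python roughly as follows. This is a sketch only: it assumes the PyHive client and a reachable HiveServer2 host, and the database, table, and column names are illustrative, not taken from the project.

```python
# Illustrative sketch of Hive partitioning/bucketing/map-join settings via PyHive.
# Host, database, table, and column names are hypothetical placeholders.
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000, database="analytics")
cur = conn.cursor()

# Partition by load date and bucket by customer_id to prune scans and speed up joins.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sales_opt (
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (load_dt STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Let Hive convert small-table joins into map joins and run independent stages in parallel.
cur.execute("SET hive.auto.convert.join=true")
cur.execute("SET hive.exec.parallel=true")

# Partition pruning: only the requested load_dt partition is scanned.
cur.execute("""
    SELECT s.load_dt, COUNT(*) AS orders
    FROM sales_opt s
    JOIN dim_customer c ON s.customer_id = c.customer_id
    WHERE s.load_dt = '2018-06-01'
    GROUP BY s.load_dt
""")
print(cur.fetchall())
```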

Confidential, Round Rock, TX

Data Analyst/Data Modeler

Responsibilities:

  • As a Data Analyst/Data Modeler, I was responsible for all data-related aspects of the project.
  • Created reports in a cloud-based environment using Amazon Redshift and published them on Tableau.
  • Developed Python APIs to dump the array structures in the processor at the failure point for debugging (a minimal sketch follows this list).
  • Worked extensively with ER/Studio on several projects spanning both OLAP and OLTP applications.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle.
  • Developed the required data warehouse model using a Star schema for the generalized model.
  • Implemented and visualized BI reports with Tableau.
  • Worked on stored procedures for processing business logic in the database.
  • Worked extensively with Teradata Viewpoint for performance monitoring and performance tuning.
  • Performed Extract, Transform, and Load (ETL) to move legacy and ERP data into the Oracle data warehouse.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Managed the database design and implemented a comprehensive Snowflake schema with shared dimensions.
  • Worked with normalization and denormalization concepts and design methodologies.
  • Worked on the reporting requirements for the data warehouse.
  • Worked on SQL Server components: SSIS (SQL Server Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services).
  • Developed complex T-SQL code such as stored procedures, functions, triggers, indexes, and views for the business application.
  • Wrote complex SQL and PL/SQL procedures, functions, and packages to validate data and support the testing process.
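A minimal sketch of the debugging dump described above: when a processing step fails, the array structures in flight are serialized to a JSON file so the failure point can be inspected offline. The processing function, dump directory, and field names are assumptions for illustration, not the project's actual code.

```python
# Illustrative sketch of dumping in-memory array structures at a failure point.
# The processing step, dump directory, and field names are hypothetical.
import json
import os
import time
import traceback

DUMP_DIR = "/tmp/processor_dumps"  # assumed location for debug artifacts

def dump_state(arrays: dict, error: Exception) -> str:
    """Write the current array structures and the traceback to a JSON file."""
    os.makedirs(DUMP_DIR, exist_ok=True)
    snapshot = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "error": repr(error),
        "traceback": traceback.format_exc(),
        "arrays": {name: list(values) for name, values in arrays.items()},
    }
    path = f"{DUMP_DIR}/dump_{int(time.time())}.json"
    with open(path, "w") as fh:
        json.dump(snapshot, fh, indent=2)
    return path

def process(batch):
    """Hypothetical processing step that raises on bad input."""
    if any(x < 0 for x in batch):
        raise ValueError("negative value in batch")
    return [x * 2 for x in batch]

if __name__ == "__main__":
    batch = [3, 7, -1, 5]
    try:
        process(batch)
    except Exception as exc:
        print("dumped state to", dump_state({"batch": batch}, exc))
```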

Environment: ER/Studio, SQL, Python, APIs, OLAP, OLTP, PL/SQL, Oracle, Teradata, BI, Tableau, ETL, SSIS, SSAS, SSRS, T-SQL, Redshift.

Confidential

Data Analyst

Responsibilities:

  • Worked with data analysts on requirements gathering, business analysis, and project coordination.
  • Wrote Python normalization scripts to find duplicate data in different environments (a minimal sketch follows this list).
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Performed data analysis and data validation by writing SQL queries using SQL Assistant.
  • Worked with Informatica Cloud to create source and target objects and developed source-to-target mappings.
  • Involved in the complete SSIS life cycle: creating, building, deploying, and executing SSIS packages in all environments.
  • Developed reports for users in different departments of the organization using SQL Server Reporting Services (SSRS).
  • Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries and stored procedures.
  • Created new tables and wrote stored procedures, triggers, views, and functions.
  • Compared data with original source documents and validated data accuracy.
  • Developed the finance reporting requirements by analyzing the existing Business Objects reports.
  • Worked on data mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
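A minimal sketch of the kind of normalization-and-duplicate-check script referenced above, using pandas. The input files and key columns are illustrative assumptions; the idea is to normalize key fields so equivalent records compare equal across environments before flagging duplicates.

```python
# Illustrative sketch of normalizing records and flagging duplicates across extracts.
# File names and key columns are hypothetical placeholders.
import pandas as pd

SOURCE_FILES = ["extract_dev.csv", "extract_qa.csv"]  # assumed per-environment extracts
KEY_COLUMNS = ["customer_id", "email"]                # assumed business key

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-case and trim key fields so equivalent records compare equal."""
    out = df.copy()
    for col in KEY_COLUMNS:
        out[col] = out[col].astype(str).str.strip().str.lower()
    return out

frames = []
for path in SOURCE_FILES:
    df = normalize(pd.read_csv(path))
    df["source_file"] = path  # track which environment a row came from
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)

# keep=False marks every member of a duplicate group, not just later occurrences.
duplicates = combined[combined.duplicated(subset=KEY_COLUMNS, keep=False)]
duplicates.to_csv("duplicate_records.csv", index=False)
print(f"{len(duplicates)} duplicate rows written to duplicate_records.csv")
```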

Environment: SQL, OLAP, Python, OLTP, Informatica, SSIS, SSRS, T-SQL, Tableau, Excel.
