Data Engineer Resume
Detroit, MI
SUMMARY
- 8 years of IT experience in the analysis, design and development of ETL and data pipelines using Hadoop ecosystem tools and IBM InfoSphere DataStage.
- Proficient in analyzing business process requirements and translating them into technical requirements, and in creating design overviews and technical design documents.
- Good understanding of distributed computing, cloud computing and parallel processing frameworks.
- Hands-on experience developing efficient solutions using Hadoop ecosystem tools such as Hive, Pig, Oozie, Flume, PySpark, HDFS and Storm.
- Developed a large-scale data ingestion framework using Sqoop to ingest data from Oracle, SQL Server and Teradata RDBMS systems.
- Experience writing user-defined functions for custom functionality in Hive and Spark.
- Worked with different data sources such as flat files, XML, JSON and RDBMS, and stored the data in HDFS in Parquet, ORC and Avro formats.
- Extensive experience transforming data using Spark DataFrame API functions and RDD functions such as filter, join, map and flatMap (see the PySpark sketch at the end of this summary).
- Experience working with Structured Streaming using the Spark streaming modules.
- Knowledge of setting up Kafka topics and using Kafka producers and consumers to store data from Kafka streams into HDFS.
- Implemented Slowly Changing Dimensions, star schemas and 3NF data models using IBM DataStage.
- Experience working with Amazon Web Services such as Elastic Map Reduce, EC2, S3 and Athena.
- Proficient in writing error handling, reconciliation and logging frameworks.
- Experience doing data development in Waterfall and Agile methodologies; worked with both Scrum and Kanban practices.
- Proficient in working with structured streaming applications using Spark Streaming and Flume.
- Proficient with Git and Subversion version control tools such as Bitbucket, GitHub and SVN.
- Worked with different IDEs such as PyCharm, VS Code, Jupyter and Cloudera Data Science Workbench.
- Experience utilizing CI/CD pipelines to simplify the code migration process.
- Experience writing shell scripts for file watchers and spark-submit wrappers.
- Hands-on experience scheduling jobs in Control-M, Oozie and Autosys.
- Coordinated with vendors and BAs on the support and maintenance of various applications.
- Excellent communication and interpersonal skills, ability to learn quickly, good analytical reasoning and high adaptability to new technologies and tools.
- Strong team spirit and relationship management skills.
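A minimal PySpark sketch of the DataFrame and RDD transformations referenced above; the database, table, path and column names are hypothetical and for illustration only:

    # Illustrative only: database, table, path and column names are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("transform-example").enableHiveSupport().getOrCreate()

    # DataFrame API: filter, join and aggregate two source tables.
    sales = spark.table("raw_db.sales")
    customers = spark.table("raw_db.customers")
    result = (
        sales.filter(F.col("order_status") == "COMPLETE")
             .join(customers, on="customer_id", how="inner")
             .groupBy("region")
             .agg(F.sum("order_amount").alias("total_amount"))
    )

    # RDD API: flatMap and map over raw text lines.
    lines = spark.sparkContext.textFile("hdfs:///data/raw/orders.txt")
    tokens = lines.flatMap(lambda line: line.split("|")).map(lambda t: t.strip())

    # Store the curated output in HDFS as Parquet.
    result.write.mode("overwrite").parquet("hdfs:///data/curated/sales_by_region")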
TECHNICAL SKILLS
Hadoop Distributions: Cloudera, Hortonworks and AWS EMR
Big Data Eco System Tools: Hive, Spark, Oozie, Impala, SparkSQL, PySpark, Pig, Spark Streaming, Kafka
Languages: Python, Shell Scripting, SQL
ETL: IBM DataStage 8.x
AWS Services: S3, EC2, EMR, Athena
Misc Tools: PyCharm, Jupyter, Bitbucket, Jira, Putty, Control-m, Quality Center, Cloudera Data Science Workbench
PROFESSIONAL EXPERIENCE
Confidential, Detroit, MI
Data Engineer
Responsibilities:
- Hands-on with major Hadoop ecosystem components such as Spark, HDFS, Hive, HBase, ZooKeeper, Sqoop and Oozie.
- Developing Sqoop jobs to ingest data from various systems of record into the enterprise data lake.
- Developing Spark jobs in PySpark and SparkSQL that run on top of Hive tables and create transformed data sets for downstream consumption (see the sketch following this list).
- Working with business analysts to convert functional requirements into technical requirements and build appropriate data pipelines.
- Ingesting data from various source systems such as Oracle, SQL Server, flat files and JSON.
- Conducting exploratory data analysis in Jupyter notebooks using Python libraries and sharing the findings.
- Performance-tuning Spark and Hive jobs by reading execution plans, DAGs and YARN logs.
- Creating generic shell scripts to submit Hadoop and Spark jobs on EMR and on-prem edge nodes.
- Worked on migrating on-prem Hadoop cluster data and data pipelines to AWS cloud.
- Writing complex SparkSQL code to clean, join, transform and aggregate datasets and publish them for the Power BI team to produce operational scorecards.
- Writing custom Python modules for reusable code.
- Designing appropriate partitioning and bucketing schemes and ensuring correct load policies are employed so data is stored as per requirements.
- Creating Oozie workflows and coordinators and scheduling handshake jobs in Control-M.
- Working with production support teams and administration teams to ensure correct access controls are setup on each hive database.
- Working with governance teams to ensure metadata management, data lineage and technical metadata are correctly updated for each data asset.
- Working with a master/feature branch model and committing code with appropriate comments.
- Attending sprint planning and Agile ceremonies and demoing work products on a bi-weekly basis.
- Documenting data flow diagrams and technical logic in confluence.
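A minimal sketch of the kind of PySpark/SparkSQL job described above, reading Hive tables and publishing a partitioned, transformed dataset; the database, table and column names are hypothetical:

    # Hypothetical SparkSQL transformation over Hive tables; all names are illustrative.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("curated-dataset-build")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Clean, join and aggregate source Hive tables with SparkSQL.
    curated = spark.sql("""
        SELECT o.order_id,
               o.customer_id,
               c.region,
               CAST(o.order_amount AS DECIMAL(18, 2)) AS order_amount,
               o.order_date
        FROM   raw_db.orders o
        JOIN   raw_db.customers c
               ON o.customer_id = c.customer_id
        WHERE  o.order_status = 'COMPLETE'
    """)

    # Publish as a partitioned Parquet-backed Hive table for downstream (e.g. Power BI) use.
    (curated.write
            .mode("overwrite")
            .format("parquet")
            .partitionBy("order_date")
            .saveAsTable("curated_db.completed_orders"))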
Environment: Cloudera Hadoop distribution, AWS EMR, S3, Athena, Hive, Impala, PySpark, SparkSQL, Oracle 11g/12c, Jira, Bitbucket, Power BI, Control-M
Confidential, Detroit, MI
Hadoop Developer
Responsibilities:
- Developed Spark scripts in Scala as per requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output responses.
- Performed different types of transformations and actions on RDDs to meet business requirements.
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data (a streaming sketch follows this list).
- Also worked on analyzing the Hadoop cluster and different big data analytic tools, including HBase and Sqoop.
- Involved in loading data from the UNIX file system into HDFS.
- Responsible for managing data coming from various sources.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Involved in managing and reviewing Hadoop log files.
- Imported data using Sqoop from SQL Server to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Responsible for writing Hive queries for data analysis to meet the business requirements.
- Responsible for creating Hive tables and working on them using HiveQL.
- Responsible for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Extended Hive core functionality with custom user-defined functions (UDFs), user-defined table-generating functions (UDTFs) and user-defined aggregate functions (UDAFs).
- Used the Spark framework for both batch and real-time data processing.
- Hands-on processing of data using the Spark Streaming API.
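A minimal PySpark Structured Streaming sketch of a Kafka-to-HDFS/Hive style pipeline like the one described above; the original work was in Scala, and the broker, topic, schema and paths here are hypothetical:

    # Hypothetical Kafka-to-HDFS streaming sketch (the original pipeline was written in Scala).
    # Requires the spark-sql-kafka connector package on the classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-stream-example").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read JSON events from a Kafka topic (broker and topic names are assumptions).
    events = (
        spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "events-topic")
             .load()
             .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
             .select("e.*")
    )

    # Land the parsed stream in HDFS as Parquet for Hive consumption.
    query = (
        events.writeStream
              .format("parquet")
              .option("path", "hdfs:///data/streams/events")
              .option("checkpointLocation", "hdfs:///checkpoints/events")
              .outputMode("append")
              .start()
    )
    query.awaitTermination()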
Environment: Hortonworks Ambari, Tez, Hive, VersionOne, GitHub, Control-M, shell scripting, Spark Streaming, ServiceNow, SQL Server.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Involved in developing the roadmap for migrating enterprise data from multiple sources, such as SQL Server and provider databases, into HDFS, which serves as a centralized data hub across the organization.
- Loaded and transformed large sets of structured and semi-structured data from various systems.
- Developed ETL pipelines using Spark and Hive for performing various business specific transformations.
- Building applications and automating Spark pipelines for bulk loads as well as incremental loads of various datasets (see the sketch after this list).
- Worked closely with our team’s data analysts and consumers to shape the datasets as per the requirements.
- Automated the data pipeline to ETL all datasets, covering both full and incremental loads.
- Worked on building input adapters for data dumps from FTP servers using Apache Spark.
- Wrote Spark applications to inspect, clean, load and transform large sets of structured and semi-structured data.
- Developed Spark applications with Python and Spark SQL for testing and processing data.
- Made Spark job statistics, monitoring and data quality checks available for each dataset.
- Used SQL programming skills to work with relational SQL databases.
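An illustrative PySpark sketch of the bulk vs. incremental load pattern described above; the database, table, watermark column and path names are assumptions:

    # Hypothetical incremental-load sketch; names and the watermark rule are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("incremental-load")
        .enableHiveSupport()
        .getOrCreate()
    )

    target_db, target_tbl = "curated_db", "members"          # hypothetical target table
    source = spark.read.parquet("hdfs:///landing/members")   # hypothetical landing zone

    existing = [t.name for t in spark.catalog.listTables(target_db)]
    if target_tbl not in existing:
        # Bulk (initial) load: write the full dataset.
        source.write.mode("overwrite").saveAsTable(f"{target_db}.{target_tbl}")
    else:
        # Incremental load: append only rows newer than the current high-watermark.
        high_watermark = (
            spark.table(f"{target_db}.{target_tbl}")
                 .agg(F.max("updated_at").alias("wm"))
                 .collect()[0]["wm"]
        )
        delta = source.filter(F.col("updated_at") > F.lit(high_watermark))
        delta.write.mode("append").saveAsTable(f"{target_db}.{target_tbl}")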
Environment: Cloudera Hadoop distribution, Hive, Impala, Cognos, IBM Datastage, Shell scripting, SQL, PL/SQL and Autosys.
Confidential
Datastage Developer
Responsibilities:
- Involved in Requirement Gathering with the business team and in creating ETL design document and technical specifications document for the project.
- Optimized the performance of DataStage jobs by analyzing job logs and identifying bottlenecks at the source, target and stage level.
- Created UNIX shell scripts to invoke the DataStage job sequences and Oracle stored procedures.
- Implemented Slowly Changing Dimension Type 1 and Type 2 logic for inserting and updating target tables to maintain history (an illustrative sketch follows this list).
- Responded promptly to business user queries and change requests. Designed and developed DataStage jobs to load data from flat files, Oracle and MS SQL Server sources.
- Developed custom ETL objects to load the data in generic fashion.
- Responsible for developing and redefining several complex jobs and job sequencers to process various feeds using different DataStage stages and properties.
- Troubleshot issues and created automatic script/SQL generators.
- Designed unit test documents after DataStage development and verified results before moving the code to QA.
- Supported the test environments.
- Prepared production transition documentation and provided warranty support to production support teams.
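The Slowly Changing Dimension logic above was built in DataStage; the following PySpark sketch is only meant to illustrate the Type 2 expire-and-insert pattern, with hypothetical table and column names (brand-new customers and surrogate keys are omitted for brevity):

    # Illustrative SCD Type 2 sketch; the original implementation used DataStage jobs.
    # Assumes dw.customer_dim has columns: customer_id, address, start_date, end_date, current_flag.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("scd2-example").enableHiveSupport().getOrCreate()

    dim = spark.table("dw.customer_dim")          # existing dimension (hypothetical)
    stg = spark.table("stg.customer_updates")     # incoming changes (hypothetical)

    current = dim.filter(F.col("current_flag") == "Y")
    history = dim.filter(F.col("current_flag") == "N")

    # Rows whose tracked attribute changed: expire the current version ...
    changed = (
        current.alias("c")
               .join(stg.alias("s"), F.col("c.customer_id") == F.col("s.customer_id"))
               .filter(F.col("c.address") != F.col("s.address"))
    )
    expired = (
        changed.select("c.*")
               .withColumn("end_date", F.current_date())
               .withColumn("current_flag", F.lit("N"))
    )

    # ... and insert a fresh current version carrying the new attribute values.
    new_versions = (
        changed.select(F.col("s.customer_id").alias("customer_id"),
                       F.col("s.address").alias("address"))
               .withColumn("start_date", F.current_date())
               .withColumn("end_date", F.lit(None).cast("date"))
               .withColumn("current_flag", F.lit("Y"))
    )

    # Current rows with no change pass through untouched.
    changed_ids = changed.select(F.col("c.customer_id").alias("customer_id"))
    unchanged = current.join(changed_ids, "customer_id", "left_anti")

    result = history.unionByName(expired).unionByName(unchanged).unionByName(new_versions)
    result.write.mode("overwrite").saveAsTable("dw.customer_dim_new")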
Environment: DataStage 8.1, Oracle 10g, SQL Server 2005, TOAD, SQL, Unix.