Data Engineer Resume
Pittsburgh, PA
SUMMARY
- 5+ years of experience in the data domain, spanning data engineering, data cleaning, data wrangling, data transformation, and data analysis.
- Hands-on experience in configuring and using Hadoop ecosystem components such as Spark, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, and Kafka.
- Hands-on experience with AWS (Amazon Web Services), including Elastic MapReduce (EMR), S3 storage, EC2 instances, Glue, Athena, and Redshift.
- Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.
- Hands-on experience in exploratory data analysis using numerical summaries and relevant visualizations to inform feature engineering.
- Hands-on experience developing PySpark scripts to manage data transformations and data delivery for batch and streaming processes.
- Designed and developed streaming pipelines using Apache Kafka and PySpark from multiple sources, such as APIs and data lakes, to optimize the performance of monitoring products.
- Developed PySpark scripts in Databricks and implemented custom ETL jobs sourcing from S3 with Snowflake as the destination.
- Experience in creating Hive tables, partitions, buckets and queries using HiveQL to optimize performance.
- Responsible for developing data migration shell scripts in a Linux environment that load data from DB2 into Hive tables, performing the necessary data transformations with big data technologies.
- Hands-on experience in various big data application phases, including data ingestion, data analytics, and data visualization.
- Expertise in SQL, including window functions, CTEs, date and time manipulation, and conditional aggregations on extracted data.
- Experience in designing time-driven and data-driven automated workflows using Oozie.
- Created shell scripts to build pipeline architectures involving multiple Python, SQL, Syncsort ETL, and ThoughtSpot jobs.
- Developed Python scripts to consume data from APIs and transform it using packages such as pandas, NumPy, and pyjson.
- Hands-on experience with machine learning algorithms such as Decision Trees, Support Vector Machines, K-Nearest Neighbors, Linear Regression, Logistic Regression, Random Forest, Naïve Bayes Classifiers, and Ensemble Methods.
- Experienced in identifying business problems and solving them using various machine learning algorithms.
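The SQL expertise listed above (window functions, CTEs, conditional aggregations) can be sketched with a small self-contained example; the table and column names below are hypothetical, and Python's built-in sqlite3 module stands in for the actual warehouse purely for illustration.

```python
import sqlite3

# Hypothetical orders table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  ('a', '2023-01-01', 10.0),
  ('a', '2023-01-05', 20.0),
  ('b', '2023-01-02', 15.0);
""")

# A CTE wrapping a window function: running total of spend per customer.
query = """
WITH ranked AS (
    SELECT customer,
           order_date,
           SUM(amount) OVER (
               PARTITION BY customer ORDER BY order_date
           ) AS running_total
    FROM orders
)
SELECT customer, order_date, running_total
FROM ranked
ORDER BY customer, order_date;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same pattern carries over to Hive or Redshift dialects with minor syntax changes.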
TECHNICAL SKILLS
Big Data Ecosystem: PySpark, Spark, Kafka, Hive, Sqoop, Oozie, HDFS
Programming Languages: Python, R, Scala
Data Warehouses: Snowflake, Redshift, Hive
Cloud Services: Databricks, AWS S3, EC2, Lambda
Databases: Oracle 11g/10g/9i, MySQL, PostgreSQL
Version Control Tools: Git, GitHub
BI Tools: ThoughtSpot, Tableau, Performance Analytics
Scripting Languages: Shell Scripting
Operating Systems: Windows, Linux
ML Libraries: mlr, scikit-learn, Spark ML
Data Analysis Libraries: Tidyverse, ggplot2, dplyr, pandas, numpy
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential, Pittsburgh, PA
Responsibilities:
- Worked closely with the business analysts and System Engineers to convert the Business Requirements into Technical Requirements.
- Involved in requirements gathering, grooming, design, development, unit testing, and bug fixing.
- Created tables in Hive and integrated data between Hive and Spark.
- Developed Spark jobs to collect data from source systems and store it on HDFS to run Analytics.
- Created Hive partitioned and bucketed tables to improve performance.
- Created Hive tables using user-defined functions (UDFs).
- Created BigQuery authorized views for row-level security and for exposing data to other teams.
- Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage.
- Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
- Loaded and transformed large data sets of structured, semi-structured, and unstructured data in formats such as TXT, ZIP, XML, and JSON.
- Designed and implemented data ingestion techniques for data coming from various source systems.
- Designed and developed Spark code using Python, PySpark, and Spark SQL for high-speed data processing.
- Involved in complete end to end code deployment process in Production.
- Maintained fully automated CI/CD pipelines for code deployment (Gitlab/Jenkins).
Environment: Hortonworks 2.5, HDFS 2.7.3, Spark 2.0.0, Hive 2.0.0, YARN
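The semi-structured formats mentioned above (XML, JSON) can be normalized to a common record shape with the standard library; this is a minimal sketch with hypothetical field names, not the project's actual ingestion code.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative payloads; the "id"/"status" fields are assumptions.
json_payload = '{"id": 1, "status": "open"}'
xml_payload = "<ticket><id>2</id><status>closed</status></ticket>"

def from_json(raw):
    """Parse a JSON record into the common {id, status} shape."""
    rec = json.loads(raw)
    return {"id": int(rec["id"]), "status": rec["status"]}

def from_xml(raw):
    """Parse an XML record into the same common shape."""
    root = ET.fromstring(raw)
    return {"id": int(root.findtext("id")), "status": root.findtext("status")}

records = [from_json(json_payload), from_xml(xml_payload)]
print(records)
```

Normalizing early like this keeps downstream Spark jobs format-agnostic.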
Data Engineer
Confidential
Responsibilities:
- Developed and implemented data pipelines to process 100 million records using PySpark, AWS Lambda, and APIs, and stored the processed data in Hive.
- Developed PySpark scripts in Databricks to collect, clean, and transform data from S3 into Hive.
- Developed Lambda scripts to trigger jobs that process and store data in multiple S3 buckets.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Sqoop scripts to import, export and update the data between HDFS and PostgreSQL.
- Automated ETL jobs using Oozie to coordinate Python, Hive, and PySpark jobs on AWS EC2.
- Developed Hive scripts to parse raw data, populate staging tables, and store the refined data in partitioned tables in AWS S3.
- Developed Python scripts to perform sensitive data masking on ServiceNow-extracted data, applying different conditions to different views.
- Analyzed and converted SQL scripts into PySpark SQL scripts for optimized, faster performance.
- Used Kafka consumer APIs to consume data from a topic every 15 minutes and land it in S3.
- Created databases, tables and views in Hive with different access conditions to make data available for different users.
- Performed thorough quality checks to validate that the extracted data is in sync with the schema of the respective destination view.
- Developed Tableau dashboards for end customers to understand the incident and change-order tickets raised for a particular tech organization.
- Developed individual Tableau reports to track daily incident changes and resolution rates using snapshot data from Snowflake.
Environment: PySpark, Hive, SQL, Kafka, PySpark SQL, Sqoop, Oozie, shell scripting, Linux, Python, Tableau, AWS EC2, S3, Lambda, ServiceNow.
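The sensitive-data-masking work described above can be sketched as a small pure-Python routine; the field names and regex patterns are illustrative assumptions, not the actual ServiceNow extract schema.

```python
import re

# Hypothetical patterns for PII in ServiceNow-style incident extracts.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def mask_record(record, sensitive_fields=("caller_email", "caller_phone")):
    """Return a copy of the record with sensitive fields redacted."""
    masked = dict(record)
    for field in sensitive_fields:
        if masked.get(field):
            value = EMAIL_RE.sub("***@***", str(masked[field]))
            value = PHONE_RE.sub("***-***-****", value)
            masked[field] = value
    return masked

# Illustrative incident record; values are made up.
incident = {
    "number": "INC0012345",
    "caller_email": "jane.doe@example.com",
    "caller_phone": "412-555-0199",
}
print(mask_record(incident))
```

Driving the rules from a field list keeps the per-view masking conditions configurable rather than hard-coded.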
