
Sr. Data Engineer Resume


Woonsocket, RI

SUMMARY

  • 8 years of experience in Big Data using the Hadoop framework and related technologies such as HDFS, GCP, HBase, MapReduce, Hive, Pig, Flume, MongoDB, Oozie, Sqoop, and ZooKeeper, along with Java, J2EE, SQL, Spark, and Python
  • Experience in Relational Modeling and Dimensional Data Modeling
  • Experience maintaining and troubleshooting data pipelines
  • Managed data pipelines using DevOps practices such as containerization and orchestration
  • Experience working with Azure Databricks, Azure Data Lake Storage (ADLS), Azure Data Factory, Airflow, and PySpark
  • Hands-on experience testing data pipelines and optimizing their performance
  • Hands-on experience with Hadoop ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, Oozie, and Kafka
  • Experience in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java
  • Experience building automated data pipelines using PySpark and Airflow
  • Deep understanding of importing and exporting data between relational databases and Hadoop/database clusters
  • Thorough understanding of Business Intelligence and Data Warehousing Concepts with emphasis on ETL.
  • Experienced in writing SQL procedures and triggers in Oracle, and stored procedures in DB2 and MySQL
  • Experience in tuning SQL queries
  • Experience developing large-scale distributed applications
  • Experience in developing solutions to analyze large data sets efficiently
  • Experience in Data Warehousing and ETL processes
  • Strong database modeling, SQL, ETL and data analysis skills
  • Strong communication and analytical skills with very good programming and problem-solving experience
  • Excellent written and verbal communication skills.

TECHNICAL SKILLS

Specialization: Azure Databricks, Azure Data Lake, GCP, Hadoop, Hive/HQL, Oracle, Teradata, SQL, Unix shell scripting, Python, Java, PySpark, Linux, Snowflake

Data Consumption: Kafka, Flume, Sqoop

Hadoop Ecosystem: Hive, Pig, Sqoop, Oozie, Flume, Zookeeper

Analytical and reporting tools: SAS, Tableau

Containerization and Orchestration: Docker (puckel/docker-airflow image), Airflow

Functional: Business Requirements Analysis and Process mapping

PROFESSIONAL EXPERIENCE

Confidential, Woonsocket, RI

Sr. Data Engineer

Responsibilities:

  • Built data pipelines to automate Rx marketing data campaigns for a call-based service
  • Performed data enhancement operations to classify opportunities for customers based on multiple eligibility criteria
  • Utilized Azure Databricks to write PySpark code and test it against the data in the data lake
  • Used Linux to move files and datasets between local storage and the data lake
  • Leveraged Databricks clusters to run and test the functions developed in Azure Databricks
  • Utilized PySpark SQL to write functions and streamline the data enhancement process (see the enhancement sketch after this list)
  • Employed Airflow to orchestrate the data pipelines automatically (see the DAG sketch after this list)
  • Wrote complex PySpark code to simplify SQL logic involving joins, subqueries, and correlated subqueries
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running PySpark jobs
  • Utilized Git for source control and collaboration
  • Created custom PySpark libraries to simplify unioning files and breaking them back down
  • Created files in different formats such as Parquet, CSV, and DAT
  • Performed validations and testing to ensure data volumes were sent and received at the predicted rates
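
A minimal sketch of the kind of PySpark data-enhancement function described above, using the pyspark.sql API; the column names, eligibility threshold, and ADLS paths (fill_count, contact_consent, the abfss:// locations) are illustrative assumptions, not the actual campaign rules.

    # PySpark sketch of an eligibility-classification (data enhancement) step.
    # Column names, thresholds, and paths are illustrative assumptions only.
    from pyspark.sql import SparkSession, DataFrame
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("rx_campaign_enhancement").getOrCreate()

    def classify_opportunities(df: DataFrame, min_fill_count: int = 3) -> DataFrame:
        """Tag each customer row with an opportunity class based on simple eligibility rules."""
        return df.withColumn(
            "opportunity_class",
            F.when((F.col("fill_count") >= min_fill_count) & F.col("contact_consent"), "call_eligible")
             .when(F.col("fill_count") > 0, "nurture")
             .otherwise("ineligible"),
        )

    # Read raw campaign data from the data lake, enhance it, and write it back as Parquet.
    raw = spark.read.parquet("abfss://raw@<storage-account>.dfs.core.windows.net/rx_campaign/")
    classify_opportunities(raw).write.mode("overwrite").parquet(
        "abfss://curated@<storage-account>.dfs.core.windows.net/rx_campaign_enhanced/"
    )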
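
A second sketch showing how such a pipeline might be orchestrated with Airflow; the DAG id, schedule, and task callables are assumptions for illustration (Airflow 2.x-style imports), not the production DAG.

    # Airflow DAG sketch for orchestrating the PySpark enhancement job.
    # DAG id, schedule, and callables are illustrative assumptions only.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_to_lake():
        ...  # pull source files into the data lake (placeholder)

    def run_enhancement():
        ...  # run the PySpark enhancement job (placeholder)

    def validate_output():
        ...  # row-count / volume checks before downstream use (placeholder)

    with DAG(
        dag_id="rx_campaign_pipeline",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_to_lake", python_callable=extract_to_lake)
        enhance = PythonOperator(task_id="run_enhancement", python_callable=run_enhancement)
        validate = PythonOperator(task_id="validate_output", python_callable=validate_output)
        extract >> enhance >> validate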

Confidential, Pittsburgh, PA.

Sr. Data Engineer

Responsibilities:

  • Played a crucial role in designing and building the NAVIO Datahub, an internal application that enables easy consumption and querying of billing data
  • Played a major role in overseeing the growth of the application to more than 75 Finance partners
  • Established the use of core Confidential & Confidential technologies and automated data validations to ensure data quality and smooth pipeline execution (see the validation sketch after this list)
  • Led the Datahub enhancements project in January 2020, working with Corporate Engineering for more robust support and ongoing improvements
  • Established an operating model to manage the various stakeholders and maintain the momentum behind the project
  • Created Continuous Integration and Continuous Delivery (CI/CD) pipelines to automate builds and deployments
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data
  • Managed and reviewed Airflow and data cluster log files for bug tracking and error resolution
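
A minimal sketch of the kind of automated data validation mentioned above, written in PySpark; the table name, key column, and checks are hypothetical, not the actual Datahub validations.

    # PySpark sketch of automated data-quality validation for a billing table.
    # Table name, key column, and checks are illustrative assumptions only.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.enableHiveSupport().appName("datahub_validations").getOrCreate()

    def validate_billing(table: str = "datahub.billing_curated") -> None:
        df = spark.table(table)

        total = df.count()
        null_keys = df.filter(F.col("invoice_id").isNull()).count()
        dup_keys = total - df.select("invoice_id").distinct().count()

        # Fail the pipeline run loudly instead of letting bad data flow downstream.
        assert total > 0, f"{table} is empty"
        assert null_keys == 0, f"{table} has {null_keys} rows with a null invoice_id"
        assert dup_keys == 0, f"{table} has {dup_keys} duplicate invoice_id values"

    validate_billing()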

Confidential, Fremont, CA

Big Data / ETL Engineer

Responsibilities:

  • Extracted data from RDBMS into HDFS and bulk loaded it into Pig for cleansing.
  • Worked on Pig Latin Operations, transformations, and functions.
  • Created Hive target tables using HQL to hold the data after the Pig ETL operations.
  • Good understanding of partitioning and bucketing concepts in Hive.
  • Migrated the code from HQL to PySpark (see the migration sketch after this list).
  • Worked on big data integration and analytics based on Hadoop, Spark, and Kafka
  • Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation.
  • Worked in functional, system, and regression testing activities with agile methodology.
  • Worked on a Python plug-in for MySQL Workbench to upload CSV files.
  • Worked on migrating HiveQL to Impala to minimize query response time.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Used DataFrames for data transformation.
  • Created Hive tables with dynamic partitions using HQL (see the partitioning sketch after this list).
  • Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
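
A minimal sketch of what the HQL-to-PySpark migration can look like; the table and column names (sales, region, amount, sale_date) are placeholders rather than the actual warehouse schema.

    # Sketch of migrating an HQL aggregate to the PySpark DataFrame API.
    # Table and column names are illustrative assumptions only.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.enableHiveSupport().appName("hql_migration").getOrCreate()

    # Original HQL:
    #   SELECT region, SUM(amount) AS total_amount
    #   FROM sales
    #   WHERE sale_date >= '2020-01-01'
    #   GROUP BY region;

    # Equivalent PySpark DataFrame version:
    sales = spark.table("sales")
    totals = (
        sales.filter(F.col("sale_date") >= "2020-01-01")
             .groupBy("region")
             .agg(F.sum("amount").alias("total_amount"))
    )
    totals.show()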
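
And a short sketch of a dynamically partitioned Hive target table driven from PySpark; the table layout, staging table, and partition column are assumptions for illustration.

    # Sketch of a Hive table with dynamic partitions, created and loaded via HQL from PySpark.
    # Table names and the partition column are illustrative assumptions only.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().appName("hive_partitions").getOrCreate()

    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_by_region (
            sale_id BIGINT,
            amount  DOUBLE
        )
        PARTITIONED BY (region STRING)
        STORED AS PARQUET
    """)

    # Dynamic-partition insert: Hive routes each row to its region partition,
    # with the partition column listed last in the SELECT.
    spark.sql("""
        INSERT OVERWRITE TABLE sales_by_region PARTITION (region)
        SELECT sale_id, amount, region FROM staging_sales
    """)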

Confidential, Scottsdale, AZ

Big Data / ETL Engineer

Responsibilities:

  • Analyzed business requirements, performed gap analysis, and transformed them into detailed design specifications.
  • Researched and recommended tools and technologies in the Hadoop stack based on the organization's workloads.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive; configured and installed Hadoop and Hadoop ecosystem components (Hive, Pig, HBase, Sqoop, Flume).
  • Designed and implemented a distributed data storage system based on HBase and HDFS; imported and exported data into HDFS and Hive.
  • Setup and benchmarked Hadoop clusters for internal use.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; analyzed the SQL scripts and designed the solution for implementation in Scala (see the sketch after this list).
  • Investigated issues as they arose; resolved them through root cause analysis, incident management, and problem management processes.
  • Responsible for analyzing, designing, developing, coordinating, and deploying web applications.
  • Involved in application performance tuning and fixing bugs.
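
The Hive-to-Spark conversions above were done in Scala; a minimal PySpark analogue is sketched below for consistency with the rest of this resume, using hypothetical table and column names (events, status, device_id).

    # Sketch of converting a Hive query into Spark RDD transformations.
    # The original work used Scala; this is a PySpark analogue with
    # illustrative table and column names only.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().appName("hive_to_spark").getOrCreate()

    # Hive query being replaced:
    #   SELECT device_id, COUNT(*) FROM events WHERE status = 'ERROR' GROUP BY device_id;

    events_rdd = spark.table("events").rdd  # RDD of Row objects

    error_counts = (
        events_rdd.filter(lambda row: row["status"] == "ERROR")
                  .map(lambda row: (row["device_id"], 1))
                  .reduceByKey(lambda a, b: a + b)
    )

    for device_id, n in error_counts.collect():
        print(device_id, n)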

Confidential, Pittsburgh, PA

Data Engineer

Responsibilities:

  • Designed, developed, and tested processes for extracting data from legacy systems and production databases.
  • Worked on large sets of structured, semi-structured, and unstructured data.
  • Wrote PySpark code to import data from external locations into the data lake.
  • Wrote multiple PySpark jobs for data cleaning and preprocessing (see the cleaning sketch after this list)
  • Developed SQL queries targeting large datasets to generate insights.
  • Developed PySpark code to process data and build a data pipeline
  • Implemented Hive features such as partitioning, dynamic partitions, and bucketing through PySpark
  • Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.
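
A minimal sketch of a PySpark cleaning and preprocessing job of the kind listed above; the source path, column names, and cleaning rules are hypothetical.

    # PySpark sketch of a data cleaning / preprocessing job.
    # Paths, column names, and cleaning rules are illustrative assumptions only.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cleaning_job").getOrCreate()

    raw = spark.read.option("header", True).csv("/datalake/raw/customers/")

    clean = (
        raw.dropDuplicates(["customer_id"])               # remove duplicate keys
           .filter(F.col("customer_id").isNotNull())      # drop rows without a key
           .withColumn("signup_date", F.to_date("signup_date", "yyyy-MM-dd"))
           .withColumn("email", F.lower(F.trim(F.col("email"))))
    )

    clean.write.mode("overwrite").parquet("/datalake/curated/customers/")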

Confidential

ETL Engineer

Responsibilities:

  • Involved in gathering requirements from the business users for reporting needs.
  • Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data processing
  • Extracted files from RDBMS through Sqoop, placed them in HDFS, and processed them.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Developed connectors for Elasticsearch and Greenplum to transfer data from a Kafka topic.
  • Performed data ingestion from multiple internal clients using Apache Kafka; developed Kafka Streams applications in Java for real-time data processing (see the streaming sketch after this list).
  • Implemented a real-time analytics pipeline using Confluent Kafka, Storm, Elasticsearch, Splunk, and Greenplum.
  • Coordinated testing by writing and executing test cases, procedures, and scripts, and creating test scenarios.
  • Closely worked with the reporting team to ensure that correct data is presented in the reports.
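
The streaming work above was implemented with Java Kafka Streams; a PySpark Structured Streaming analogue is sketched below for consistency with the rest of this resume, with a hypothetical topic name and broker address (it also requires the spark-sql-kafka connector package on the classpath).

    # Sketch of real-time consumption from a Kafka topic.
    # The original work used Java Kafka Streams; this PySpark Structured Streaming
    # version uses an illustrative topic and broker address only.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka_stream").getOrCreate()

    stream = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("subscribe", "billing-events")
             .load()
    )

    # Kafka delivers key/value as binary; decode the value for downstream processing.
    decoded = stream.select(F.col("value").cast("string").alias("event_json"))

    query = (
        decoded.writeStream.format("console")
               .outputMode("append")
               .option("checkpointLocation", "/tmp/checkpoints/billing-events")
               .start()
    )
    query.awaitTermination()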

Confidential

Responsibilities:

  • Designed and customized data models for a data warehouse supporting data from multiple sources in real time.
  • Conducted exploratory data analysis using Python (Matplotlib and Seaborn) to identify underlying patterns and correlations between features (see the EDA sketch after this list).
  • Performed tuning to improve data extraction, data processing, and load times.
  • Worked with data modelers to understand financial data model and provided suggestions to the logical and physical data model.
  • Extracted tables from various databases for code review.
  • Generated document coding to create metadata names for database tables.
  • Analyzed metadata and table data for comparison and confirmation.
  • Adhered to document deadlines for assigned databases.
  • Ran routine reports on a scheduled basis as well as ad hoc, based on key performance indicators.
  • Designed data visualizations to analyze and communicate findings
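
A minimal sketch of the kind of exploratory data analysis described above; the CSV path and column names (amount, channel) are placeholders.

    # EDA sketch with pandas, Matplotlib, and Seaborn.
    # The CSV path and column names are illustrative assumptions only.
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.read_csv("transactions.csv")

    # Quick profile of the data.
    print(df.describe())
    print(df.isna().sum())

    # Correlation between numeric features.
    plt.figure(figsize=(8, 6))
    sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm")
    plt.title("Feature correlations")
    plt.tight_layout()
    plt.savefig("correlations.png")

    # Distribution of a single feature, split by a categorical column.
    plt.figure(figsize=(8, 4))
    sns.histplot(data=df, x="amount", hue="channel", bins=30)
    plt.savefig("amount_distribution.png")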
