Sr Data Engineer Resume


Denver, CO

SUMMARY

  • Around 7 years of experience as a software engineer and data engineer in the design, development, maintenance, integration, and analysis of Big Data systems, with the ability to deliver intelligent, automated data monitoring, ETL, and data management solutions.
  • Expertise in gathering requirements, prioritizing activities, and optimizing procedures to minimize manual involvement.
  • Extensive expertise in data extraction, processing, storage, and analysis using cloud services and Big Data technologies such as MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, and Kafka. Experience working with cloud platforms such as Azure, Google Cloud, and Amazon Web Services.
  • Worked on a variety of projects including data lakes, migration, support, and application development.
  • Experience working with different Hadoop distributions such as Hortonworks, Cloudera, and MapR.
  • Worked on optimized data formats like Parquet, ORC, and Avro for data storage and extraction.
  • Experience in building batch ingestion pipelines using Sqoop, Flume, Spark, Talend, and NiFi.
  • Experience in designing, developing, and maintaining data lake projects using different Big Data tool stacks.
  • Migrated data from different data sources like Oracle, MySQL, Teradata to Hive, HBase and HDFS.
  • Built optimized Hive ETL and Spark data pipelines and datastores in Hive and HBase.
  • Designed and developed optimized data pipelines using Spark SQL, Spark with Scala, and Python (a minimal sketch follows this list).
  • Implemented various optimization techniques in Hive and Spark ETL processes.
  • Experienced in building Jupyter notebooks using PySpark for extensive data analysis.
  • Working knowledge in different NoSQL databases like MongoDB, HBase, Cassandra.
  • Experience in ingesting and exporting data from Apache Kafka using Apache Spark Streaming.
  • Experience in building end-to-end automation for different actions using Apache Oozie.
  • Implemented scalable microservices applications using Spring Boot for building REST endpoints.
  • Built Talend and NiFi integrations for bi-directional data ingestion across different sources.
  • Good exposure to different cloud platforms such as Amazon Web Services, Azure, and Google Cloud.
  • Expert with AWS databases such as RDS, Redshift, DynamoDB, and ElastiCache.
  • Implemented application development using various Azure tools such as Cosmos DB, Azure Databricks, Event Hubs, and Azure HDInsight.
  • Built historical and incremental reporting dashboards in Power BI and Tableau.
  • Used Python to run Ansible playbooks that deploy Logic Apps to Azure.
  • Built application metrics and alerting dashboards using Kibana.
  • Built Kibana dashboards for real-time log aggregation and analysis.
  • Experience in CI/CD using Jenkins and Docker.
  • Worked on scaling microservices applications using Kubernetes and Docker.
  • Experience with GCP tools such as BigQuery, Pub/Sub, Dataproc, and Datalab.
  • Experience working in Agile environments and Waterfall models.
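
Below is a minimal sketch of the kind of Spark SQL / PySpark batch pipeline referenced above; the paths, column names, and aggregation logic are hypothetical placeholders, not code from any specific engagement.

```python
# Minimal sketch of a Spark SQL / PySpark batch pipeline; paths, column names,
# and the aggregation are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-orders-etl").getOrCreate()

# Read raw data (placeholder path).
orders = spark.read.parquet("hdfs:///data/raw/orders")

# Light cleansing plus a daily aggregation.
daily_totals = (orders
    .filter(F.col("status") == "COMPLETED")
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("total_amount"),
         F.count("*").alias("order_count")))

# Partitioned output keeps incremental reloads cheap for Hive/Spark consumers.
(daily_totals.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("hdfs:///data/curated/daily_order_totals"))
```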

TECHNICAL SKILLS

Big Data Technologies: HDFS, Sqoop, Flume, MapReduce, Pig, Hive, Oozie, Spark, YARN, Kafka

Scripting Languages: Python, Shell

Languages: Java, SQL, Unix Shell Scripting, Scala, Python, R.

Databases - RDBMS: Oracle 10g, MySQL, Teradata.

IDE: Eclipse, Jupyter

ETL Tools: Informatica, Ab Initio, Talend v5.6, v6.2, v6.3, v7.2, v7.3, Talend DI, BDE, Data Fabric

Reporting Tools: SAP BusinessObjects, RStudio

Cloud Technologies: AWS, Azure, Google Cloud Platform

PROFESSIONAL EXPERIENCE

Confidential, Denver, CO

Sr Data Engineer

Responsibilities:

  • Implemented ETL data pipelines with various transformations in PySpark and Spark with Scala.
  • Developed Google Cloud Functions for event-driven triggering of daily data loads.
  • Implemented various scripts to load data into BigQuery for further data manipulation (a load sketch follows this list).
  • Developed various modules in Spring Boot for Kafka integration in streaming applications.
  • Implemented Airflow DAGs in Python to automate various daily jobs (a DAG sketch follows this list).
  • Utilized Looker dashboards to surface BigQuery data for business analysis.
  • Developed various Spark applications that run on Dataproc on an on-demand basis.
  • Implemented a CI/CD process with automated testing using Jenkins.
  • Developed a migration process to ingest data into Snowflake for the customer data platform.
  • Ingested various types of daily and historical files into Google Cloud Storage.
  • Developed Python scripts for data manipulation and cleansing.
  • Implemented libraries to publish daily application metrics to Grafana for monitoring.
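
A minimal sketch of loading a staged Cloud Storage file into BigQuery with the google-cloud-bigquery client; the project, bucket, and destination table are hypothetical placeholders.

```python
# Minimal sketch of loading a staged Cloud Storage file into BigQuery with the
# google-cloud-bigquery client; project, bucket, and table are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/daily/orders_20230101.csv",   # placeholder GCS URI
    "my-project.analytics.orders",                # placeholder destination table
    job_config=job_config,
)
load_job.result()  # block until the load job completes
print(f"Loaded {load_job.output_rows} rows")
```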
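
And a minimal sketch of an Airflow DAG for the daily job automation mentioned above, assuming Airflow 2.x; the DAG id, schedule, and task callables are hypothetical placeholders.

```python
# Minimal sketch of an Airflow DAG for a daily load, assuming Airflow 2.x;
# the DAG id, schedule, and task callables are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_files():
    """Pull the day's files into a staging area (placeholder logic)."""
    print("extracting daily files")

def load_to_bigquery():
    """Load the staged files into BigQuery (placeholder logic)."""
    print("loading into BigQuery")

with DAG(
    dag_id="daily_data_load",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 6 * * *",  # every day at 06:00
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_files", python_callable=extract_files)
    load = PythonOperator(task_id="load_to_bigquery", python_callable=load_to_bigquery)

    extract >> load
```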

Environment: Python, Unix, GCP services (Dataflow, Dataproc, Pub/Sub, Cloud Storage), FHIR

Confidential, Kansas City

Sr Data Engineer

Responsibilities:

  • Developed a framework for ingesting and manipulating data from multiple batch and streaming sources using Spark. Enhanced the existing Spark Streaming applications to support Structured Streaming and new sources such as Kafka and JMS topics and queues, improving message delivery semantics.
  • Experience with different components of Spark and Hive, with extensive work on performance optimization.
  • Used PySpark DataFrames to read text, CSV, and image data from HDFS and Amazon S3.
  • Developed ETL pipelines for transferring data from diverse sources into multiple layers in S3 using AWS Glue and Databricks.
  • Implemented partitioning strategies to improve the performance of spark tasks that ingest tables from multiple JDBC sources like Oracle.
  • Developed Lambda functions in Python to launch EMR clusters on demand and automate the process (a sketch follows this list).
  • Enforced the deletion of records containing Personally Identifiable Information from multiple layers of data, based on user requests, to comply with CCPA.
  • Developed Spark scripts using the PySpark shell as per requirements.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly, consuming data from Kafka in near real time and persisting it into Cassandra (a sketch follows this list).
  • Implemented Change Data Capture with Slowly Changing Dimension Type 2 in Databricks Delta Lake and Snowflake for generating reports in near real time (a sketch follows this list).
  • Implemented data ingestion jobs in Spark using AWS Glue to store data in Redshift.
  • Monitored the AWS cluster to identify the job progress, idle instances using EMR metrics and CloudWatch metrics.
  • Responsible for developing a data pipeline with Amazon AWS to extract the data from weblogs and store it in S3.
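
A minimal sketch of a Lambda handler that launches an on-demand EMR cluster via boto3; the release label, instance types, roles, and Spark step are hypothetical placeholders.

```python
# Minimal sketch of a Lambda handler that launches an on-demand EMR cluster via
# boto3; release label, instance types, roles, and the Spark step are placeholders.
import boto3

emr = boto3.client("emr")

def lambda_handler(event, context):
    response = emr.run_job_flow(
        Name="on-demand-spark-cluster",
        ReleaseLabel="emr-6.9.0",
        Applications=[{"Name": "Spark"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate after the step finishes
        },
        Steps=[{
            "Name": "daily-spark-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://my-bucket/jobs/daily_job.py"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"job_flow_id": response["JobFlowId"]}
```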
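
A minimal sketch of a Spark Structured Streaming job that consumes from Kafka and persists each micro-batch to Cassandra through the spark-cassandra-connector (assumed on the classpath); brokers, topic, keyspace, and schema are hypothetical placeholders.

```python
# Minimal sketch: Kafka -> Spark Structured Streaming -> Cassandra via foreachBatch.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-cassandra").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
])

events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "events")                      # placeholder topic
    .load()
    # Kafka values arrive as bytes; parse the JSON payload into columns.
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

def write_to_cassandra(batch_df, batch_id):
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="analytics", table="events")  # placeholder keyspace/table
        .mode("append")
        .save())

(events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "s3://my-bucket/checkpoints/events")
    .start()
    .awaitTermination())
```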
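
A minimal sketch of one pass of a Slowly Changing Dimension Type 2 merge in Databricks Delta Lake using the delta-spark API; the table paths, keys, and tracked columns are hypothetical placeholders, and inserting the new current version of changed keys would be handled in a staging/second pass that is omitted here.

```python
# Minimal sketch of one pass of an SCD Type 2 merge in Delta Lake; paths, keys,
# and tracked columns are placeholders. The new current version of changed keys
# is typically inserted in a second pass, omitted here.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("scd2-merge").getOrCreate()

# CDC feed with the latest attribute values per customer (placeholder path).
updates = spark.read.parquet("s3://my-bucket/cdc/customers")

# Target dimension table in Delta Lake (placeholder path).
dim = DeltaTable.forPath(spark, "s3://my-bucket/delta/dim_customer")

(dim.alias("t")
    .merge(updates.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    # Expire the current row when a tracked attribute changed.
    .whenMatchedUpdate(
        condition="t.address <> s.address",
        set={"is_current": "false", "end_date": "current_date()"})
    # Brand-new keys get a fresh current row.
    .whenNotMatchedInsert(values={
        "customer_id": "s.customer_id",
        "address": "s.address",
        "is_current": "true",
        "start_date": "current_date()",
        "end_date": "null",
    })
    .execute())
```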

Environment: Spark Core, Spark SQL, Spark Streaming, AWS Data Pipeline, Glue, Redshift, Kafka, Hive, GitHub, Web flow, Amazon S3, Amazon EMR, PySpark, EC2, Lambda.

Confidential, Mooresville, NC

Sr Data Engineer

Responsibilities:

  • Worked closely with data scientists for building predictive models using PySpark.
  • Expertise in architectural blueprints and detailed documentation; created bills of materials covering the required cloud services and tools.
  • Extensive experience with Azure Delta Lake, Azure Databricks, Azure Data Factory, Azure HDInsight, and Azure Functions.
  • Developed ETL pipelines using different Azure Data Factory activities and Databricks. The transformations include lookups, data casting, data cleansing, and aggregations.
  • Flattened and transformed large volumes of nested data in Parquet and Delta formats using Spark SQL and current join optimization methods, then loaded it into Hive, Delta Lake, and Snowflake tables (a sketch follows this list).
  • Designed a prototype for a product recommendation system utilizing Python machine learning libraries and the Azure SDK, including ADLS, Azure Identity, and Azure Key Vault.
  • Using the Azure SDK, automated the testing of data pipelines and implemented a data quality check framework in Python (a sketch follows this list).
  • Experienced with GitLab CI and Jenkins for CI/CD and end-to-end build automation.
  • Implemented automated alerting and monitoring on top of our applications using tools such as Grafana and Kibana, so that production failures trigger automated alerts to the on-call rotation person on the team.
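
A minimal sketch of flattening nested Parquet data with Spark SQL functions and loading it into a Delta Lake table; the ADLS paths and the nested schema (a line_items array and a customer struct) are hypothetical placeholders, and the broadcast join stands in for the join optimizations mentioned above.

```python
# Minimal sketch: flatten nested Parquet data and load it into Delta Lake;
# ADLS paths and the nested schema are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("flatten-nested").getOrCreate()

raw = spark.read.parquet("abfss://raw@account.dfs.core.windows.net/orders")

flat = (raw
    # One row per line item instead of an array column.
    .withColumn("item", F.explode("line_items"))
    .select(
        "order_id",
        F.col("customer.id").alias("customer_id"),
        F.col("item.sku").alias("sku"),
        F.col("item.qty").alias("qty"),
    ))

# Example join optimization: broadcast a small dimension to avoid a shuffle.
dim_sku = spark.read.format("delta").load(
    "abfss://curated@account.dfs.core.windows.net/dim_sku")
enriched = flat.join(F.broadcast(dim_sku), "sku", "left")

(enriched.write
    .format("delta")
    .mode("overwrite")
    .save("abfss://curated@account.dfs.core.windows.net/orders_flat"))
```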
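
A minimal sketch of a Python data quality check driven by the Azure SDK (azure-identity and azure-storage-file-datalake are assumed to be installed); the storage account, container, and partition path are hypothetical placeholders.

```python
# Minimal sketch of a data quality check against ADLS Gen2; account, container,
# and partition path are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://myaccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("curated")

def partition_has_data(prefix: str) -> bool:
    """Return True if the output partition contains at least one non-empty file."""
    paths = fs.get_paths(path=prefix)
    return any((not p.is_directory) and p.content_length > 0 for p in paths)

# Fail loudly so the orchestrator marks the pipeline run as failed.
if not partition_has_data("orders_flat/order_date=2023-01-01"):
    raise RuntimeError("Data quality check failed: output partition is empty")
```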

Environment: Azure Databricks, Azure Data Lake Gen1, Azure Data Factory, Databricks Delta Lake, Snowflake, Spark, Hive, Kafka, JMS, Oracle, Teradata, Python, Tableau, Power BI

Confidential, Overland Park, KS

Data Engineer

Responsibilities:

  • Worked on the extraction and aggregation of data from various data sources such as SQL Server, AWS Server, and MS Excel for analysis.
  • Involved in Design, Development, and Support phases of Software Development Life Cycle (SDLC)
  • Participated in all phases of data mining, data cleaning, data collection, developing models, validation, visualization, and performed Gap Analysis.
  • Developed ETL jobs for extracting, cleaning, transforming, and loading the data from various sources.
  • Performed in-depth analysis of data and prepared daily reports by using SQL, MS Excel, SharePoint.
  • Created multiple SQL scripts, stored procedures, and functions to extract and process data from source systems and transform it into the required state (a sketch of driving such extracts from Python follows this list).
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Reviewed the functional, design, source code, and test specifications; authored the functional, design, and test specifications.
  • Prepared the installation, customer, and configuration guides delivered to the customer along with the product.
  • Ability to communicate strategies and processes around data modeling and architecture to cross-functional groups and senior levels.
  • Ability to influence multiple levels on highly technical issues and challenges.
  • Demonstrated understanding of Business Intelligence Tool Development, Data Modeling, and Teradata database design.
  • Expertise in analyzing, designing, and building business intelligence solutions using Informatica and database technologies such as Teradata and DB2.
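
A minimal sketch of driving a SQL Server extract from Python via pyodbc; the connection string, stored procedure name, and columns are hypothetical placeholders.

```python
# Minimal sketch of a SQL Server extract via pyodbc; the connection string,
# stored procedure, and columns are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=sales;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Call a hypothetical stored procedure that returns cleansed daily rows.
cursor.execute("EXEC dbo.usp_get_daily_orders ?", "2023-01-01")
for row in cursor.fetchall():
    print(row.order_id, row.total_amount)

conn.close()
```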

Environment: Hadoop, Spark, Hive, MapReduce, Sqoop, Java, Pig, SQL Server, Hortonworks, HBase, ETL, Unix, Teradata, Tableau Desktop 10.1.

Confidential

Big Data Developer

Responsibilities:

  • Developed Hive queries as per the required analytics for report generation.
  • Involved in developing Pig scripts to process data coming from different sources.
  • Worked on data cleaning using Pig scripts and stored the results in HDFS.
  • Worked on Pig user-defined functions (UDFs) written in Java for external functions.
  • Developed data requirements, performed database queries to identify test data, and created data procedures with expected results.
  • Planned, coordinated, and managed internal process documentation and presentations describing the identified process improvements, along with the workflow diagrams associated with the process flow.
  • Scheduled jobs with Oozie to automate regularly executing processes.
  • Moved data into HDFS using Sqoop (a sketch follows this list).
  • Expertise in Hive optimization techniques such as partitioning and bucketing.
  • Expertise in Pig joins to handle data across different data sets.
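
A minimal sketch of moving an RDBMS table into HDFS with the Sqoop CLI invoked from Python; the JDBC URL, credentials file, table, and target path are hypothetical placeholders.

```python
# Minimal sketch of a Sqoop import driven from Python; the JDBC URL,
# credentials file, table, and target path are hypothetical placeholders.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost:3306/sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",  # keep the password off the command line
    "--table", "orders",
    "--target-dir", "/data/raw/orders",
    "--num-mappers", "4",
    "--fields-terminated-by", ",",
]

subprocess.run(sqoop_cmd, check=True)  # raise if the Sqoop job fails
```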

Environment: Hadoop, Hive, Pig, Sqoop, HBase, MapReduce, Java, Python
