
Senior Data Engineer Resume


Columbus, Indiana

SUMMARY:

  • Qualified Data Engineer with 7+ years of experience in the development and implementation of Data Warehousing solutions.
  • Hands-on experience with big data technologies such as Azure Databricks, Spark, HDInsight, HDFS, Hive, Azure Functions, Logic Apps, Azure Data Factory, and Azure Synapse (DW).
  • Proficient in converting Hive/SQL queries into Spark transformations using DataFrames and Datasets (see the sketch after this list).
  • Experience in building ETL data pipelines in Azure Databricks leveraging PySpark, Scala, and Spark SQL.
  • Extensive experience building ETL ingestion flows and reusable data pipelines using ADF.
  • Experience working with databases and data warehouses such as Azure SQL DB, Synapse Analytics, Teradata, Oracle, and SAP HANA.
  • Proficiency in SQL across several dialects (MySQL, PostgreSQL, Redshift, SQL Server, and Oracle).
  • Experience working with Teradata utilities such as FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
  • Sound knowledge of normalization and denormalization techniques for OLAP and OLTP systems.
  • Experience building Azure Stream Analytics ingestion specs that give users sub-second results in real time.
  • Experience implementing Azure Log Analytics; developed workbooks in the Log Analytics workspace using Kusto Query Language (KQL).
  • Hands-on experience implementing CI/CD using Azure DevOps.
  • Knowledge of basic admin activities related to ADF, such as granting access to ADLS using a service principal, installing Integration Runtimes, and creating services such as ADLS, Logic Apps, and Key Vaults.
  • Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
  • Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Good knowledge of data marts, OLAP, and data modeling (star schema and snowflake schema modeling for fact and dimension tables) using Analysis Services.
  • Provided architecture and infrastructure guidance of Snowflake capabilities to accommodate business/technical use cases.
  • Experience with re-clustering of data and micro-partitioning in Snowflake.
  • Migrated enterprise workloads to Snowflake using industry-standard as well as proprietary methodologies.
  • Automated and managed provisioning needs, such as Snowflake storage and compute, Role-Based Access Control model, and permissions.
  • Hands-on experience integrating Snowflake with dbt (data build tool) to transform data.
  • Partnered with Data Science and infrastructure teams on evaluation and feasibility assessment of new systems and technologies.
  • Hands-on experience with traditional ML models (K-means, Bayesian networks, decision trees, SVMs, regression models, Gaussian processes).
  • Experience using machine learning frameworks such as PyTorch and TensorFlow.
  • Experience building automated regression scripts in Python to validate ETL processes across multiple databases, including Oracle, SQL Server, Hive, and MongoDB.
  • Ability to work effectively in cross-functional team environments, with excellent communication and interpersonal skills.
  • Good working experience with Agile/Scrum methodologies, including participation in scrum calls for project analysis and development.
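
A minimal PySpark sketch of the Hive/SQL-to-DataFrame conversion mentioned above; the database, table, and column names are hypothetical placeholders, not taken from any specific project.

```python
# Hypothetical Hive query being converted:
#   SELECT region, SUM(amount) AS total_sales
#   FROM sales.orders
#   WHERE order_date >= '2023-01-01'
#   GROUP BY region;
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hive-to-dataframe").enableHiveSupport().getOrCreate()

total_by_region = (
    spark.table("sales.orders")                        # read the Hive table
         .filter(F.col("order_date") >= "2023-01-01")  # WHERE clause
         .groupBy("region")                            # GROUP BY
         .agg(F.sum("amount").alias("total_sales"))    # SUM(amount) AS total_sales
)
total_by_region.show()
```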

PROFESSIONAL EXPERIENCE:

Senior Data Engineer

Confidential, Columbus, Indiana

Responsibilities:

  • Engineered a reusable Azure Data Factory-based data pipeline infrastructure that transforms provisioned data for consumption by Azure SQL Data Warehouse and Azure SQL DB.
  • Created ADF pipelines to extract data from on-premises source systems to Azure cloud data lake storage. Worked extensively with Copy activities and implemented copy behaviors such as flatten hierarchy, preserve hierarchy, and merge hierarchy; implemented error handling through the Copy activity.
  • Worked extensively on Azure Data Lake Analytics with the help of Azure Databricks to implement SCD-1 and SCD-2 approaches (a simplified merge pattern is sketched below).
  • Developed Spark (Scala) notebooks to transform and partition the data and organize files in ADLS.
  • Worked on Azure Databricks to run Spark/Python notebooks through ADF pipelines.
  • Worked on migrating data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
  • Deployed and managed on-premises and cloud-based machine learning tools.
  • Extended existing ML frameworks, API connectivity, and libraries to meet user needs.
  • Constructed optimized data pipelines to feed Machine Learning models.
  • Collaborated with ML scientists and data scientists to prepare data for model development and to deploy models to production.
  • Translated machine learning algorithms into production-level code and improved the performance of existing machine learning solutions.
  • Exposure to Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, and Filter.
  • Created Linked Services for multiple source systems (e.g., Azure SQL Server, ADLS, Blob Storage, REST API).
  • Implemented delta-logic extractions for various sources with the help of control tables; implemented data frameworks to handle deadlocks, recovery, and pipeline logging.
  • Configured Logic Apps to send email notifications to end users and key stakeholders via the web service activity; created a dynamic pipeline to handle extraction from multiple sources to multiple targets; extensively used Azure Key Vault to configure connections in Linked Services.
  • Configured and implemented Azure Data Factory triggers and scheduled the pipelines; monitored the scheduled pipelines and configured alerts to get notified of pipeline failures.
  • Created Azure Stream Analytics jobs to replicate real-time data and load it into Azure SQL Data Warehouse.
  • Deployed code to multiple environments through the CI/CD process, worked on code defects during SIT and UAT testing, and provided support for test data loads; implemented reusable components to reduce manual intervention.
  • Processed structured and semi-structured files such as JSON and XML using Spark in HDInsight and Databricks environments.
  • Prepared data models for the Data Science and Machine Learning teams; worked with the teams to set up the environment to analyze the data using pandas.
  • Worked with VSTS for the CI/CD implementation.
  • Reviewed individual work on ingesting data into Azure Data Lake and provided feedback based on architecture, naming conventions, guidelines, and best practices.
  • Implemented end-to-end logging frameworks for Data Factory pipelines.

Environment: Azure Data Factory, Azure Databricks, Azure SQL DB, Azure DW (Synapse Analytics), PolyBase, Azure DevOps, Blob Storage, Machine Learning models, TensorFlow, ADLS.
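
The SCD-1/SCD-2 work referenced above can be illustrated with a minimal Databricks Delta Lake merge in PySpark; only the simpler SCD-1 (overwrite-on-match) case is shown, and the ADLS paths and key column are hypothetical.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming batch landed by the ADF pipeline (hypothetical staging path).
updates = spark.read.parquet("/mnt/adls/staging/customers/")

# Curated Delta table maintained in ADLS (hypothetical path).
target = DeltaTable.forPath(spark, "/mnt/adls/curated/customers/")

(target.alias("t")
       .merge(updates.alias("s"), "t.customer_id = s.customer_id")
       .whenMatchedUpdateAll()      # SCD-1: overwrite changed attributes in place
       .whenNotMatchedInsertAll()   # insert keys not yet present in the target
       .execute())
```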

Data Engineer

Confidential, Branchburg, NJ

Responsibilities:

  • Worked on requirements gathering, analysis, and design of the systems.
  • Actively involved in designing the Hadoop ecosystem pipeline.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Involved in designing and monitoring a multi-data-center Kafka cluster.
  • Responsible for ingesting real-time data from source systems into Kafka clusters (a minimal streaming read is sketched below).
  • Worked with Spark techniques such as refreshing tables, handling parallelism, and modifying Spark defaults for performance tuning.
  • Implemented Spark RDD transformations to map business logic and applied actions on top of those transformations.
  • Involved in migrating MapReduce jobs to Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive, and handed the data off to the BI team for report generation after processing and analysis in Spark SQL.
  • Performed SQL Joins among Hive tables to get input for Spark batch process.
  • Worked with the data science team to build statistical models with Spark MLlib and PySpark.
  • Involved in importing data from various sources into the Cassandra cluster using Sqoop.
  • Worked on creating Cassandra data models from the existing Oracle data model.
  • Designed column families in Cassandra; ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per business requirements.
  • Used Sqoop import functionality to load historical data from RDBMS into HDFS.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on the Hortonworks Apache Hadoop environment (HDP 2.2).
  • Configured Hive bolts and wrote data to Hive in Hortonworks as part of a POC.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Developed Python scripts to start and end jobs smoothly for a UC4 workflow.
  • Migrated data from on-premises systems to cloud technologies, primarily Snowflake and the AWS ecosystem.
  • Integrated data platforms to leverage efficiencies and automation within the stack.
  • Oversaw the implementation of a Snowflake data warehouse on AWS.
  • Developed ETL pipelines in and out of data warehouse using Python and SnowSQL.
  • Created Airflow DAGs to schedule ingestions, ETL jobs, and various business reports.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Created data pipelines per business requirements and scheduled them using Oozie coordinators.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Worked extensively with Apache NiFi to build flows for the existing Oozie jobs handling incremental loads, full loads, and semi-structured data, to pull data from REST APIs into Hadoop, and to automate all NiFi flows to run incrementally.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to get notifications of any failures.
  • Developed shell scripts to periodically perform incremental imports of data from third-party APIs into AWS.
  • Worked extensively on importing metadata into Hive using Scala and migrated existing tables and applications to work on Hive and the AWS cloud.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Used version control tools such as GitHub to share code among team members.

Environment: Hadoop, HDFS, Hive, Python, HBase, NiFi, Spark, MySQL, Oracle 12c, Linux, Hortonworks, Oozie, MapReduce, Sqoop, Shell Scripting, Apache Kafka, Snowflake, Scala, AWS.
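
A minimal PySpark Structured Streaming sketch of the Kafka ingestion described above; the broker addresses, topic name, and output/checkpoint paths are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

# Subscribe to a hypothetical topic on the Kafka cluster.
raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
            .option("subscribe", "orders")
            .option("startingOffsets", "latest")
            .load())

# Kafka delivers the payload as binary; cast it to string for downstream parsing.
events = raw.select(F.col("value").cast("string").alias("payload"), "timestamp")

# Write the stream out as Parquet with checkpointing (hypothetical paths).
query = (events.writeStream
               .format("parquet")
               .option("path", "/data/landing/orders")
               .option("checkpointLocation", "/data/checkpoints/orders")
               .start())
query.awaitTermination()
```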

Data Engineer

Confidential, Costa Mesa, CA

Responsibilities:

  • Analyzed functional specifications based on project requirements.
  • Ingested data from various data sources into Hadoop HDFS/Hive tables using Sqoop, Flume, and Kafka.
  • Extended Hive core functionality by writing custom UDFs using Java.
  • Developed Hive queries per user requirements.
  • Worked on multiple POCs implementing a data lake for multiple data sources, including Teamcenter, SAP, Workday, and machine logs.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked on MS SQL Server PDW migration for the MSBI warehouse.
  • Planned, scheduled, and implemented Oracle to MS SQL Server migrations for AMAT in-house applications and tools.
  • Worked on the Solr search engine to index incident report data and developed dashboards in the Banana reporting tool.
  • Integrated Tableau with the Hadoop data source to build dashboards providing various insights into the organization's sales.
  • Worked on Spark to build BI reports using Tableau; Tableau was integrated with Spark using Spark SQL.
  • Developed Spark jobs using Scala and Python on top of YARN/MRv2 for interactive and batch analysis (a minimal batch-job sketch appears below).
  • Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on AWS.
  • Developed workflows in Live Compare to analyze SAP data and reporting.
  • Worked on Java development, support, and tooling support for in-house applications.
  • Participated in daily scrum meetings and iterative development.
  • Built search functionality for searching through millions of files for logistics groups.

Environment: Hadoop, Hive, Sqoop, Spark, Kafka, Scala, MS SQL Server PDW, Java.
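
A minimal PySpark batch-analysis sketch of the YARN-based Spark jobs mentioned above; the database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("incident-batch-analysis")
         .enableHiveSupport()
         .getOrCreate())

# Aggregate a hypothetical machine-log table by day and severity.
daily_counts = (spark.table("datalake.machine_logs")
                     .groupBy(F.to_date("event_ts").alias("event_date"), "severity")
                     .count())

# Persist the results for downstream dashboards (e.g., Tableau over Spark SQL).
daily_counts.write.mode("overwrite").saveAsTable("reports.daily_incident_counts")
```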

Big Data Engineer

Confidential

Responsibilities:

  • Created pipelines in ADF using Linked Services, Datasets, and pipeline activities to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, Teradata, Oracle DB, and SQL Server.
  • Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Developed Spark SQL applications in Databricks for data extraction, transformation, and aggregation from multiple file formats.
  • Developed Spark (Scala) notebooks to transform and partition the data and organize files in ADLS.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and RDDs.
  • Developed deployment scripts for Azure Data Factory using ARM templates and JSON scripts.
  • Developed PowerShell scripts for automation purposes.
  • Created Azure Stream Analytics jobs to replicate real-time data and load it into Azure SQL Data Warehouse.
  • Implemented delta-logic extractions for various sources with the help of control tables; implemented data frameworks to handle deadlocks, recovery, and pipeline logging (a control-table-driven extraction is sketched below).
  • Deployed code to multiple environments through the CI/CD process, worked on code defects during SIT and UAT testing, and provided support for test data loads; implemented reusable components to reduce manual intervention.
  • Worked in an Agile development environment in sprint cycles of two weeks by dividing and organizing tasks.

Environment: ADF, Azure SQL DB, Azure SQL DW, Spark, Scala, Python, PySpark, Hadoop, Hive, Impala, ETL.
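
A minimal PySpark sketch of the control-table-driven delta extraction described above; table names, the watermark column, and the source name are hypothetical, and the retry/deadlock handling and logging mentioned above are omitted for brevity.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 1. Read the last successful watermark for this source from the control table.
last_ts = (spark.table("etl.control_watermarks")
                .filter(F.col("source_name") == "orders")
                .agg(F.max("last_loaded_ts").alias("wm"))
                .collect()[0]["wm"])

# 2. Pull only the rows changed since that watermark.
delta = (spark.table("staging.orders")
              .filter(F.col("modified_ts") > F.lit(last_ts)))

# 3. Append the delta to the curated table; a real pipeline would then advance
#    the watermark in the control table within the same run.
delta.write.mode("append").saveAsTable("curated.orders")
```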

Systems Engineer/ Hadoop Developer

Confidential

Responsibilities:

  • Responsible for design and development of Big Data applications using Cloudera/Hortonworks Hadoop.
  • Coordinated with business customers to gather business requirements.
  • Migrated large volumes of data from different databases (Netezza, Oracle, SQL Server) to Hadoop.
  • Imported and exported data between Teradata and HDFS using Sqoop.
  • Responsible for managing the data coming from different sources.
  • Worked on analyzing the Hadoop cluster and various Big Data analytics tools, including Pig, Hive, HBase, Spark, and Sqoop.
  • Developed Apache Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
  • Migrated HiveQL queries on structured data into Spark SQL to improve performance (see the sketch below).
  • Analyzed data using the Hadoop components Hive and Pig and created Hive tables for end users.
  • Involved in writing Hive queries and Pig scripts for data analysis to meet business requirements.
  • Designed and created data models and implemented data pipelines that provide timely access to large datasets in the Hadoop ecosystem.
  • Worked on building, testing, and deploying code to development and production Hadoop clusters using the Cloudera CDH4 distribution.
  • Created pipelines to import data from S3, APIs, and other vendor applications using Pig, Hive, bash scripts, and Oozie workflows.
  • Integrated the Hadoop ecosystem with Power BI to create data visualization applications for various teams.
  • Optimized MapReduce and Hive jobs to use HDFS efficiently by using Gzip, LZO, Snappy, and ORC compression techniques.

Environment: Hadoop, Hive, Pig, Sqoop, Oozie, Flume, MapReduce, Spark, Zookeeper, SSMS, Jenkins, Git, HDInsight.
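
A minimal sketch of running a migrated HiveQL query through Spark SQL and writing the result as Snappy-compressed ORC, as referenced above; the database, table, and output path are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hiveql-to-sparksql")
         .enableHiveSupport()
         .getOrCreate())

# The original HiveQL aggregation, executed on the Spark SQL engine instead.
result = spark.sql("""
    SELECT customer_id, COUNT(*) AS order_cnt
    FROM warehouse.orders
    GROUP BY customer_id
""")

# Store as ORC with Snappy compression for efficient HDFS usage.
(result.write
       .mode("overwrite")
       .option("compression", "snappy")
       .orc("/data/curated/order_counts_orc"))
```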
