
Sr. Azure Data Engineer Resume


Pittsburgh, PA

SUMMARY

  • Overall 8 years of IT experience in analysis, design, and development, including 4+ years in Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, and programming languages including Java, Scala, and Python.
  • 3+ years of experience in a Data Warehouse / ETL Developer role.
  • Strong experience building data pipelines and performing large-scale data transformations.
  • In-depth knowledge of distributed computing systems and parallel processing techniques to efficiently deal with Big Data.
  • Firm understanding of Hadoop architecture and its various components, including HDFS, YARN, MapReduce, Hive, Pig, HBase, Kafka, and Oozie.
  • Strong experience building Spark applications using PySpark and Python.
  • Good experience troubleshooting and fine-tuning long-running Spark applications.
  • Extensive hands-on experience tuning Spark jobs.
  • Experienced in working with structured data using HiveQL, and optimizing Hive queries.
  • Strong experience using the Spark RDD API, Spark DataFrame/Dataset API, Spark SQL, and Spark ML frameworks for building end-to-end data pipelines (a minimal PySpark sketch follows this list).
  • Good experience working with real-time streaming pipelines using Kafka and Spark Streaming.
  • Strong experience working with Hive for performing various data analyses.
  • Detailed exposure to various Hive concepts like partitioning, bucketing, join optimizations, SerDes, built-in UDFs, and custom UDFs.
  • Good experience automating end-to-end data pipelines using the Oozie workflow orchestrator.
  • Good experience working with Cloudera, Hortonworks and AWS big data services.
  • Strong experience using and integrating various AWS cloud services like S3, EMR, Glue Metastore, Athena, and Redshift into the data pipelines.
  • Strong experience leading multiple Azure Big Data and data transformation implementations in various domains.
  • Worked on Docker-based containers for running Airflow.
  • Expertise in configuring and installing SQL Server for OLTP and OLAP systems, from high-end to low-end environments.
  • Strong experience in performance tuning & index maintenance.
  • Detailed exposure to Azure tools such as Azure Data Lake, Azure Databricks, Azure Data Factory, HDInsight, Azure SQL Server, and Azure DevOps.
  • Experience in analyzing, designing, and developing ETL Strategies and processes, writing ETL specifications.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data to and from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.
  • Developed and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Excellent understanding of NoSQL databases like HBase and MongoDB.
  • Proficient knowledge and hands-on experience writing shell scripts in Linux.
  • Experienced in requirement analysis, application development, application migration and maintenance using Software Development Lifecycle (SDLC) and Python/Java technologies.
  • Excellent technical and analytical skills with a clear understanding of design goals and development for OLTP and dimensional modeling for OLAP.
  • Adequate knowledge and working experience in Agile and Waterfall Methodologies.
  • Defined user stories and drove the Agile board in JIRA during project execution; participated in sprint demos and retrospectives.
  • Completed POCs on newly adopted technologies such as Apache Airflow, Snowflake, and GitLab.
  • Good interpersonal and communication skills, strong problem-solving skills, the ability to explore and adopt new technologies with ease, and a strong team player.
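
To make the Spark DataFrame and Spark SQL experience above concrete, here is a minimal PySpark sketch of a typical batch pipeline. It is illustrative only: the input path, column names, and output location are assumptions, not details of any specific project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical batch pipeline using the DataFrame API and Spark SQL.
spark = SparkSession.builder.appName("orders-daily-aggregation").getOrCreate()

# Read raw orders (path and schema are assumptions for illustration).
orders = spark.read.option("header", "true").csv("hdfs:///data/raw/orders/")

# DataFrame API: cast, derive, and filter columns.
cleaned = (
    orders
    .withColumn("order_amount", F.col("order_amount").cast("double"))
    .withColumn("order_date", F.to_date("order_ts"))
    .filter(F.col("order_amount") > 0)
)

# Spark SQL: aggregate through a temporary view.
cleaned.createOrReplaceTempView("orders_clean")
daily = spark.sql("""
    SELECT order_date, COUNT(*) AS orders, SUM(order_amount) AS revenue
    FROM orders_clean
    GROUP BY order_date
""")

# Write results partitioned by date (target path is illustrative).
daily.write.mode("overwrite").partitionBy("order_date").parquet("hdfs:///data/curated/daily_orders/")
spark.stop()
```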

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, SQL, YARN, MapReduce, Pig Latin, Hive, Sqoop, Spark, ZooKeeper, Oozie, Kafka, Storm, Flume

Programming Languages: Python, PySpark, Java, Shell Scripting

Big Data Platforms: Hortonworks, Cloudera

Cloud Platforms: Azure (ADF, Azure Analytics, HDInsight, ADL, Synapse), AWS (S3, Redshift, Glue, EMR, Lambda, Athena)

Operating Systems: Linux, Windows, UNIX

Databases: MySQL, HBase, MongoDB, Snowflake

Development Methods: Agile/Scrum, Waterfall

IDEs: PyCharm, IntelliJ, Ambari

Data Visualization: Tableau, BO Reports

PROFESSIONAL EXPERIENCE

Confidential, Pittsburgh, PA

Sr. Azure Data Engineer

Responsibilities:

  • Contributed to the development of PySpark DataFrames in Azure Databricks to read data from Data Lake or Blob Storage and utilize the Spark SQL context for transformations (see the Databricks sketch after this list).
  • Created, developed, and deployed high-performance ETL pipelines with PySpark and Azure Data Factory.
  • Responsible for estimating the cluster size, monitoring, and troubleshooting the Spark Databricks cluster.
  • Worked on an Azure copy activity to load data from an on-premises SQL Server to an Azure SQL Data Warehouse.
  • Worked on redesigning the existing architecture and implementing it on Azure SQL.
  • Experience with Azure SQL database configuration and tuning automation, vulnerability assessment, auditing, and threat detection.
  • Integrated data storage solutions with Spark, especially Azure Data Lake Storage, Blob Storage, and Snowflake storage.
  • Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data Lake, Azure Data Factory, HDInsight, and Azure SQL Server.
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Implemented and developed Hive bucketing and partitioning.
  • Implemented Kafka and Spark Structured Streaming for real-time data ingestion (see the streaming sketch after this list).
  • Analyzed data from different sources on the Hadoop big data stack by implementing Azure Data Factory, Azure Data Lake, Azure Data Lake Analytics, HDInsight, Hive, and Sqoop.
  • Strong experience with the Azure cloud platform; designed end-to-end scalable architectures to solve business problems using various Azure components like HDInsight, Data Factory, Data Lake, Azure Monitoring, Key Vault, Function App, and Event Hubs.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data to and from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.
  • Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
  • Improved the performance of Hive and Spark tasks.
  • Knowledge of Kimball data modeling and dimensional modeling techniques.
  • Worked on a cloud evaluation to identify the best cloud vendor based on a set of strict success criteria.
  • Used Hive queries to analyze huge data sets of structured, unstructured, and semi-structured data.
  • Created Hive scripts from Teradata SQL scripts for data processing on Hadoop.
  • Developed Hive tables to hold processed findings, as well as Hive scripts to convert and aggregate heterogeneous data.
  • Good experience tracking and logging end-to-end software application builds using Azure DevOps.
  • Used Terraform scripts for deploying the applications to higher environments.
  • Involved in various SDLC phases like development, deployment, testing, documentation, implementation, and maintenance of application software.
  • Experience transporting and processing real-time stream data using Kafka.
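
As a small illustration of the Databricks work referenced above (reading from Data Lake/Blob storage and transforming with Spark SQL), here is a hedged sketch. The storage account, container names, file format, and table layout are hypothetical, and authentication is assumed to be handled by the cluster configuration or a mount point.

```python
# Hypothetical Databricks notebook sketch: read from ADLS Gen2 / Blob storage,
# transform with Spark SQL, and write back to the lake. The `spark` session is
# provided by the Databricks runtime; all paths and names are illustrative.
raw_path = "abfss://raw@examplestorageacct.dfs.core.windows.net/sales/"
curated_path = "abfss://curated@examplestorageacct.dfs.core.windows.net/sales_summary/"

sales = spark.read.format("parquet").load(raw_path)

sales.createOrReplaceTempView("sales")
summary = spark.sql("""
    SELECT region, product, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region, product
""")

summary.write.mode("overwrite").parquet(curated_path)
```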
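Similarly, a minimal Spark Structured Streaming sketch for the Kafka-based real-time ingestion mentioned above; the broker addresses, topic name, event schema, and sink paths are all assumptions.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical Structured Streaming sketch: consume JSON events from Kafka and
# land them in the lake. `spark` is provided by the Databricks runtime.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "events-topic")
    .option("startingOffsets", "latest")
    .load()
)

# Parse the Kafka value payload from JSON into typed columns.
parsed = (
    events.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", schema).alias("e"))
    .select("e.*")
)

# Append the parsed events to a lake path with checkpointing.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "abfss://stream@examplestorageacct.dfs.core.windows.net/events/")
    .option("checkpointLocation", "abfss://stream@examplestorageacct.dfs.core.windows.net/_checkpoints/events/")
    .outputMode("append")
    .start()
)
```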

Environment: Azure, Data Lake, Data Factory, Event Hubs, Kafka, Function App, Key Vault, Azure SQL, Azure Monitoring, Azure DevOps.

Confidential, Pleasanton, CA

AWS Data Engineer

Responsibilities:

  • Designed robust, reusable, and scalable data-driven solutions and data pipeline frameworks to automate the ingestion, processing, and delivery of both structured and semi-structured batch and real-time streaming data.
  • Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
  • Created PySpark DataFrames to bring data from DB2 to Amazon S3.
  • Applied efficient and scalable data transformations on the ingested data using Spark framework.
  • Gained good knowledge in troubleshooting and performance tuning Spark applications and Hive scripts to achieve optimal performance.
  • Developed various custom UDFs in Spark for performing transformations on date fields and complex string columns and for encrypting PII fields (see the UDF sketch after this list).
  • Wrote complex Hive scripts for performing various data analyses and creating reports requested by business stakeholders.
  • Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
  • Used Oozie and Oozie Coordinators for automating and scheduling our data pipelines.
  • Used AWS Athena extensively to query structured data in S3, load it into other systems such as Redshift, and produce reports (see the S3-to-Parquet sketch after this list).
  • Used the Spark Streaming APIs to conduct on-the-fly transformations and actions for creating the common learner data model, which receives data from Kinesis in near real time.
  • Implemented data ingestion from various source systems using Sqoop and PySpark.
  • Hands-on experience with Spark and Hive job performance tuning.
  • Performed end-to-end architecture and implementation assessments of various AWS services like Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis.
  • With Hive as the primary query engine of EMR, built external table schemas for the data being processed.
  • Created AWS RDS (Relational Database Service) to serve as the Hive metastore and integrated the metadata from 20 EMR clusters into a single RDS instance, avoiding data loss even if an EMR cluster was terminated.
  • Worked extensively on migrating our existing on-prem data pipelines to the AWS cloud for better scalability and infrastructure maintenance.
  • Worked extensively in automating creation/termination of EMR clusters as part of starting the data pipelines.
  • Worked extensively on migrating/rewriting existing Oozie jobs to AWS Simple Workflow.
  • Loaded the processed data into Redshift tables, allowing downstream ETL and reporting teams to consume it.
  • Good experience working with analysis tools like Tableau and Splunk for regression analysis, pie charts, and bar graphs.
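
As an illustration of the EMR/S3/Athena pattern described above, here is a hedged PySpark sketch that curates raw CSV data from S3 into partitioned Parquet that Athena (or Redshift) can then consume. The bucket names, prefixes, and columns are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical EMR job sketch: read raw CSV from S3, apply transformations, and
# write partitioned Parquet back to S3 for querying with Athena.
spark = SparkSession.builder.appName("s3-claims-curation").getOrCreate()

claims = (
    spark.read.option("header", "true")
    .csv("s3://example-raw-bucket/claims/")
)

curated = (
    claims
    .withColumn("claim_amount", F.col("claim_amount").cast("double"))
    .withColumn("claim_date", F.to_date("claim_date"))
    .withColumn("year", F.year("claim_date"))
    .withColumn("month", F.month("claim_date"))
)

(
    curated.write.mode("overwrite")
    .partitionBy("year", "month")
    .parquet("s3://example-curated-bucket/claims/")
)
spark.stop()
```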
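And a minimal sketch of the kind of custom Spark UDFs mentioned above: one that normalizes messy date strings and one that masks a sensitive field. The column names and date formats are assumptions, and a production system would use proper key-managed encryption rather than a bare hash.

```python
import hashlib
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-examples").getOrCreate()

@F.udf(returnType=StringType())
def normalize_date(raw):
    # Try a few common input formats and emit ISO dates (formats are assumptions).
    from datetime import datetime
    for fmt in ("%m/%d/%Y", "%Y-%m-%d", "%d-%b-%Y"):
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except (ValueError, TypeError):
            continue
    return None

@F.udf(returnType=StringType())
def mask_pii(value):
    # Illustrative masking only; not a substitute for real encryption.
    return hashlib.sha256(value.encode("utf-8")).hexdigest() if value else None

df = spark.createDataFrame([("03/15/2021", "123-45-6789")], ["visit_date", "ssn"])
df = df.withColumn("visit_date", normalize_date("visit_date")) \
       .withColumn("ssn", mask_pii("ssn"))
df.show(truncate=False)
spark.stop()
```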

Environment: AWS Cloud, Spark, Kafka, Hive, Yarn, HBase, Jenkins, Docker, Tableau, Splunk.

Confidential, Dublin, Ohio

Hadoop Engineer

Responsibilities:

  • Designed and developed applications on the data lake to transform the data according to business users' requirements for analytics.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Worked on different file formats like CSV, TXT, and fixed-width to load data from various sources into raw tables (see the file-loading sketch after this list).
  • Conducted data model reviews with team members and captured technical metadata through modelling tools.
  • Implemented ETL processes; wrote and optimized SQL queries to perform data extraction and merging from a SQL Server database.
  • Transferred data from HDFS to Relational Database Systems using Sqoop for Business Intelligence, visualization, and user report generation.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Used map-side operations on many occasions to reduce the number of tasks in Pig and Hive for data cleansing and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
  • Created Hive Tables, used Sqoop to load claims data from Oracle, and then put the processed data into the target database.
  • Experience in loading logs from multiple sources into HDFS using Flume.
  • Worked with NoSQL databases like HBase in creating HBase tables to store large sets of semi-structured data coming from various data sources.
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
  • Developed complex MapReduce jobs for performing efficient data transformations.
  • Performed data cleaning, pre-processing, and modelling using Java MapReduce.
  • Strong Experience in writing SQL queries.
  • Responsible for triggering the jobs using the Control-M.
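
For the raw-file loading described above, here is a hedged PySpark sketch that reads a delimited file and a fixed-width file from HDFS and saves them as Hive tables. The paths, column positions, and database name are illustrative, and the target database is assumed to exist.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("raw-file-loads")
    .enableHiveSupport()
    .getOrCreate()
)

# Delimited file: read CSV with a header and save as a Hive table.
csv_df = spark.read.option("header", "true").csv("hdfs:///landing/claims_csv/")
csv_df.write.mode("overwrite").saveAsTable("raw_db.claims_csv")

# Fixed-width file: read as text and slice columns by position (widths assumed).
fixed = spark.read.text("hdfs:///landing/claims_fixed/")
fixed_df = fixed.select(
    F.trim(F.substring("value", 1, 10)).alias("claim_id"),
    F.trim(F.substring("value", 11, 8)).alias("claim_date"),
    F.trim(F.substring("value", 19, 12)).alias("claim_amount"),
)
fixed_df.write.mode("overwrite").saveAsTable("raw_db.claims_fixed")
spark.stop()
```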

Environment: Java, SQL, ETL, Hadoop, HDFS, HBase, MySQL, Web Services, Shell Script, Control-M.

Confidential

Data Warehouse Developer / ETL Developer

Responsibilities:

  • Created new database objects like Procedures, Functions, Packages, Triggers, Indexes & Views using T-SQL in Development and Production environment for SQL Server 2008R2.
  • Developed Database Triggers to enforce Data integrity and Referential Integrity.
  • Developed SQL queries to fetch complex data from different tables in remote databases using joins and database links, formatted the results into reports, and kept logs (a Python sketch of a representative join query follows this list).
  • Defined relationship between tables and enforced referential integrity.
  • Extensively used various SSIS objects such as Control Flow components, Data Flow components, Connection Managers, Logging, and Configuration Files.
  • Established connectivity to database and monitored systems performances.
  • Performed performance tuning and monitoring of both T-SQL and PL/SQL blocks.
  • Used SQL Profiler and Query Analyzer to optimize DTS package queries and stored procedures.
  • Wrote T-SQL procedures to generate DML scripts that modified database objects dynamically based on user inputs.
  • Created stored procedures to transform the data and worked extensively in T-SQL for various transformation needs while loading the data.
  • Participated in performance tuning using indexing (clustered and non-clustered indexes) on tables.
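
As a small illustration of the kind of join query described above, here is a hedged sketch, written in Python with pyodbc for consistency with the other examples rather than as a stored procedure. The server, credentials, and table/column names are all hypothetical.

```python
import pyodbc

# Hypothetical reporting query executed against SQL Server via pyodbc.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-sql-host;DATABASE=SalesDW;UID=report_user;PWD=example-password"
)
cursor = conn.cursor()

# Join orders, order details, and customers, filtered by a date parameter.
cursor.execute(
    """
    SELECT c.CustomerName, o.OrderDate, SUM(d.LineTotal) AS OrderTotal
    FROM dbo.Orders o
    JOIN dbo.OrderDetails d ON d.OrderID = o.OrderID
    JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
    WHERE o.OrderDate >= ?
    GROUP BY c.CustomerName, o.OrderDate
    ORDER BY OrderTotal DESC
    """,
    "2008-01-01",
)

# Format the results into a simple report.
for customer, order_date, total in cursor.fetchall():
    print(customer, order_date, total)

cursor.close()
conn.close()
```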

Environment: SQL Server 2008R2, SSIS, Windows server, SQL Profiler, SQL Query Analyzer, T-SQL.
