
Azure Data Engineer Resume

Dallas, TX

SUMMARY

  • 10+ years of professional experience as a Data Engineer with expertise in Python, Azure, Spark, the Hadoop ecosystem, and related cloud services.
  • Extensive experience in developing applications that perform Data Processing tasks using Teradata, Oracle, SQL Server, and MySQL databases.
  • Experience in Extraction, Transformation and Loading (ETL) of data from various sources into data warehouses, as well as in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume, Kafka, and Power BI.
  • Involved in developing roadmaps and deliverables to advance the migration of existing on-premises systems/applications to the Azure cloud.
  • Experience with Azure transformation projects, implementing ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.
  • Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment.
  • Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
  • Designed & implemented migration strategies for traditional systems to Azure (lift and shift/Azure Migrate and other third-party tools). Worked on the Azure suite: Azure SQL Database, Azure Data Lake (ADLS), Azure Data Factory (ADF) V2, Azure SQL Data Warehouse, Azure Service Bus, Azure Key Vault, Azure Analysis Services (AAS), Azure Blob Storage, Azure Search, Azure App Service, and Azure Data Platform Services.
  • Extensive knowledge and experience in working with relational database management systems: normalization, stored procedures, constraints, joins, indexes, data import/export, and triggers.
  • Expert at data transformations such as Lookup, Derived Column, Conditional Split, Sort, Data Conversion, Multicast, Union All, Merge Join, Merge, Fuzzy Lookup, Fuzzy Grouping, Pivot, Unpivot, and SCD to load data into SQL Server destinations.
  • Distributions - Cloudera, Amazon EMR, Azure HDInsight, and Hortonworks.
  • Experience developing iterative algorithms using Spark Streaming in Scala and Python to build near real-time dashboards (see the sketch after this list).
  • Experienced in loading data into Hive partitions, creating buckets in Hive, and developing MapReduce jobs to automate data transfer from HBase.
  • Expertise in working with AWS cloud services such as EMR, S3, Redshift, Lambda, DynamoDB, RDS, SNS, SQS, Glue, Data Pipeline, and Athena for big data development.
  • Worked with various file formats such as CSV, JSON, XML, ORC, Avro, and Parquet file formats.
  • Worked on data processing, transformations, and actions in Spark using PySpark.
  • Expertise in writing DDLs and DMLs scripts in SQL and HQL for analytics applications in RDBMS.
  • Experienced in writing Spark scripts in Python, Scala, and SQL for development and data analysis.
  • Involved in all the phases of Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment, and Support) and Agile methodologies.
  • Strong understanding of data modeling (star schema, snowflake schema) and ETL processes in data warehouse environments.
  • Strong working knowledge across the technology stack including ETL, data analysis, data cleansing, data matching, data quality, audit, and design.
  • Experienced working with continuous integration & build tools such as Jenkins, and with Git and SVN for version control.
  • Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
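
The streaming bullet above is the kind of pattern the following PySpark sketch illustrates: a minimal Structured Streaming job that counts events per minute for a near real-time dashboard. The broker address, topic name, window size, and console sink are illustrative assumptions, not details of any specific engagement, and the job requires the spark-sql-kafka connector package.

    # Minimal PySpark Structured Streaming sketch (illustrative only).
    # Assumes the spark-sql-kafka connector is on the classpath;
    # broker address and topic name below are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("streaming-dashboard-sketch").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
              .option("subscribe", "clickstream")                 # assumed topic
              .load())

    # Count events per 1-minute window; a dashboard sink would replace the console sink.
    counts = (events
              .selectExpr("CAST(value AS STRING) AS value", "timestamp")
              .groupBy(F.window("timestamp", "1 minute"))
              .count())

    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()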

TECHNICAL SKILLS

Hadoop Ecosystem: Spark, Hive, Sqoop, Oozie, Pig.

Azure Cloud Platform: ADF V2, Blob Storage, ADLS, Azure SQL DB, SQL Server, Azure Synapse, Azure Analysis Services, Databricks, Mapping Data Flow (MDF), Azure Data Lake (Gen1/Gen2), Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Azure Machine Learning, App Service, Logic Apps, Event Grid, Service Bus, Azure DevOps, Git Repository Management, ARM Templates.

Programming Languages: Python, Scala, R, C, C++, Java, Shell Scripting.

Streaming Framework: Kinesis, Kafka, Flume

Platform: Linux, Windows and OS X

Tools: RStudio, PyCharm, Jupyter Notebook, IntelliJ, Eclipse, NetBeans

Databases and Query Languages: Azure SQL Data Warehouse, Azure SQL DB, Azure Cosmos DB, NoSQL DB, Teradata, Vertica, RDBMS, MySQL, Oracle, PostgreSQL, Microsoft SQL Server.

PROFESSIONAL EXPERIENCE

Azure Data Engineer

Confidential - Dallas, TX

Responsibilities:

  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support visualization of data. Understood the current production state of the application and determined the impact of new implementations on existing business processes.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Used Liquibase for database migration and JDBC for database connections.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, Azure SQL Data Warehouse, write-back tool, and reverse.
  • Completed online data transfer from AWS S3 to Azure Blob Storage using Azure Data Factory.
  • Used Azure Migrate to begin migrating AWS EC2 instances over to Azure.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing & transforming the data to uncover insights into customer usage patterns (a minimal sketch follows this list).
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Migration of on-premises data (SQL Server / MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
  • Exposure to Azure Active Directory compatibility. Extensive experience in deployment, migration, patching, and troubleshooting of Windows 2008 and 2012 R2 domain controllers in Active Directory.
  • Experienced in performance tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL Activity.
  • Hands-on experience in developing SQL scripts for automation purposes.
  • Created Build and Release for multiple projects (modules) in production environment using Visual Studio Team Services (VSTS).
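
As a purely illustrative companion to the PySpark bullet above, the following sketch shows the extract/transform/aggregate pattern across multiple file formats. The ADLS container names, paths, and column names are hypothetical placeholders rather than details of the actual engagement.

    # Illustrative PySpark sketch of the extract/transform/aggregate pattern.
    # Paths, storage account, and column names are assumed placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage-pattern-sketch").getOrCreate()

    # Extract: load two source feeds stored in different file formats.
    usage = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/usage/")        # assumed path
    customers = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/customers/")   # assumed path

    # Transform: join and aggregate to surface daily usage per customer segment.
    daily_usage = (usage
                   .join(customers, "customer_id")
                   .groupBy("segment", F.to_date(F.col("event_ts")).alias("event_date"))
                   .agg(F.count("*").alias("events"),
                        F.sum("duration_sec").alias("total_duration_sec")))

    # Load: write the curated output back to the lake, partitioned by date.
    (daily_usage.write
     .mode("overwrite")
     .partitionBy("event_date")
     .parquet("abfss://curated@examplelake.dfs.core.windows.net/daily_usage/"))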

Azure Cloud Engineer

Confidential - Los Angeles, CA

Responsibilities:

  • Worked extensively on Databricks PySpark SQL pipelines and Linux shell scripting.
  • Helped other teams, such as the Ops team, configure the Data Lake for their data processing.
  • Implemented Spark best practices to efficiently process data and meet ETAs by utilizing features such as partitioning, resource tuning, memory management, and checkpointing.
  • Worked with multiple data formats such as Parquet, Delta, JSON, and XML.
  • Extensively used AWS services: S3 for storing data and EMR for processing jobs.
  • Resolved issues related to the semantic layer or reporting layer.
  • Expertise in writing SQL queries for cross-verification of data.
  • Developed the Teradata Stored Procedures to load data into Incremental/Staging tables and then move data from staging into Base tables.
  • Implemented a solution in the Data Lake to minimize the use of Teradata connections for the Product & Pricing Tool.
  • Converted Teradata procedures to PySpark & checked for data consistency in the Data Lake.
  • Reviewed the PySpark SQL for missing joins & join constraints, data format issues, mismatched aliases, and casting errors.
  • Used Scala/Python programming to build UDFs and supporting utilities for standardizing the data pipelines (see the sketch after this list).
  • Responsible for Design, Data Mapping Analysis, Mapping rules.
  • Performing the weekly and monthly activities specific to business-critical deliverables.
  • Analyzing and resolving the data quality issues.
  • Distinctive communication skills and the ability to bridge the gap between IT and business users, as well as to present/design analytical and technical content in an easy-to-understand format.
  • Analyzed third-party systems and interacted with users on new interface features, integration, and automation with client business solutions.
  • Led, innovated, and drove next-generation Teradata and AWS Cloud functions to identify, define, and help implement innovative ideas, updating the project team and stakeholders daily.
  • Led project requirement meetings and development and testing strategies for the Teradata, AWS Cloud, and TIDAL areas.
  • Acted as a subject matter expert on Teradata and AWS Cloud best practices and on improving operational efficiency.
  • Identified areas of improvement and created value for the project.
  • Ensured deliverables adhered to standard quality norms.
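
The UDF bullet above refers to standardization utilities; the following is a minimal PySpark sketch of that idea. The normalization rule and column names are assumptions made for illustration only.

    # Minimal PySpark UDF sketch for standardizing a field before joins.
    # The rule (trim + upper-case) and the column names are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    @F.udf(returnType=StringType())
    def standardize_sku(raw):
        # Trim whitespace and upper-case product codes so joins behave consistently.
        return raw.strip().upper() if raw else None

    products = spark.createDataFrame([(" ab-123 ",), (None,)], ["sku"])
    products.withColumn("sku_std", standardize_sku("sku")).show()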

Azure Data Engineer

Confidential - Kansas City

Responsibilities:

  • Worked on data transfer from on-premises SQL Servers to cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
  • Created pipelines in Azure Data Factory using Linked Services/Datasets/Pipelines to extract, transform, & load data from a variety of sources including Azure SQL, Blob storage, Azure SQL Data Warehouse, write-back tool, & reverse.
  • Created CI/CD pipelines using Azure DevOps.
  • Created infrastructure using ARM templates & automated it with Azure DevOps pipelines.
  • Integrated data storage options with Spark, notably Azure Data Lake Storage and Blob storage.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) & processed the data in Azure Databricks.
  • Worked directly with the Big Data Architecture Team, which created the foundation of this Enterprise Analytics initiative in a Hadoop-based Data Lake.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large amounts of data sets to determine the optimal way to aggregate & report on it.
  • Developed simple to complex MapReduce jobs using Hive to cleanse & load downstream data.
  • Created partitioned tables in Hive & managed & reviewed Hadoop log files.
  • Involved in creating Hive tables, loading them with data, & writing Hive queries that run internally as MapReduce jobs (a minimal sketch follows this list).
  • Used Hive to analyze the partitioned & bucketed data & compute various metrics for reporting.
  • Loaded & transformed large sets of structured, semi-structured, & unstructured data & managed data coming from different sources.
  • Parsed high-level design specifications to simple ETL coding & mapping standards.
  • Designed & customized data models for the Data warehouse supporting data from multiple sources in real-time
  • Involved in building the ETL architecture & Source to Target mapping to load data into the Data warehouse.
  • Extracted the data from the flat files & other RDBMS databases into the staging area & populated it in the Data warehouse.
  • Used various transformations such as Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, & Union to develop robust mappings in the Informatica Designer.
  • Developed mapping parameters & variables to support SQL override.
  • Created mapplets to use in different mappings.
  • Developed mappings to load into staging tables & then to Dimensions & Facts.
  • Used existing ETL standards to develop these mappings.
  • Worked on different tasks in Workflows such as sessions, event raise, event wait, decision, e-mail, command, worklets, assignment, timer, & scheduling of the workflow.
  • Created sessions, configured workflows to extract data from various sources, transformed data, & loaded it into the data warehouse.
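
The Hive bullets above describe creating partitioned tables and querying them; the sketch below shows a hedged version of that workflow driven from Spark. The database, table, and column names are assumptions, and a configured Hive metastore is presumed.

    # Hedged sketch of creating and querying a partitioned Hive table from Spark.
    # Database, table, and column names are illustrative placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partition-sketch")
             .enableHiveSupport()      # requires a Hive metastore to be configured
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.web_logs (
            user_id STRING,
            url     STRING,
            status  INT
        )
        PARTITIONED BY (log_date STRING)
        STORED AS PARQUET
    """)

    # Compute a simple metric per partition for reporting.
    spark.sql("""
        SELECT log_date, status, COUNT(*) AS hits
        FROM analytics.web_logs
        WHERE log_date >= '2020-01-01'
        GROUP BY log_date, status
    """).show()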

Data Analyst

Confidential - MT

Responsibilities:

  • Gather business requirements from business users and translate business data needs into practical solutions.
  • Develop new insights and analyses that inform decisions.
  • Understand and consume data from dimensional models.
  • Work on integrating multiple systems and databases to link and mash up distinctive data sources to discover new insights.
  • Apply statistical methods for analytics projects in order to support business decisions.
  • Prototype and deploy learning models using R or Python (see the sketch after this list).
  • Work with Business Analysts in gathering requirements, business analysis, and writing technical specifications.
  • Co-ordinate with internal teams to ensure timely delivery of business solutions.
  • Develop and implement innovative AI and machine learning tools that will be used to meet the business requirements.
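
For the model-prototyping bullet above, here is a minimal Python sketch using scikit-learn on a built-in dataset. The dataset, model choice, and evaluation split are illustrative assumptions rather than project specifics.

    # Minimal scikit-learn prototyping sketch (illustrative only).
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Load a built-in dataset and hold out 20% for evaluation.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Fit a simple baseline classifier; higher max_iter so the solver converges.
    model = LogisticRegression(max_iter=5000)
    model.fit(X_train, y_train)

    print("hold-out accuracy:", accuracy_score(y_test, model.predict(X_test)))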

System Analyst

Confidential - Denver, CO

Responsibilities:

  • Extract, Transform, and Load data from source systems to the Data Warehouse using a combination of SSIS, T-SQL, and Spark SQL.
  • Ability to apply the DataFrame API to complete data manipulation within a Spark session.
  • Created data quality scripts to compare data built with the Spark DataFrame API (a minimal sketch follows this list).
  • Design and develop ETL integration patterns using Python on Spark.
  • Analyzed SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Engage with business users to gather requirements, design visualizations, and provide training on self-service BI tools.
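
The data quality bullet above is illustrated by the following hedged PySpark sketch, which compares row counts and set differences between a source extract and a warehouse load. The paths are hypothetical placeholders, and both datasets are assumed to share the same schema.

    # Minimal DataFrame-level data quality comparison sketch.
    # Paths are placeholders; both inputs are assumed to have identical schemas.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dq-compare-sketch").getOrCreate()

    source = spark.read.parquet("/data/source_extract")   # assumed path
    target = spark.read.parquet("/data/warehouse_load")   # assumed path

    print("source rows:", source.count(), "target rows:", target.count())

    # Rows present in the source that never made it to the target, and vice versa.
    missing_in_target = source.exceptAll(target)
    extra_in_target = target.exceptAll(source)

    print("missing in target:", missing_in_target.count())
    print("extra in target:", extra_in_target.count())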
