
Azure Data Engineer Resume


FL

SUMMARY

  • Around 6 years of overall experience in data engineering and data analyst roles across big data and cloud platforms such as Azure. Experience working with multiple programming languages and technologies, including Java, Scala, and Python, and in writing SQL queries.
  • Experience in using MS Azure PaaS services such as SQL Server, HDInsight, and service bus.
  • Experience in designing and implementing plans for hosting complex application workloads on MS Azure.
  • Experience in developing complex Data Analysis Expressions (DAX).
  • Experience in building and publishing customized interactive reports and dashboards, and in scheduling reports using Tableau Server.
  • Experience in Azure API Management, security, and cloud-to-cloud integration.
  • Experience with data flow diagrams, data dictionaries, database normalization techniques, and entity-relationship modeling and design.
  • Experience in implementing data pipelines using Azure Data Factory.
  • In-depth knowledge of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
  • Worked extensively as an Azure Data Engineer using Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, big data technologies (Hadoop and Apache Spark), and Databricks.
  • Developed information processes for data acquisition, data transformation, data migration, data verification, data modeling, and data mining.
  • Experience in implementing ETL and ELT solutions over large data sets.
  • Familiar with importing and exporting data using Sqoop from RDBMSs such as MySQL, Oracle, and Teradata, and with using fast loaders and connectors.
  • Hands-on experience in creating pipelines in Azure Data Factory V2 using activities like Move & Transform, Copy, Filter, ForEach, Get Metadata, Lookup, and Databricks.
  • Expertise in querying and testing RDBMS such as Teradata, Oracle and SQL Server using SQL for data modeling and data integrity.
  • Used complex data types such as structs, arrays, maps, and lists in PySpark to handle nested data and flattened them using split and explode (a minimal sketch follows this list).
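
A minimal PySpark sketch of the split/explode flattening described above; the input path and column names (tags, categories, address) are hypothetical placeholders.

```python
# Flatten nested data in PySpark using explode (array -> rows) and split (string -> array).
# Input path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, split

spark = SparkSession.builder.appName("flatten-nested-data").getOrCreate()

raw_df = spark.read.json("/mnt/datalake/raw/events.json")  # contains array and struct columns

flat_df = (
    raw_df
    # explode turns each element of the 'tags' array into its own row
    .withColumn("tag", explode(col("tags")))
    # split breaks a comma-separated string column into an array
    .withColumn("category_list", split(col("categories"), ","))
    # struct fields are flattened by selecting them with dot notation
    .select("id", "tag", "category_list", col("address.city").alias("city"))
)
flat_df.show(truncate=False)
```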

TECHNICAL SKILLS

Azure Cloud Platform: Azure Data Factory v2, Azure Blob Storage, Azure Data Lake Storage Gen1 & Gen2, Azure SQL DB, SQL Server, Logic Apps, Azure Synapse, Azure Analysis Services, Databricks, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hubs, Key Vault, Azure App Services, Event Grid, Service Bus, Azure DevOps, ARM Templates.

Databases: Azure SQL Data Warehouse, Azure SQL DB, Azure Cosmos DB, Teradata, Oracle, MySQL, Microsoft SQL Server

Programming Languages: Python, PySpark, T-SQL, Linux Shell Scripting, Azure PowerShell

ETL Tools: Teradata SQL Assistant, TPT, BTEQ, FastLoad, MultiLoad, FastExport, TPump, Informatica PowerCenter 9.x/8.6/8.5/8.1

Big data Technologies: Hadoop, Hive, HDFS, Apache Kafka, Apache Spark

Data Modeling: Erwin, Visio

PROFESSIONAL EXPERIENCE

Confidential, FL

Azure Data Engineer

Responsibilities:

  • Analyze, design, and build modern data solutions using Azure PaaS services to support data visualization. Understand the current production state of the application and determine the impact of new implementations on existing business processes.
  • Creating pipelines, data flows and complex data transformations and manipulations using Azure Data Factory (ADF) and PySpark with Databricks.
  • Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingest data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and process the data in Azure Databricks.
  • Consumed REST APIs to retrieve analytics data from different data feeds.
  • Developed HQL, mappings, tables, and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation, and execution (a partitioned external-table sketch appears after this list).
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to perform ETL on data from different sources like Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back to those sources.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Design, develop and maintain SSIS packages to consume web services data for Analysis and Reporting of APS Decision Making System.
  • Developed data pulls from Cosmos using Scope Scripts.
  • Worked with Microsoft Cosmos DB distributed database technology for all the datasets.
  • Created SQL scripts to migrate data from SQL server to Cosmos DB.
  • Extensively involved in writing T-SQL stored procedures and functions to move complex business logic into the backend.
  • Gathered business requirements in meetings for successful implementation and POC (Proof-of-Concept) of Hadoop and its ecosystem.
  • Played an extensive role in fixing performance issues by tuning slow-running and runaway queries using different methods: evaluating statistics, joins, and indexes, and making code changes.
  • Created alarms, alerts, and notifications for Spark jobs to send job status via email and Slack group messages and to log it in CloudWatch.
  • Designed, developed, tested, published, and maintained Power BI functional reports and dashboards for the management teams to make informed decisions.
  • Created and maintained SQL Agent jobs for automating and scheduling ETL processes and database maintenance tasks.
  • Developed Databricks Python notebooks to join, filter, pre-aggregate, and process files stored in Azure Data Lake Storage (see the Databricks sketch after this list).
  • Created linked services, datasets, and self-hosted integration runtimes for on-premises servers, and maintained three environments: Dev, UAT, and Prod.
  • Some of the SSIS packages were deployed to the cloud from on-premises with a "lift and shift" approach after making minimal configuration changes.
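
A sketch of the partitioned Hive external-table pattern referenced above, issued through spark.sql; the database, table, columns, and storage location are hypothetical, and Hive support is assumed to be enabled (as in Databricks notebooks, where `spark` is predefined).

```python
# HiveQL DDL for a partitioned external table, run via Spark SQL.
# Database, table, column, and path names are hypothetical.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.sales_events (
        event_id STRING,
        banner   STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION '/mnt/datalake/staging/sales_events'
""")

# Register partitions that were written directly to the storage location.
spark.sql("MSCK REPAIR TABLE staging.sales_events")
spark.sql("SELECT banner, COUNT(*) AS cnt FROM staging.sales_events GROUP BY banner").show()
```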
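
A minimal Databricks PySpark sketch of the join/filter/pre-aggregate pattern mentioned above; the ADLS Gen2 account, container, datasets, and columns are hypothetical, and `spark` is the session Databricks provides.

```python
# Join, filter, and pre-aggregate files stored in ADLS Gen2 from a Databricks notebook.
# Storage account, container, dataset, and column names are hypothetical.
from pyspark.sql import functions as F

base = "abfss://curated@examplestorageacct.dfs.core.windows.net"  # hypothetical ADLS Gen2 path

orders = spark.read.parquet(f"{base}/orders/")
customers = spark.read.parquet(f"{base}/customers/")

daily_sales = (
    orders.join(customers, on="customer_id", how="inner")
          .filter(F.col("order_status") == "COMPLETED")
          .groupBy("order_date", "region")
          .agg(F.sum("order_amount").alias("total_amount"),
               F.countDistinct("customer_id").alias("unique_customers"))
)

# Write the pre-aggregated result back to the lake for downstream reporting.
daily_sales.write.mode("overwrite").parquet(f"{base}/aggregates/daily_sales/")
```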

Environment: Microsoft SQL Server 2019, Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure, SQL Server Data Tools 2019 (SSDT), Visual Studio 2018, Git, Power BI, Azure DevOps, VSO.

Confidential

Azure Data Engineer

Responsibilities:

  • Used Azure Data Factory as an orchestration tool for integrating data from upstream to downstream systems.
  • Used Pandas, OpenCV, and NumPy in Python for developing data pipelines.
  • Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python (see the cleaning sketch after this list).
  • Created numerous pipelines in Azure Data Factory v2 to pull data from disparate source systems using activities like Move & Transform, Copy, Filter, ForEach, and Databricks.
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Developed data processing applications in Scala using Spark RDDs as well as DataFrames with Spark SQL APIs.
  • Used pandas UDFs and PySpark array functions (array contains, distinct, flatten, map, sort, split, and overlaps) for filtering the data.
  • Worked on planning, defining, and designing the database based on business requirements and provided documentation.
  • Extracted the file from FIS and loaded it onto the secure server.
  • Performed ETL operations using SSIS and loaded the data into the secure DB.
  • Created design for Hubs and Satellites for the credit card data to accommodate the business requirements.
  • Maintained security by creating GUIDs in place of credit card numbers and was responsible for the credit card system.
  • Used the SHA2-256 algorithm to hash credit card numbers along with Social Security Numbers (SSNs); a hashing sketch follows this list.
  • Used the Data Vault 2.0 pattern to load the data into hubs and satellites along with links.
  • Designed a custom tool to load the data into the EDM engine.
  • Design and development of commercial data service applications.
  • Developed multiple Hive external tables with partitions to load staging data dynamically and analyzed the data using Hive queries.
  • Designed and developed architecture for a data services ecosystem spanning relational, NoSQL, and big data technologies; analyzed the requirements to develop the framework.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts.
  • Developed Python Spark Streaming scripts to load raw files and the corresponding data.
  • Implemented PySpark logic to transform and process various formats of data such as XLS, JSON, and TXT. Wrote Python scripts to fetch S3 files using the Boto3 module.
  • Built scripts to load PySpark-processed files into a Redshift database, using diverse PySpark logic.
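
A minimal pandas/NumPy sketch of the data cleaning, feature scaling, and feature engineering referenced above; the file and column names (transactions.csv, amount, txn_ts) are hypothetical.

```python
# Data cleaning, feature scaling, and simple feature engineering with pandas/NumPy.
# File and column names are hypothetical.
import numpy as np
import pandas as pd

df = pd.read_csv("transactions.csv")

# Cleaning: drop duplicate rows and fill missing numeric values with the median.
df = df.drop_duplicates()
df["amount"] = df["amount"].fillna(df["amount"].median())

# Feature scaling: min-max scale the amount column to the [0, 1] range.
df["amount_scaled"] = (df["amount"] - df["amount"].min()) / (df["amount"].max() - df["amount"].min())

# Feature engineering: log transform and a day-of-week feature from the timestamp.
df["amount_log"] = np.log1p(df["amount"])
df["txn_dow"] = pd.to_datetime(df["txn_ts"]).dt.dayofweek
```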
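
A PySpark sketch of protecting the sensitive columns with SHA2-256, as referenced above; paths and column names are hypothetical, and note that SHA-256 is a one-way hash rather than reversible encryption.

```python
# Hash card numbers and SSNs with SHA2-256 before persisting them.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sha2

spark = SparkSession.builder.appName("hash-pii").getOrCreate()
cards = spark.read.parquet("/mnt/secure/raw/cards/")

masked = (
    cards
    # sha2(column, 256) returns a hex-encoded SHA-256 digest (one-way, not reversible)
    .withColumn("card_number_hash", sha2(col("card_number").cast("string"), 256))
    .withColumn("ssn_hash", sha2(col("ssn").cast("string"), 256))
    .drop("card_number", "ssn")
)
masked.write.mode("overwrite").parquet("/mnt/secure/hub/cards_hashed/")
```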

Environment: Azure Data Factory (ADF v2), Spark (Python/Scala), Hive, Docker Containers, MS Azure, Azure SQL Database, Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, UNIX Shell Scripting, Azure PowerShell, Databricks

Confidential

Data Analyst

Responsibilities:

  • Data extraction from Fleet Monitor using SQL queries, with multiple selection filters applied to extract the desired data.
  • Data cleaning and transformation using advanced Excel.
  • Data analysis using Python: identifying relations between different parameters, identifying trends in the data, plotting the data to understand the correlation between variables, and choosing a suitable variable to develop alert logic (a short analysis sketch follows this list).
  • Creating visualizations, reports and dashboards using Tableau to share insights with the customer.
  • Responsible for developing alerts using Fleet Monitor/Orbita Enterprise based on the alerting requirement from the client.
  • Preparing ORD (Operational Release Document) for alerts created in Fleet Monitor and ORBITA that details the data extraction, manipulation, and analysis along with compelling visuals.
  • Predicting the worst-performing alerts on different fleets by analyzing the historical data and relevant evidence in IBM Maximo.
  • Developed a tool based on Excel VBA that analyzes, cleans, and manipulates the raw dataset (historical train data) and sends out an automated email with the refined final report to all stakeholders.
  • User acceptance testing in Fleet Monitor following any new releases of the website.
  • Conducting peer reviews (quality checks) of the ORDs and alerts created by other team members to meet documentation standards and ensure every release goes out with minimal to zero errors.
  • Led the training programs for new hires to help them understand the train domain and the process flow in detail.
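
A short pandas/matplotlib sketch of the correlation and trend analysis referenced above; the export file and column names (brake_temp, speed, ambient_temp, event_ts) are hypothetical.

```python
# Explore correlations and daily trends to pick a variable for alert logic.
# File and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("fleet_monitor_export.csv", parse_dates=["event_ts"])

# Correlation between candidate parameters to spot related variables.
print(df[["brake_temp", "speed", "ambient_temp"]].corr())

# Daily trend of the candidate alert variable.
daily = df.set_index("event_ts")["brake_temp"].resample("D").mean()
daily.plot(title="Daily mean brake temperature")
plt.xlabel("Date")
plt.ylabel("Brake temperature")
plt.tight_layout()
plt.show()
```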

Environment: Microsoft SQL Server, MS Office, MS Visio, SQL, PL/SQL, Windows, Data Analysis, MySQL, Tableau, Python, Microsoft Excel, Pattern & Trend Identification, Extract Transform Load (ETL), Predictive Analytics, IBM Maximo, Microsoft SharePoint.
