Azure Data Engineer Resume
Ridgefield Park, NJ
SUMMARY
- Senior Data Engineer with strong experience in Microsoft technologies including SQL Server, Azure Data Factory, Databricks and SSIS
- Experience in architecture and implementation of OLTP/OLAP systems and ETL on the Microsoft Azure platform.
- Involved in implementation of medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Involved in various projects related to Data Modeling, System/Data Analysis, Design and Development for both OLTP and Data warehousing environments
- Practical understanding of data modeling (dimensional and relational) concepts such as Star Schema modeling, Snowflake Schema modeling, and Fact and Dimension tables.
- Experience in migrating SQL Server databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse.
- Good understanding of Spark architecture including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
- Implemented both ETL and ELT architectures in Azure using Data Factory, Databricks, SQL DB and SQL Data warehouse.
- Well-versed in Azure authentication mechanisms such as service principals, managed identities, and Key Vault.
- Deep understanding of Azure data storage technologies such as Azure Blob Storage, Azure Data Lake Gen2, Synapse and Azure SQL.
- Expert in writing dynamic content expressions in ADF, which enable passing dynamic values to parameters at the pipeline, dataset, and activity level (see the sketch after this summary).
- Experience in the design, development, and implementation of large-scale, high-volume, high-performance data lake and data warehouse solutions.
- Excellent experience in query optimization and tuning of queries in Azure SQL DB, Synapse, and other RDBMS systems.
- Expertise in developing complex SQL queries and stored procedures, including joins across multiple tables, views, and CTEs.
- Familiar with data architecture, including data ingestion pipelines, data mining, data management, analytics, and data governance.
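A minimal sketch of the ADF dynamic content expressions referenced above, shown here as a Python dict mirroring a Copy activity fragment; the activity, dataset, and parameter names (CopySalesToLake, DS_SourceSql, sourceTable, folderPath) are hypothetical, while the @-expressions follow ADF's expression language:

```python
# Minimal sketch of ADF dynamic content expressions (hypothetical names).
# The @-expressions are evaluated by Data Factory at runtime.
copy_activity_fragment = {
    "name": "CopySalesToLake",              # hypothetical activity
    "type": "Copy",
    "inputs": [{
        "referenceName": "DS_SourceSql",    # hypothetical source dataset
        "type": "DatasetReference",
        # pipeline-level parameter passed down to a dataset parameter
        "parameters": {"tableName": "@pipeline().parameters.sourceTable"}
    }],
    "outputs": [{
        "referenceName": "DS_LakeParquet",  # hypothetical sink dataset
        "type": "DatasetReference",
        "parameters": {
            # folder path built dynamically from the trigger time
            "folderPath": "@concat('raw/sales/', formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd'))"
        }
    }]
}
```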
TECHNICAL SKILLS
Database: SQL Server, MySQL, MongoDB, Azure SQL
ETL: SSIS, Azure Data Factory, Databricks, Apache Kafka, Apache Flink
Data warehousing: Azure Data Lake, Delta Lake, Synapse Analytics
Programming: T-SQL, Python, Scala, JavaScript, Node.js
DevOps: Docker, OpenShift, Kubernetes
PROFESSIONAL EXPERIENCE
Azure Data Engineer
Confidential - Ridgefield Park, NJ
Responsibilities:
- Responsible for developing ETL pipelines to meet business use cases using data flows, Azure Data Factory (ADF), Data Lake, and Azure SQL Data Warehouse
- Created pipelines in Azure Data Factory using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse
- Performed complex data transformations with the help of Data Flows and Databricks
- Created Azure Data Factory pipelines to pull data from Blob Storage accounts to Azure SQL Data Warehouse, Azure SQL Database, and Azure Data Lake Storage as full/incremental loads based on the frequency at which data arrives into the corresponding tables/folders.
- Perform data extraction, aggregation, and quality checks from multiple sources
- Involved in Designing the Data Warehouse and creating Fact and Dimension tables with Star Schema and Snowflake Schema.
- Implemented normalization rules in database development and maintained referential integrity using primary and foreign keys
- Utilized various activities in ADF pipelines such as Copy Data, Execute Pipeline, Execute Stored Procedure, Get Metadata, Lookup, Web (for sending email), ForEach, If Condition, etc.
- Working with different data storage options including Azure Blob and ADLS Gen1/Gen2.
- Build complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks and Azure SQL Database
- Work with various file formats: flat files (TXT, CSV), Parquet, and other compressed formats
- Utilize the Databricks Delta Lake storage layer to create versioned Apache Parquet (Delta) files with a transaction log and audit history (sketched below)
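A minimal PySpark sketch of the Delta Lake pattern referenced in the last bullet: writing a versioned Delta table and inspecting its transaction log. It assumes a Databricks cluster with Delta Lake enabled; the storage path and source location are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical ADLS Gen2 path for the curated Delta table.
delta_path = "abfss://curated@examplestorage.dfs.core.windows.net/sales_delta"

# Each write creates a new version recorded in the Delta transaction log.
df = spark.read.parquet("/mnt/raw/sales/")          # hypothetical source files
df.write.format("delta").mode("overwrite").save(delta_path)

# Time travel: read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)

# Audit history (operation, user, timestamp) from the transaction log.
spark.sql(f"DESCRIBE HISTORY delta.`{delta_path}`").show(truncate=False)
```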
Data Engineer
Confidential - Denver, CO
Responsibilities:
- Worked on designing and implementing a P&L attribution platform for equity option pricing and risk analytics using Azure Data Lake, Azure Data Factory, Azure Databricks, Delta Lake, and Azure Logic Apps
- Implemented ETL/ELT based solutions to integrate various data sources and created a unified/enterprise data model for analytics and reporting.
- Built an ETL framework specifically for Microsoft Azure environments using Azure Data Factory pipelines, stored procedures, Azure Functions, APIs, etc.
- Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Gen2 using Azure Data Factory V2.
- Designed data pipelines for ETL jobs using Lookup, ForEach, Copy, and Get Metadata activities to load data into the data lake.
- Extensively loaded continuous JSON data from different source systems via Event Hubs into various downstream systems using Stream Analytics and Apache Spark Structured Streaming on Databricks (see the sketch after this list).
- Developed Azure Databricks notebooks to apply business transformations and perform data cleansing operations using PySpark and SQL.
- Created Databricks notebooks with Delta-format tables and implemented a lakehouse architecture.
- Responsible for development of Database Objects, Tables, Stored Procedures, Indexes, Triggers
- Automated jobs using schedule, event-based, and tumbling window triggers in ADF.
- Worked with different file formats such as JSON, CSV, Avro, and Parquet using Databricks and Data Factory.
- Used Azure Logic Apps to develop workflows that send alerts/notifications on different jobs in Azure.
- Created and set up self-hosted integration runtimes on virtual machines to access private networks.
- Created and provisioned multiple Databricks clusters for batch and continuous streaming data processing and installed the required libraries on the clusters.
- Provisioned, configured, deployed, and administered Azure Synapse pools along with accessory components such as Azure Key Vault, ADLS storage, and other Azure services
- Designed and developed solutions for the ingestion, transformation, and load of complex datasets in XML, JSON, Parquet, and CSV formats using Data Factory and Azure Databricks
- Worked with Data Platform teams in implementing schema evolution and data auditing solutions to support robust data engineering processes.
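A minimal sketch of the Event Hubs streaming ingestion referenced above, assuming the Azure Event Hubs Spark connector (format "eventhubs") is installed on the Databricks cluster; the connection string, payload schema, and paths are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Hypothetical Event Hubs connection; the connector expects an encrypted string.
connection_string = "<event-hubs-connection-string>"
eh_conf = {
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
}

# Hypothetical schema for the incoming JSON payload.
schema = StructType([
    StructField("tradeId", StringType()),
    StructField("price", DoubleType()),
    StructField("eventTime", TimestampType()),
])

raw = spark.readStream.format("eventhubs").options(**eh_conf).load()

# The Event Hubs body is binary: cast to string, parse the JSON, drop malformed rows.
parsed = (raw
          .select(from_json(col("body").cast("string"), schema).alias("j"))
          .select("j.*")
          .dropna(subset=["tradeId", "price"]))

# Land the cleansed stream as a Delta table (hypothetical paths).
(parsed.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/trades")
       .outputMode("append")
       .start("/mnt/delta/trades"))
```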
Microsoft BI Developer
Confidential - Chicago, IL
Responsibilities:
- Responsible for the development of SQL objects such as tables, stored procedures, indexes, and triggers
- Created SSIS packages to load data into the Data Warehouse using various SSIS tasks such as Execute SQL, Bulk Insert, Data Flow, File System, Send Mail, ActiveX Script, and XML tasks, along with various transformations
- Extensively used different types of transformations such as Lookup, Slowly Changing Dimension (SCD), Multicast, Merge, OLE DB Command, and Derived Column in SSIS packages to migrate data from one database to another.
- Involved in query optimization using tools such as SQL Profiler, Database Engine Tuning Advisor, execution plans, and STATISTICS IO
- Developed complex SQL queries and performed database optimization and tuning of long-running SQL queries using SQL Server Profiler and SQL Tuning Advisor.
- Created event handlers for various runtime events and a custom log provider in SQL Server to log those events for audit purposes
- Implemented Change Data Capture (CDC) to perform incremental data extraction and reduce the time for importing data into the CDW (a sketch follows this list)
- Extensively used SSIS parallelism and multithreading features to increase performance and decrease ETL duration
- Validated and cleansed source data using different SSIS data transformations such as Script Task, Conditional Split, Lookup, Slowly Changing Dimension, Script Component, Data Conversion, Derived Column, Merge Join, etc.
- Created packages in SSIS with error-handling capability, using different types of data transformations such as Conditional Split, Cache, Foreach Loop, Multicast, Derived Column, Merge Join, Script Component, Slowly Changing Dimension, Lookup, Mail tasks, etc.
- Created a custom SSIS ETL framework for loading the Data Warehouse with restartability logic, using DQS and MDM utilities for applying business rules
- Implemented package configurations in SSIS packages for cross environment deployment
- Performed a proof of concept to evaluate Azure technologies that could be utilized for data migration from legacy systems to the cloud
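A minimal sketch of the SQL Server CDC incremental-extract pattern referenced above, using pyodbc; the connection details and capture instance (dbo_Orders) are hypothetical, and a production job would persist the last processed LSN as a watermark rather than reading from the minimum LSN each run.

```python
import pyodbc

# Hypothetical source connection.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sourcesrv;DATABASE=SourceDB;Trusted_Connection=yes"
)
cur = conn.cursor()

# LSN window for the capture instance (here: everything currently available).
cur.execute("SELECT sys.fn_cdc_get_min_lsn('dbo_Orders'), sys.fn_cdc_get_max_lsn()")
from_lsn, to_lsn = cur.fetchone()

# Net changes in the window; __$operation distinguishes inserts/updates/deletes.
cur.execute(
    "SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Orders(?, ?, 'all')",
    from_lsn, to_lsn
)
changes = cur.fetchall()   # stage these rows for incremental load into the CDW

conn.close()
```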
SQL Developer
Confidential - Chicago, IL
Responsibilities:
- Designed and implemented Operational Data Store (ODS) in SQL Server to maintain daily transactions, positions, and instrument data
- Created SSIS extracts in various formats to feed accounting data to downstream systems such as reporting reconciliation, performance calculation, AML, fee billing, and position benchmarking
- Wrote custom RSL reports to extract data from Geneva and loaded the data into the warehouse
- Implemented handling of Type 2 dimensions and inferred members using the Slowly Changing Dimension transformation (sketched after this list)
- Troubleshot day-to-day production issues such as data discrepancies, reporting inaccuracies, missing prices, and performance bottlenecks, and resolved them in a timely fashion
- Applied and tested patches to the Geneva Application Server, Core Server, and Workflow Manager
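The Type 2 dimension handling referenced above was done with the SSIS Slowly Changing Dimension transformation; as a purely illustrative Python sketch of the same expire-and-insert logic (field names hypothetical):

```python
from datetime import date

def apply_scd2(dim_rows, incoming, business_key="instrument_id", tracked=("rating",)):
    """Type 2 SCD: expire the current row when a tracked attribute changes,
    then insert a new current version. dim_rows is a list of dicts carrying
    effective_from / effective_to / is_current flags (hypothetical fields)."""
    current = next((r for r in dim_rows
                    if r[business_key] == incoming[business_key] and r["is_current"]), None)
    if current and all(current[c] == incoming[c] for c in tracked):
        return dim_rows                       # no change in tracked attributes
    if current:                               # expire the existing current row
        current["effective_to"] = date.today()
        current["is_current"] = False
    dim_rows.append({**incoming,              # insert the new current version
                     "effective_from": date.today(),
                     "effective_to": None,
                     "is_current": True})
    return dim_rows
```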