Azure Data Engineer Resume
Salisbury, NC
SUMMARY
- An experienced IT professional with 8 years of experience in the analysis, design, development, testing, and deployment of software applications.
- Experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, big data technologies (Hadoop and Apache Spark), and Databricks.
- Experience in designing and implementing cloud architecture on Microsoft Azure.
- Excellent knowledge of integrating Azure Data Factory V2 with a variety of data sources and processing data using pipelines, pipeline parameters, activities, activity parameters, and manual, window-based, and event-based job scheduling.
- Experienced in all phases of the software development life cycle.
- Strong development skills with Azure Data Lake Storage (ADLS), Azure Data Factory, Azure Storage Explorer, and Azure Databricks.
- Experience with Azure transformation projects and Azure architecture decision making; architected and implemented ETL and data movement solutions using Azure Data Factory (ADF).
- Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal sketch follows this summary).
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.
- Experienced in pre-processing and cleaning data for feature engineering, and in applying data imputation techniques for missing values using Python.
- Experience in implementing ETL and ELT solutions over large data sets.
- Automated jobs in ADF using event, schedule, and tumbling-window triggers.
- Hands-on experience in developing Logic Apps workflows for performing event-based data movement.
- Experienced in building tabular model cubes from scratch with different measures and dimensions.
- Self-motivated, able to work independently and in teams; can handle multiple projects while paying attention to details; accomplished problem solving and analytical skills.
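A minimal PySpark/Spark SQL sketch of the Databricks aggregation work summarized above. The storage paths, container/account names, and column names (customer_id, event_type, event_ts) are illustrative assumptions, not actual project values.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

# Read raw events landed in ADLS in multiple file formats (hypothetical paths).
csv_events = spark.read.option("header", True).csv("abfss://raw@examplelake.dfs.core.windows.net/events/csv/")
json_events = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/events/json/")
events = csv_events.unionByName(json_events, allowMissingColumns=True)

# Aggregate usage per customer and event type to surface usage patterns.
usage = (events
         .groupBy("customer_id", "event_type")
         .agg(F.count("*").alias("event_count"),
              F.max("event_ts").alias("last_seen")))

usage.write.mode("overwrite").parquet("abfss://curated@examplelake.dfs.core.windows.net/usage_patterns/")
```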
TECHNICAL SKILLS
Azure Cloud Platform: Azure Data Factory v2, Azure Data Lake Storage Gen2, Blob Storage, Azure SQL DB, SQL Server, Azure Synapse, Azure Analysis Services, Databricks, Azure Cosmos DB, Azure HDInsight, Azure Stream Analytics, Azure Event Hub, Key Vault, Azure App Services, Logic Apps, Event Grid, Service Bus, Azure DevOps, ARM Templates
Programming Languages: PySpark, Python, T-SQL, U-SQL, Linux Shell Scripting, Azure PowerShell
Big data Technologies: Apache Spark, Hadoop, HDFS, Hive, Apache Kafka
CI/CD: Jenkins, Azure DevOps
Databases: Azure SQL Data Warehouse, Azure SQL DB, Azure Cosmos DB, NoSQL DB, Microsoft SQL Server, MySQL, Oracle, Teradata
IDE and Tools: SSMS, Maven, SBT, MS-Project, GitHub, Microsoft Visual Studio, JIRA, Tableau, Informatica.
Methodologies: Agile Scrum, Waterfall, SDLC
PROFESSIONAL EXPERIENCE
Confidential, Salisbury, NC
Azure Data Engineer
Responsibilities:
- Developed data engineering processes in Azure using components such as Azure Data Factory, Azure Data Lake Analytics, HDInsight, and Databricks.
- Developed complex SQL, U-SQL, and PySpark code for data engineering pipelines in Azure Data Lake Analytics and Azure Data Factory.
- Conducted data analysis (SQL, Excel, Data Discovery, etc.) on legacy systems and new data sources.
- Set up CI/CD infrastructure using tools such as Azure DevOps and Artifactory to maintain code in the version control system and track changes to the code.
- Reviewed and validated QA test plans and supported the QA team during test execution.
- Participated in code reviews and ensured that all solutions were aligned to pre-defined architectural specifications.
- Participated in all phases of the project from requirement gathering, analysis to development.
- Worked with business users to design, develop, test, and implement business intelligence solutions in the Data & Analytics Platform.
- Created numerous pipelines in Azure Data Factory v2 to get data from disparate source systems using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
- Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to perform streaming analytics in Databricks.
- Analyzed existing SQL scripts and designed solutions to implement them using PySpark.
- Developed stored procedures, triggers, and SQL scripts for performing automation tasks.
- Conducted data lineage and impact analysis as part of the change management process.
- Worked on pre-processing and cleaning data for feature engineering and applied data imputation techniques for missing values using Python.
- Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues (a simplified streaming sketch follows this role's environment line).
- Designed and developed batch and real-time processing solutions using ADF, Databricks clusters, and Stream Analytics.
- Maintained and supported optimal pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Created linked services to connect external resources to ADF; automated jobs using event, schedule, and tumbling-window triggers.
- Created and provisioned different Databricks clusters, notebooks, and jobs, and configured autoscaling.
- Implemented Azure and self-hosted integration runtimes to access data on private networks.
- Used Azure Logic Apps to develop workflows that send alerts/notifications for different jobs in Azure.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions.
- Worked with complex SQL views, stored procedures, triggers, and packages in large databases across various servers.
- Ensured that developed solutions were formally documented and signed off by the business.
- Worked with team members on troubleshooting, resolving technical issues, and identifying and managing project risks and issues.
- Worked on the cost estimation, billing, and implementation of services on the cloud.
Environment: Azure Cloud, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, Azure PowerShell, Databricks, Python, Azure SQL Server, Azure Data Warehouse.
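A simplified Structured Streaming sketch of the mini-batch/near-real-time processing described in this role. The real solution consumed events from Azure Event Hubs and Service Bus; here a JSON landing folder stands in for that source, and the paths, schema, and window sizes are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

spark = SparkSession.builder.appName("nrt-minibatch").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("reading", DoubleType()),
])

# Each micro-batch picks up newly landed JSON files (stand-in for the Event Hub feed).
stream = (spark.readStream
          .schema(schema)
          .json("abfss://landing@examplelake.dfs.core.windows.net/telemetry/"))

# Windowed aggregation per device, i.e. the per-mini-batch transformation step.
agg = (stream
       .withWatermark("event_ts", "10 minutes")
       .groupBy(F.window("event_ts", "5 minutes"), "device_id")
       .agg(F.avg("reading").alias("avg_reading")))

query = (agg.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "abfss://curated@examplelake.dfs.core.windows.net/telemetry_agg/")
         .option("checkpointLocation", "abfss://curated@examplelake.dfs.core.windows.net/_chk/telemetry_agg/")
         .start())
```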
Confidential
Azure Data Engineer
Responsibilities:
- Designed and implemented end-to-end data solutions (storage, integration, processing, visualization) in Azure and Databricks.
- Proposed architectures considering Azure cost/spend and developed recommendations to right-size data infrastructure.
- Involved in analyzing, planning, and defining data based on business requirements, and provided documentation.
- Developed conceptual solutions and created proofs of concept to demonstrate the viability of solutions.
- Architected and implemented ETL and data movement solutions using Azure Data Factory.
- Migrated on-premises SQL Server data to Azure Data Lake Store (ADLS) using Azure Data Factory.
- Implemented Copy activities and custom Azure Data Factory pipeline activities for in-cloud ETL processing.
- Created pipelines in Azure Data Factory using linked services and datasets to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
- Experienced in creating event-based triggers in Azure Data Factory and in setting up a self-hosted integration runtime in Azure Data Factory to connect to on-premises SQL Server.
- Primarily involved in data migration using SQL, Azure Storage, and Azure Data Factory.
- Created data integration and technical solutions across Azure Data Lake Analytics, Azure Data Lake Storage, Azure Data Factory, Azure SQL Database, and Azure SQL Data Warehouse, providing analytics and reports used to improve marketing strategies.
- Used various Spark Transformations and Actions for cleansing the input data.
- Implemented Spark using Spark SQL for faster testing and processing of data.
- Experienced in extracting appropriate features from data sets and handling bad, null, and partial records using Spark SQL (a minimal cleansing sketch follows this role's environment line).
- Experienced in working with the Spark ecosystem, using Spark SQL queries on different formats such as text and CSV files.
- Used Jupyter and IPython notebooks to execute Python modules that generate fact data from databases and other storage systems.
- Extracted, transformed, and loaded data sources to generate CSV data files using Python programming and SQL queries.
- Worked on pre-processing and cleaning data for feature engineering and applied data imputation techniques for missing values using Python.
- Created and provisioned different Databricks clusters, notebooks, and jobs, and configured auto-scaling.
- Created pipelines and activities in Azure Data Factory and used them to construct end-to-end data-driven workflows for the data movement and data processing.
- Performed data flow transformation using the data flow activity in Azure Data Factory.
- Experienced in executing Azure Data Factory pipelines both manually and by using triggers.
- Designed and normalized databases, wrote T-SQL queries, and created objects such as tables, views, stored procedures, user-defined functions, and indexes.
- Created complex ETL packages to extract data from staging tables to partitioned tables with incremental load.
- Created complex packages following best practices for error handling, logging, configuration, deployment, and maintenance.
Environment: Azure Cloud, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, Azure PowerShell, Databricks, Python, Azure SQL Server, Azure Data Warehouse.
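A minimal sketch of the Spark SQL cleansing described in this role: reading CSV input, dropping bad or partial records, and imputing nulls. The paths, column names, and imputation defaults are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cleansing").getOrCreate()

raw = (spark.read
       .option("header", True)
       .option("mode", "PERMISSIVE")   # keep partially parsed rows rather than failing
       .csv("abfss://raw@examplelake.dfs.core.windows.net/customers/"))

raw.createOrReplaceTempView("raw_customers")

# Spark SQL: drop records missing the business key, impute missing values.
cleansed = spark.sql("""
    SELECT customer_id,
           COALESCE(country, 'UNKNOWN')                  AS country,
           COALESCE(CAST(monthly_spend AS DOUBLE), 0.0)  AS monthly_spend
    FROM raw_customers
    WHERE customer_id IS NOT NULL
""")

cleansed.write.mode("overwrite").parquet("abfss://curated@examplelake.dfs.core.windows.net/customers_clean/")
```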
Confidential, IL
Data Engineer
Responsibilities:
- Created Azure data storage accounts used with ADF (Azure Data Factory) for storing big data in the cloud.
- Configured Azure SQL Database with Azure Storage Explorer and with SQL Server.
- Prepared and uploaded the source data via Azure Storage Explorer.
- Designed and implemented end-to-end data solutions (storage, integration, processing, visualization) in Azure.
- Created linked services to connect to Azure Storage and on-premises SQL Server.
- Migrated the project from on-premises data to the cloud using Azure Data Lake Analytics, Data Factory, Databricks, and Azure Data Lake Storage.
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
- Experienced in managing Azure Data Lake Storage (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure Services.
- Experienced in creating a Data Lake Storage Gen2 storage account and a file system.
- Used Copy Activity in Azure Data Factory to copy data among data stores located on-premises and in the cloud.
- Developed Azure SQL Data Warehouse SQL scripts with PolyBase support for processing files stored in Azure Data Lake Storage.
- Created and provisioned different Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries on the clusters.
- Experienced in reading continuous JSON data from different source systems via Kafka into Databricks Delta, processing the files with Apache Spark Structured Streaming and PySpark, and writing the output in Parquet format (a minimal Kafka-to-Delta sketch follows this role's environment line).
- Analyzed existing SQL scripts and designed solutions to implement them using PySpark.
- Developed stored procedures, triggers, and SQL scripts for performing automation tasks.
- Fine-tuned stored procedures, SQL queries, and user-defined functions.
- Involved in designing and developing Tabular models.
- Developed tabular models with row-level security and integrated them with Power Pivot and Power View.
Environment: Azure Cloud, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, Azure PowerShell, Databricks, Python, Azure SQL Server, Azure Data Warehouse.
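A minimal Structured Streaming sketch of the Kafka-to-Databricks-Delta flow described in this role. The broker address, topic, schema, and storage paths are illustrative assumptions, and it presumes a Databricks cluster with the Kafka source and Delta Lake available.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

# Continuous JSON messages arriving on a Kafka topic.
kafka_stream = (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "orders")
                .option("startingOffsets", "latest")
                .load())

# Parse the JSON payload out of the Kafka value column.
orders = (kafka_stream
          .select(F.from_json(F.col("value").cast("string"), schema).alias("o"))
          .select("o.*"))

query = (orders.writeStream
         .format("delta")
         .option("checkpointLocation", "abfss://curated@examplelake.dfs.core.windows.net/_chk/orders/")
         .start("abfss://curated@examplelake.dfs.core.windows.net/orders_delta/"))
```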
Confidential
ETL Developer
Responsibilities:
- Involved in gathering business requirements, business analysis, design and development, testing, and implementation of business rules.
- Involved in the complete software development life cycle (SDLC), from business analysis to development, testing, deployment, and documentation.
- Created tables and views in Teradata according to the requirements.
- Used Teradata utilities (FastLoad, MultiLoad, TPT) to load data.
- Wrote numerous BTEQ scripts to run complex queries on the Teradata database (an illustrative query sketch follows this role's environment line).
- Created proper primary indexes, taking into consideration both planned data access and even distribution of data across all available AMPs. Responsible for performance tuning at various levels during development.
- Implemented complex data flows using Data Flow tasks, Execute SQL tasks, Sequence containers, and Foreach Loop containers.
- Assisted in staging Historical and Incremental Loads.
- Worked extensively on Aggregated/Summarized data.
- Performed tuning and optimization of complex SQL queries using Teradata Explain.
- Used PowerCenter Workflow Manager to create workflows and sessions, and used various tasks such as Command, Event Wait, Event Raise, and Email.
- Worked extensively on the development of large projects with complete end-to-end participation in all areas of the software development life cycle, and maintained documentation.
Environment: Teradata 12, Oracle 9i/10g, Teradata Visual Explain, BTEQ, Teradata SQL Assistant, FastLoad, MultiLoad, FastExport, UNIX Shell Scripting, ETL
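An illustrative stand-in for the BTEQ-style query work in this role, sketched with the Python teradatasql driver (the actual project used BTEQ scripts and Teradata load utilities rather than Python). Host, credentials, and table names are placeholder assumptions.

```python
import teradatasql

# Placeholder connection details; the real work ran equivalent SQL through BTEQ.
with teradatasql.connect(host="tdhost", user="etl_user", password="***") as con:
    with con.cursor() as cur:
        # Aggregate/summary query of the kind tuned with Teradata EXPLAIN.
        cur.execute("""
            SELECT region, SUM(sale_amt) AS total_sales
            FROM sales_db.daily_sales
            GROUP BY region
        """)
        for region, total_sales in cur.fetchall():
            print(region, total_sales)
```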