Azure Data Engineer Resume
MI
SUMMARY
- Around 8 years of experience in analyzing, designing, and developing Client/Server, Data Warehousing, Data Modeling, and Business Intelligence (BI) stack database applications using MS SQL Server 2017/2016/2014/2012/2008/2005 and SQL Server Integration, Reporting, and Analysis Services (SSIS, SSRS, and SSAS).
- Experience working with Azure services such as Azure Data Factory (v1 & v2), Azure Data Lake Store (Gen1 & Gen2), Azure Data Lake Analytics, Azure Databricks, Event Hubs, Azure Storage accounts (Blob Storage), Logic Apps, Batch accounts, Azure Active Directory, Azure Key Vault, and Azure Automation.
- Strong experience with T-SQL (DDL & DML) in implementing and developing stored procedures, triggers, nested queries, joins, cursors, views, user-defined functions, indexes, user profiles, and relational database models, creating and updating tables, and checking database consistency by executing DBCC commands.
- Ability to work in all stages of System Development Life Cycle (SDLC).
- Wrote U-SQL scripts to retrieve partitioned data from Azure Data Lake Analytics (ADLA).
- Set up the Databricks environment, managed Databricks workspace folder permissions, configured clusters, and transformed data using PySpark, Spark SQL, and Delta Lake.
- Designed and developed PySpark templates for the U-SQL to PySpark migration.
- Created Azure Data Factory pipelines to process files from SFTP folders and send notification emails using Logic Apps.
- Experience in optimizing queries by creating clustered and non-clustered indexes, indexed views, user-defined functions, Common Table Expressions (CTEs), and views, and in using backup and recovery models with MS SQL Server 2016/2014/2012/2008.
- Extensive experience in RDBMS concepts such as tables, user-defined data types, indexes, indexed views, functions, CTEs, table variables, and stored procedures.
- Hands-on experience creating Azure Data Factory (ADF) pipelines for migrating on-premises data from Netezza to Azure SQL Data Warehouse.
- Developed PowerShell scripts to deploy datasets, linked services, and pipelines to different environments, and created a PowerShell runbook for an Azure Automation account.
- Hands-on experience in installing, configuring, managing, upgrading, migrating, backing up and restoring, monitoring, and troubleshooting SQL Server 2008 R2/2012/2014/2016/2017/2019 database systems.
- Experience in Managing Security of SQL Server Databases by creating Database Users, Roles and assigning proper permissions according to the business requirements.
- Expertise in Performance tuning, Query Optimization and Maintaining data integrity using SQL Profiler and Spotlight.
- Experience in Database administration activities like backup, disaster recovery and maintaining database Security.
- Implemented SQL Server 2012 features such as data compression, online indexing, Contained Databases, security principals, and Always On Availability Groups.
- Exposure to designing, developing, and delivering business intelligence solutions using Power BI, SQL Server Integration Services (SSIS), Analysis Services (SSAS), and Reporting Services (SSRS).
- Exposure to developing different types of reports using Power BI.
- Experience in analyzing, designing, tuning, and developing business intelligence database applications using MS SQL Server 2008 R2/2012/2014/2016 SSIS, Reporting, and Analysis Services.
- Extensive experience in data extraction, transformation, and loading (ETL) of large data volumes (VLDB) from various data sources using DTS packages in MS SQL Server 2000 and SQL Server Integration Services (SSIS) in MS SQL Server 2008/2005 with .NET, Import/Export, Bulk Insert, and BCP.
- Hands-on experience with SSIS data transformation tasks such as Lookups, Fuzzy Lookups, Conditional Splits, Event Handlers, and Error Handlers.
- Worked extensively on extracting, transforming, and loading data from Oracle, DB2, Access, Excel, flat files, and XML using DTS and SSIS.
- Extensively used Report Wizard, Report Builder and Report Manager for developing and deploying reports in SSRS.
- Experience in developing Dashboard, Ad-hoc and Parameterized Reports using SSRS.
- Experience in configuring and maintaining Report Manager and Report Server for SSRS; deployed and scheduled reports in Report Manager.
- Experience in .NET/C# application development environment to use SQL Server Databases.
- Good experience in creating OLAP cubes using SQL Server analysis services (SSAS).
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling and of dimensional modeling for OLAP.
- Good knowledge of Data Marts, Data Warehousing, Operational Data Store (ODS), OLAP, and data modeling techniques such as Dimensional Data Modeling, Star Schema Modeling, Snowflake Modeling, and Fact and Dimension tables using MS Analysis Services.
- Hands-on experience in developing ETL DTS and SSIS packages for integrating data over OLE DB connections from various sources (Excel, CSV, flat files) using the transformations provided by SSIS and Analysis Services (SSAS).
- Experience in handling Master Data Management (MDM) services and setting up the Master Data Services (MDS) to integrate and maintain various entities.
- Good understanding of normalization/de-normalization, normal forms, and database design methodology. Expertise in using data modeling tools such as MS Visio and Erwin.
- Hands-on experience with Visual Studio Online (VSO) and Team Foundation Server (TFS).
- Hands-on experience with SDL onboarding and with creating build and release pipelines using Azure DevOps.
- Skilled in design of logical and physical data modeling using Erwin data modeling tool.
- Strong background in a disciplined software development life cycle (SDLC) process, with excellent analytical, programming, and problem-solving skills.
PROFESSIONAL EXPERIENCE
Azure Data Engineer
Confidential - MI
Responsibilities:
- Actively participated in interacting with users, team lead, technical manager to fully understand the Application/System and Business requirements.
- Built new Data Factory pipelines for data ingestion from on-premises FTP servers to Azure Data Lake Store using Azure Data Factory (v2).
- Created metadata-driven Data Factory pipelines that receive their values dynamically through pipeline parameters.
- Designed complex Azure Databricks templates and reusable Python functions.
- Designed and developed a real-time streaming notebook for data ingestion using Structured Streaming from Event Hubs to Delta tables and to Azure Synapse over JDBC.
- Developed PySpark scripts, created clusters, used the workspace, ran JAR files, and created jobs in Azure Databricks.
- Converted the existing U-SQL scripts to PySpark.
- Transformed data through Azure Databricks notebooks and added them as activities in Data Factory workflows.
- Good working knowledge of parameterizing PySpark scripts from Data Factory.
- Transformed Azure Data Lake data using PySpark and Spark SQL.
- Developed a real-time process using Event Hubs and transformed the data using Scala notebooks.
- Wrote reusable Python functions and registered them as a library.
- Good knowledge of managing the Azure Databricks workspace, including creating mount points using Python, cluster configuration, and permissions.
- Good working knowledge of Azure Databricks Delta Lake.
- Wrote U-SQL scripts for data transformation and made extensive use of table-valued functions to parameterize paths inside the scripts.
- Worked on developing and using the ABC Framework for Auditing, Logging and Reporting of all the Pipeline executions.
- Troubleshot failed U-SQL jobs executed through Azure Data Lake Analytics.
- Used Azure Data Factory for data migration from on-premises systems to Azure SQL Data Warehouse.
- Created external tables in Azure SQL Data Warehouse to read data from Azure Data Lake Store using PolyBase.
- Involved in Optimizing Stored Procedures and long running queries using indexing strategies and query-optimization techniques.
- Developed a reusable pipeline that performs sanity checks on source data files and updates the logs in a SQL table.
- Created complex Stored Procedures, Triggers, Functions, Indexes, Tables, Views and other T-SQL code and SQL joins for applications following SQL code standards.
- Enabled auditing and monitoring for all Azure services, created user groups, and configured email alerts for failures.
- Created external tables in Hive (Ambari) to retrieve data from Azure Data Lake Store using PolyBase.
- Developed and deployed code to different environments using Azure DevOps CI/CD pipelines.
- Expertise in using Azure DevOps for code check-in, creating pull requests, and configuring build and release definitions.
- Good understanding of the ARM templates used for deploying various Azure resources.
Environment: SQL Server 2014/2016/2017/2019, Azure SQL Data Warehouse, Azure Data Factory v2, Azure Data Lake Store, Azure Data Lake Analytics, Azure Databricks, Event Hubs, Azure Storage account, Azure Automation, Logic Apps, PowerShell, Azure DevOps, Python, Power BI, MS PowerPoint, MS Project, C#, Visual Studio 2017/2015/2012.
Azure Data Engineer
Confidential - Louisville, KY
Responsibilities:
- Understand user requirements and Data Model defined and developed by the business analysts and data architects.
- Worked with Azure Data Factory (v1 & v2), Azure Storage accounts, Data Lake Store, Data Lake Analytics, Logic Apps, and Azure Automation accounts.
- Used Azure Key Vault to store secrets and configured the ADF pipeline to retrieve connection string secrets from Key Vault at run time.
- Wrote U-SQL scripts for delta loads from the Azure Data Lake Analytics (ADLA) catalog.
- Created an end-to-end workflow using Azure Logic Apps, and used a logic app to send notification emails triggered by an HTTP POST request.
- Created Azure Data Factory pipelines to process files from SFTP folders and send notification emails using Logic Apps.
- Used an Azure Automation account to schedule PowerShell runbooks.
- Used the Azure Batch account for custom activity runs.
- Developed models in Azure Analysis Services (AAS) and deployed them to various environments using release definitions.
- Exposure to developing different types of reports using Power BI.
- Implemented database management techniques that include backup/restore, import/export and generated scripts.
- Involved in Query Optimization, Performance Tuning and Rebuilding the Indexes at regular intervals for better performance.
- Created database objects in SQL Server: tables, indexes, views, user-defined functions, cursors, triggers, stored procedures, and constraints.
- Developed complex programs in T-SQL, writing stored procedures, triggers, functions, and queries with optimal execution plans.
- Created databases and schema objects including tables and indexes, applied constraints, connected various applications to the database, and wrote functions, stored procedures, and triggers.
- Involved in creating multiple parameterized stored procedures which were used by the reports to get the data.
- Managed indexes and statistics and tuned queries using execution plans to optimize database performance.
- Maintained the physical database by monitoring and optimizing performance, data integrity and SQL queries for maximum efficiency using SQL Profiler.
- Exposure to developing dashboards using Power BI for analyzing MS Product Key Activations, MS Product Key Blocks, and MS Product Key distribution shared across different partners.
- Used ADF to load data from different sources into Azure SQL Database.
- Hands-on experience in deploying resources to the cloud with ARM templates.
- Created a Build and Release pipeline using Azure DevOps.
- Configured Fortify and CredScan for Visual Studio solutions in the branch using build definitions.
- Able to create scripts using Azure PowerShell for automation and build processes.
- Good working experience with Azure Logic Apps and Azure Functions.
- Experience in managing Azure Storage Accounts.
- Created entities in MDM and updated the data programmatically. Created ADF pipelines to read and write data from MDM.
- Created PowerShell scripts to deploy datasets, pipelines, and linked services. Created PowerShell scripts to clean up datasets and run pipelines on demand.
Environment: SQL Server 2008/2012/2014/2016/2017, Azure Data Factory (v1 & v2), Azure Data Lake Store, Azure Data Lake Analytics, Azure Storage account, Azure Automation, Logic Apps, Azure Batch account, PowerShell, Azure DevOps, Power BI, MDM, MS PowerPoint, MS Project, C#, Visual Studio 2017/2015/2012/2008, VSO (Visual Studio Online).
Azure Data Engineer
Confidential
Responsibilities:
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Led the modern Data Architecture practice, delivering Big Data and Cloud Technologies projects.
- Installed and Configured Apache Hadoop clusters for application development and Hadoop tools.
- Installed and configured Hive, wrote Hive UDFs, and used a repository of UDFs for Pig Latin.
- Developed data pipelines using Pig and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Migrated the existing on-premises code to an AWS EMR cluster.
- Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
- Applied machine learning libraries and algorithms to optimize existing data.
- Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
- Worked on modeling of Dialog process, Business Processes and coding Business Objects, Query Mapper and JUnit files.
- Created automated pipelines in AWS CodePipeline to deploy Docker containers to AWS ECS using S3.
- Used the HBase NoSQL database for real-time read/write access to huge volumes of data in the use case.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded it into HBase.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 and ORC/Parquet/text files into AWS Redshift.
- Developed an AWS Lambda function to invoke the Glue job as soon as a new file is available in the inbound S3 bucket.
- Created Spark jobs to apply data cleansing and data validation rules to new source files in the inbound bucket and route rejected records to the reject-data S3 bucket.
- Researched and developed state-of-the-art techniques in the field of machine learning.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
- Created tables along with sort and distribution keys in AWS Redshift.
- Created PySpark DataFrames to bring data from DB2 to Amazon S3.
- Created shell scripts and Python scripts to automate daily tasks, including production tasks.
- Created, altered, and deleted topics using Kafka Queues when required with varying.
- Used cloud computing on a multi-node cluster, deployed the Hadoop application on S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Developed analytics enablement layer using ingested data that facilitates faster reporting and dashboards.
- Worked with production support team to provide necessary support for issues with CDH cluster and the data ingestion platform.
- Provided guidance to the development team working on PySpark as the ETL platform.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull and load data into HDFS.
- Pulled data from the data lake (HDFS) and reshaped it with various RDD transformations.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark data frames, Scala and Python.
- Developed and maintained batch data flows using HiveQL and Unix scripting.
- Developed and executed data pipeline testing processes and validated business rules and policies.
- Built code for real time data ingestion using MapR-Streams.
- Implemented Spark using Python and Spark SQL for faster processing of data.
- Automated unit testing using Python and applied testing methodologies such as unit testing and integration testing.
- Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
- Wrote SQL queries to validate data between source and target systems.
- Implemented different data formatter capabilities and published to multiple Kafka topics.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Involved in development of Hadoop System and improving multi-node Hadoop Cluster performance.
- Developed and implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
Environment: Hadoop 3.0, MapReduce, Hive 3.0, Agile, HBase 1.2, PySpark, NoSQL, AWS, Kafka, Pig 0.17, HDFS, Java 8, Hortonworks, Spark, PL/SQL, Python
Data Engineer
Confidential
Responsibilities:
- Created and maintained weekly TRP data files for all DD channels.
- Analyzed the causes of viewership drops by time band and competitor market activity, and recommended corrective action to increase viewership.
- Performed data analysis and visualization using Python libraries (Pandas, Matplotlib, Scikit-learn, NumPy).
- Identified priority markets for channels and found target audiences through marketing mailer creation and FPC design.
- Analyzed feedback and provided recommendations on content and program schedule based on BARC viewership data and quantitative and qualitative research.
- Created reporting tables for comparing source and target data and reporting data discrepancies (mismatches, missing scenarios) found in the data.
- Responsible for reporting findings, using gathered metrics to draw logical conclusions about past and future behavior.
- Wrote Unix scripts to automate data collection and processing.
- Developed reports and dashboards for the CMO & Director of Marketing (using Tableau, Excel and SQL) that measured the effectiveness of inbound marketing campaigns.
- Followed up weekly with individual DD broadcasting stations to collect TRP data files.
- Analyzed the TRP data and provided solutions to queries for the betterment of each channel.
Environment: SQL, Python (Pandas, Matplotlib, Scikit-learn, NumPy), UNIX Shell Scripting, Tableau, Excel.
