Azure Data Engineer Resume
Cincinnati, Ohio
SUMMARY
- 8+ years of experience in the development and implementation of data warehousing solutions.
- Experienced in Azure Data Factory and in preparing CI/CD scripts and Azure DevOps pipelines for deployment.
- 4+ years of development experience on cloud platforms (Azure).
- Solid experience building ETL ingestion flows using Azure Data Factory.
- Experience building Azure Stream Analytics ingestion specs that deliver sub-second results to users in real time.
- Experience building ETL data pipelines in Azure Databricks leveraging PySpark and Spark SQL (see the PySpark sketch after this list).
- Extensively worked on Azure Databricks.
- Experience building orchestration in Azure Data Factory for scheduling purposes.
- Experience working with the Azure Logic Apps integration tool.
- Experience working with data warehouses such as Teradata, Oracle, and SAP HANA.
- Experience implementing Azure Log Analytics as a platform-as-a-service solution for SD-WAN firewall logs.
- Experience building data pipelines with Azure Data Factory.
- Selected appropriate, cost-effective AWS/Azure services to design and deploy applications based on given requirements.
- Expertise working with databases such as Azure SQL DB and Azure SQL DW.
- Good knowledge of Azure Key Vault.
- Good knowledge of Azure DevOps.
- Solid programming experience with Python and Scala.
- Experience working in a cross-functional AGILE Scrum team.
- Comfortable joining teams that are midway through big data challenges, both on-premises and in the cloud.
- Hands-on experience with Azure analytics services: Azure Data Lake Store (ADLS), Azure Data Lake Analytics (ADLA), Azure SQL DW, Azure Data Factory (ADF), Azure Databricks (ADB), etc.
- Orchestrated data integration pipelines in ADF using activities such as Get Metadata, Lookup, ForEach, Wait, Execute Pipeline, Set Variable, Filter, Until, etc.
- Knowledge of basic ADF admin activities, such as granting access to ADLS using a service principal, installing the integration runtime (IR), and creating services such as Azure Data Lake Storage and Logic Apps.
- Built a POC using AzCopy to copy files to Azure Storage (Blob, File Share).
- Built a POC using AdlCopy to copy data from Blob to ADLS and from ADLS to ADLS.
- Good knowledge of PolyBase external tables in SQL DW.
- Involved in production support activities.
- Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
- Profound understanding of Partitions and Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Good working knowledge of the Amazon Web Services (AWS) cloud platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
- Good experience working with cloud environments such as AWS EC2 and S3.
- Expertise working with AWS services such as EMR, S3, Redshift, and CloudWatch for big data development.
- Proficiency in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
- Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
- Good knowledge of data marts, OLAP, and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake modeling for fact and dimension tables) using Analysis Services.
- Experienced in building automated regression scripts in Python to validate ETL processes across databases such as Oracle, SQL Server, Hive, and MongoDB.
- Ability to work effectively in cross-functional team environments, excellent communication, and interpersonal skills.
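Illustrative PySpark/Spark SQL ETL sketch referenced above: a minimal example of the Databricks ingestion pattern, assuming hypothetical ADLS paths, containers, and column names (not taken from any actual project).

```python
# Minimal PySpark ETL sketch: read raw CSV from ADLS Gen2, transform with
# Spark SQL, and write partitioned Parquet to a curated zone.
# All paths and column names below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-etl-sketch").getOrCreate()

# Hypothetical ADLS Gen2 locations (abfss://<container>@<account>.dfs.core.windows.net/...)
raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/"
curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/sales/"

# Ingest raw files into a DataFrame
raw_df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(raw_path))

# Expose the data to Spark SQL and apply the transformation logic there
raw_df.createOrReplaceTempView("sales_raw")
curated_df = spark.sql("""
    SELECT order_id,
           customer_id,
           CAST(order_date AS DATE)       AS order_date,
           CAST(amount AS DECIMAL(18, 2)) AS amount
    FROM sales_raw
    WHERE order_id IS NOT NULL
""")

# Write the curated output partitioned by date for downstream consumers
(curated_df.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet(curated_path))
```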
TECHNICAL SKILLS
Operating system: Unix and Windows
Big Data Tools: Hadoop ecosystem: MapReduce, Spark 2.3, Airflow 1.10.8, NiFi 2, HBase 1.2, Hive 2.3, Pig 0.17, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0
Database Tools: NoSQL, MongoDB, Teradata, Oracle, SQL Server, Azure SQL DW
Methodologies: System Development Life Cycle (SDLC), Agile
Scripting Languages: Python, Scala, Shell
Azure Cloud Stack: Azure Data Factory, Azure Databricks, ADLS Gen2 storage, Blob Storage, Event Hub, Log Analytics, Azure Sentinel, Cosmos DB, ADLA, ADLS
AWS Cloud Stack: EC2, S3, Redshift, EMR, Lambda, Athena, Glue
Languages: Scala, R, Python, C, C++, Java
PROFESSIONAL EXPERIENCE
Confidential, Cincinnati, Ohio
AZURE DATA ENGINEER
RESPONSIBILITIES:
- Created linked services for multiple source systems (e.g., Azure SQL Server, ADLS, Blob Storage, REST API).
- Created pipelines to extract data from on-premises source systems to Azure Data Lake Storage; extensively worked on copy activities and implemented copy behaviors such as flatten hierarchy, preserve hierarchy, and merge hierarchy; implemented error handling through the copy activity.
- Exposure to Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait.
- Configured Logic Apps to send email notifications to end users and key stakeholders via the web activity; created dynamic pipelines to handle extraction from multiple sources to multiple targets; extensively used Azure Key Vault to configure connections in linked services.
- Configured and implemented Azure Data Factory triggers and scheduled the pipelines; monitored the scheduled pipelines and configured alerts to be notified of pipeline failures.
- Extensively worked on Azure Data Lake analytics with Azure Databricks to implement SCD Type 1 and Type 2 approaches (see the SCD-2 merge sketch after this list).
- Created Azure Stream Analytics jobs to replicate real-time data into Azure SQL Data Warehouse.
- Implemented delta-logic extractions for various sources with the help of a control table; implemented data frameworks to handle deadlocks, recovery, and pipeline logging.
- Deployed code to multiple environments through the CI/CD process, fixed code defects during SIT and UAT testing, and supported data loads for testing; implemented reusable components to reduce manual intervention.
- Created Snowpipe for continuous data loading from Azure Blob Storage.
- Developed Spark (Scala) notebooks to transform and partition data and organize files in ADLS.
- Worked on Azure Databricks to run Spark (Python) notebooks through ADF pipelines.
- Used Databricks widget utilities to pass parameters at run time from ADF to Databricks (see the widget sketch after this list).
- Created triggers, PowerShell scripts, and parameter JSON files for the deployments.
- Worked with VSTS for the CI/CD implementation.
- Reviewed individual work on ingesting data into Azure Data Lake and provided feedback based on the reference architecture, naming conventions, guidelines, and best practices.
- Implemented end-to-end logging frameworks for Data Factory pipelines.
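SCD-2 merge sketch referenced above: a minimal Delta Lake merge in a Databricks notebook, assuming a hypothetical customer dimension with is_current/start_date/end_date columns; paths, table names, and columns are illustrative.

```python
# Simplified SCD Type 2 sketch with Delta Lake on Azure Databricks.
# Paths, table names, keys, and columns are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

dim_path = "abfss://curated@examplestorage.dfs.core.windows.net/dim_customer/"
updates_df = spark.table("staging.customer_updates")  # hypothetical staging table

# Current dimension rows, used to detect which customers actually changed
current = (spark.read.format("delta").load(dim_path)
           .filter("is_current = true")
           .select("customer_id", "address"))

changed = (updates_df.alias("src")
           .join(current.alias("cur"), "customer_id", "left")
           .filter("cur.address IS NULL OR cur.address <> src.address")
           .select("src.*"))

target = DeltaTable.forPath(spark, dim_path)

# Step 1: expire the current version of each changed customer
(target.alias("tgt")
 .merge(changed.alias("chg"),
        "tgt.customer_id = chg.customer_id AND tgt.is_current = true")
 .whenMatchedUpdate(set={"is_current": "false", "end_date": "current_date()"})
 .execute())

# Step 2: append the new versions of changed/new customers as current rows
(changed
 .withColumn("is_current", F.lit(True))
 .withColumn("start_date", F.current_date())
 .withColumn("end_date", F.lit(None).cast("date"))
 .write.format("delta").mode("append").save(dim_path))
```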
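Widget sketch referenced above: reading run-time parameters passed from an ADF Databricks notebook activity. This runs inside a Databricks notebook, where dbutils and spark are provided by the environment; the parameter and table names are illustrative.

```python
# Read run-time parameters passed from ADF via Databricks widgets.
# (Runs inside a Databricks notebook; parameter names are illustrative.)
dbutils.widgets.text("load_date", "")       # declared so the notebook also runs stand-alone
dbutils.widgets.text("source_system", "")

load_date = dbutils.widgets.get("load_date")
source_system = dbutils.widgets.get("source_system")

print(f"Processing {source_system} data for {load_date}")

# The parameters can then drive the extraction, e.g. filtering a hypothetical staging table
df = spark.table("staging.orders").filter(f"load_date = '{load_date}'")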
Confidential, Fremont, CA
DATA ENGINEER
RESPONSIBILITIES:
- Developed data ingestion pipelines using the Talend ETL tool and Bash scripting, with big data technologies including but not limited to Hive, Impala, Spark, and Kafka.
- Developed scalable and secure data pipelines for large datasets.
- Gathered requirements for ingestion of new data sources including life cycle, data quality check, transformations, and metadata enrichment.
- Supported data quality management by implementing proper data quality checks in data pipelines.
- Delivered data engineering services such as data exploration, ad-hoc ingestion, and subject-matter expertise to data scientists using big data technologies.
- Built machine learning models to showcase big data capabilities using PySpark and MLlib (see the MLlib sketch after this list).
- Enhanced the data ingestion framework by creating more robust and secure data pipelines.
- Implemented data streaming capability using Kafka and Talend for multiple data sources.
- Worked with multiple storage formats (Avro, Parquet) and databases (Hive, Impala, Kudu).
- Managed the S3 data lake; responsible for maintaining and handling inbound and outbound data requests through the big data platform.
- Working knowledge of cluster security components such as Kerberos, Sentry, and SSL/TLS.
- Involved in the development of agile, iterative, and proven data modeling patterns that provide flexibility.
- Knowledge of implementing JILs to automate jobs in the production cluster.
- Troubleshot bugs in users' analyses (JIRA and IRIS tickets).
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
- Worked on analyzing and resolving the production job failures in several scenarios.
- Implemented UNIX scripts to define the use-case workflow, process the data files, and automate the jobs.
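MLlib sketch referenced above: a minimal PySpark ML pipeline (feature assembly plus logistic regression); the input table, columns, and label are assumed for illustration.

```python
# Minimal PySpark MLlib sketch: assemble features and fit a logistic
# regression inside an ML pipeline. Table and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Hypothetical feature table with a binary label column "churned"
df = spark.table("analytics.customer_features")

assembler = VectorAssembler(
    inputCols=["tenure_days", "monthly_spend", "support_tickets"],
    outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="churned")

pipeline = Pipeline(stages=[assembler, lr])

train_df, test_df = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train_df)

# Score the hold-out split and inspect a few predictions
predictions = model.transform(test_df)
predictions.select("churned", "prediction", "probability").show(5)
```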
Confidential, St Louis, Missouri
BIG DATA ENGINEER
RESPONSIBILITIES:
- Used Sqoop to import and export data between Oracle/PostgreSQL and HDFS for analysis.
- Migrated existing MapReduce programs to Spark models using Python.
- Migrated data from the data lake (Hive) into an S3 bucket.
- Performed data validation between the data in the data lake and the S3 bucket.
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
- Designed batch processing jobs using Apache Spark that ran roughly ten times faster than the equivalent MapReduce jobs.
- Used Kafka for real-time data ingestion (see the streaming-read sketch after this list).
- Created separate Kafka topics for reading the data.
- Read data from different topics in Kafka.
- Moved data from the S3 bucket to the Snowflake data warehouse for report generation.
- Wrote Hive queries for data analysis to meet the business requirements.
- Migrated an existing on-premises application to AWS.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Created many Spark UDFs and Hive UDAFs for functions not pre-existing in Hive and Spark SQL (see the UDF sketch after this list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented performance optimization techniques such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Good knowledge of Spark platform parameters such as memory, cores, and executors.
- Used ZooKeeper in the cluster to provide concurrent access to Hive tables with shared and exclusive locking.
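Streaming-read sketch referenced above: one common way to consume Kafka topics from Spark is Structured Streaming; broker addresses, topic names, and output paths below are assumptions, not details from an actual project.

```python
# Sketch of consuming Kafka topics with Spark Structured Streaming and
# landing the events as Parquet. Brokers, topics, and paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "orders,payments")   # multiple topics
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast them to strings for downstream parsing
decoded = events.select(
    col("topic"),
    col("key").cast("string").alias("key"),
    col("value").cast("string").alias("value"),
    col("timestamp"))

query = (decoded.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/raw/kafka/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/kafka/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```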
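UDF sketch referenced above: registering a Python UDF for use in Spark SQL over a Hive table; the function, table, and columns are illustrative.

```python
# Sketch of registering a Python UDF and calling it from Spark SQL over a
# Hive table. Table and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("udf-sketch")
         .enableHiveSupport()
         .getOrCreate())

def mask_email(email):
    """Mask the local part of an e-mail address, e.g. 'john@x.com' -> 'j***@x.com'."""
    if email is None or "@" not in email:
        return email
    local, domain = email.split("@", 1)
    return f"{local[0]}***@{domain}"

# Register the function so it can be called from Spark SQL
spark.udf.register("mask_email", mask_email, StringType())

masked = spark.sql("""
    SELECT customer_id, mask_email(email) AS masked_email
    FROM default.customers
""")
masked.show(5)
```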
Confidential
SQL Developer
RESPONSIBILITIES:
- Database development experience with Microsoft SQL Server in OLTP/OLAP environments using Integration Services (SSIS) for ETL (Extraction, Transformation, and Loading).
- Created ETL projects using SSIS 2012 in both the Package Deployment Model and the Project Deployment Model, and efficiently used the SSIS catalog for ETL monitoring.
- Developed SSIS packages that fetch files from FTP and transform the data based on business needs before loading it to the destination.
- Created metadata tables to log package activity, errors, and variable changes. Used techniques such as CDC, SCD, and HASHBYTES to capture data changes and perform incremental loading of the dimension tables (see the hash-compare sketch at the end of this list).
- Responsible for Deploying, Scheduling Jobs, Alerting and Maintaining SSIS packages.
- Implemented and managed event handlers, package configurations, logging, system and user-defined variables, checkpoints, and expressions for SSIS packages.
- Automated processes by creating jobs and error reporting using alerts, SQL Mail Agent, FTP, and SMTP.
- Developed, tested, and deployed all the SSIS packages using project deployment model in the 2016 environment by configuring the DEV, TEST and PROD Environments.
- Created SQL Server Agent Jobs for all the migrated packages in SQL Server 2016 to run as they were running in the 2014 version.
- Created shared dimension tables, measures, hierarchies, levels, cubes and aggregations on MS OLAP/ Analysis Server (SSAS).
- Created a cube using multiple dimensions, modified the relationship between a measure group and a dimension, and created calculated members and KPIs using SSAS.
- Created aggregations, partitions, KPIs, and perspectives for a cube as per business requirements.
- Involved in developing an ASP.NET Web API application that required integrating the applications with ServiceNow using the SNOW API for creating incidents, and deploying it in Azure.
- Worked on hosting new application databases on Microsoft Azure ensuring the DR setup is in working condition.
- Involved in creating a virtual machine on Azure, installed SQL 2014, created and administered databases then loaded data for mobile application purposes using SSIS from another virtual machine.
- Designed SSAS cube to hold summary data for Target dashboards. Developed Cube with Star Schema.
- Explored data in a variety of ways and across multiple visualizations using SSRS.
- Responsible for creating SQL datasets for SSRS and ad-hoc reports.
- Expert at creating multiple kinds of SSRS reports and dashboards.
- Created, maintained, and scheduled various reports.
- Experienced in creating multiple kinds of reports to present story points.
- Experience in writing reports based on statistical analysis of data from various time frames and divisions.
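Hash-compare sketch referenced above: the incremental loads themselves were built with SSIS and T-SQL (CDC, SCD, HASHBYTES); the Python snippet below only illustrates the hash-comparison idea for detecting new and changed rows, with made-up sample data.

```python
# Conceptual illustration of hash-based change detection for incremental loads.
# The actual work described above used SSIS and T-SQL HASHBYTES; this sketch
# only shows the comparison idea, with made-up sample rows and keys.
import hashlib

def row_hash(row, columns):
    """Hash the tracked columns of a row so changes can be detected cheaply."""
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

tracked = ["name", "city"]

# Hashes previously stored for the dimension (keyed by business key)
existing = {
    101: row_hash({"name": "Ada", "city": "Cincinnati"}, tracked),
    102: row_hash({"name": "Grace", "city": "Dayton"}, tracked),
}

# Incoming staging rows
incoming = [
    {"id": 101, "name": "Ada", "city": "Columbus"},    # changed city -> update
    {"id": 102, "name": "Grace", "city": "Dayton"},    # unchanged    -> skip
    {"id": 103, "name": "Alan", "city": "Cleveland"},  # new key      -> insert
]

inserts, updates = [], []
for row in incoming:
    h = row_hash(row, tracked)
    if row["id"] not in existing:
        inserts.append(row)
    elif existing[row["id"]] != h:
        updates.append(row)

print("insert:", inserts)
print("update:", updates)
```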