Azure Data Engineer Resume

Dallas, TX

SUMMARY

  • Around 7 years of professional experience across the full Software Development Life Cycle (SDLC) and Agile methodology, covering analysis, design, development, testing, implementation, and maintenance with Spark, Hadoop, data warehousing, and Scala.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse, and controlling and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience in building near-real-time pipelines using Kafka and PySpark.
  • Good experience designing cloud-based solutions in Azure: creating Azure SQL databases, setting up Elastic Pool jobs, and designing tabular models in Azure Analysis Services.
  • Extensive experience creating pipeline jobs and schedule triggers using Azure Data Factory.
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
  • Good understanding of Hadoop and YARN architecture along with the various Hadoop daemons such as JobTracker, TaskTracker, NameNode, DataNode, and Resource/Cluster Manager, as well as Kafka (distributed stream processing).
  • Experience in developing Scala applications on Hadoop and Spark SQL for high-volume and real-time data processing.
  • Good experience in creating PowerShell scripts for automating cluster creation and storage accounts in the production environment.
  • Experience in developing Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with different schemas into Hive ORC tables (see the sketch after this list).
  • Experience in data analysis, data modeling, and implementation of enterprise-class systems spanning big data, data integration, and object-oriented programming.
  • Deployed and tested developed code through CI/CD using Visual Studio Team Services (VSTS).
  • Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups.
  • Hands-on experience working with Kusto.
  • Experienced with Microsoft’s internal tools such as Cosmos, Kusto, and iScope, which are known for performing ETL operations efficiently.
  • Experience implementing ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
  • Experience customizing TFS/Git and Azure DevOps Git work items and creating work item queries.
  • Strong experience and knowledge in data visualization with Power BI, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
  • Experience in designing, developing, and scheduling Power BI reports/dashboards using DAX.
  • Expertise in various phases of the project life cycle (design, analysis, implementation, and testing).
  • Excellent communication skills and work ethic; a proactive team player with a positive attitude.
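
For illustration, a minimal PySpark sketch of the kind of multi-schema CSV load into Hive ORC tables described above. The paths, the unified target schema, and the table name are hypothetical placeholders, not project code.

    # Minimal sketch: load CSV files with differing schemas and write them to a Hive ORC table.
    # Paths, TARGET_SCHEMA, and the table name are hypothetical placeholders.
    from functools import reduce
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, lit

    spark = (SparkSession.builder
             .appName("csv-to-hive-orc")
             .enableHiveSupport()
             .getOrCreate())

    # Assumed unified schema that every source file is conformed to.
    TARGET_SCHEMA = {"customer_id": "string", "event_ts": "timestamp",
                     "amount": "double", "source_file": "string"}

    def conform(df, source_name):
        """Add missing columns as typed nulls, cast existing ones, and tag the source."""
        for name, dtype in TARGET_SCHEMA.items():
            if name not in df.columns:
                df = df.withColumn(name, lit(None).cast(dtype))
            else:
                df = df.withColumn(name, col(name).cast(dtype))
        return df.withColumn("source_file", lit(source_name)).select(list(TARGET_SCHEMA))

    frames = []
    for path in ["/data/landing/feed_a/*.csv", "/data/landing/feed_b/*.csv"]:  # example folders
        df = spark.read.option("header", "true").option("inferSchema", "true").csv(path)
        frames.append(conform(df, path))

    combined = reduce(lambda a, b: a.unionByName(b), frames)

    # Write as ORC into a Hive-managed table (overwrite only for the illustration).
    combined.write.mode("overwrite").format("orc").saveAsTable("staging.customer_events")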

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, Spark, MapReduce, YARN, Spark Core, Spark SQL.

Programming Languages: .NET, Java, C#, C/C++, Python, HTML, SQL, PL/SQL, and Scala.

Scripting Languages: Shell Scripting, Bash, PowerShell.

Reporting Tools: SSRS, Power BI reports, Tableau.

Provisioning Tools: Terraform, GitHub

Operating Systems: UNIX, Windows, LINUX

Databases: Oracle, MySQL, MS SQL, NoSQL.

Cloud Technologies: Microsoft Azure - Cloud Services (PaaS & IaaS), Active Directory, Application Insights, Azure Monitoring, Azure Search, Data Factory, Key Vault, SQL Azure, Azure DevOps, Azure Analysis Services, Azure Synapse Analytics (DW), Azure Data Lake.

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Azure Data Engineer

Responsibilities:

  • Worked in an Agile development environment and logged defects in Jira and Azure DevOps.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, DataFrames, and Azure Databricks.
  • Built and maintained production-level data engineering pipelines, optimizing ETL/ELT jobs across sources such as Azure SQL, Azure Data Factory, Blob Storage, Azure SQL Data Warehouse, and Azure Data Lake Analytics.
  • Orchestrated, managed, and scheduled data workflows by creating Airflow DAGs in Python (see the sketch after this list).
  • Worked with Microsoft Azure cloud services and deployed servers through Azure Resource Manager templates and the Azure Portal.
  • Wrote APIs to connect to different data feeds and retrieve data using Azure WebJobs and Functions integrated with Cosmos DB.
  • Set up high availability and recoverability of databases using SQL Server technologies, including Always On on Azure VMs.
  • Designed and developed a new solution to process data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.
  • Moved raw data between systems using Apache NiFi.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Responsible for data ingestion into the big data platform using Spark Streaming and Kafka.
  • Transformed Azure Data Lake data using PySpark and Spark SQL.
  • Designed and implemented streaming solutions using Kafka and Azure Stream Analytics.
  • Used Spark DataFrames and Spark SQL for various data transformations and dataset building.
  • Created build and release (CI/CD) pipelines for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).
  • Organized and implemented load-balancing solutions for the PostgreSQL cluster.
  • Wrote PowerShell scripts to automate Windows patching and created releases in Azure DevOps Pipelines.
  • Excellent understanding of SCM tools such as Git, GitHub, Bitbucket, and Azure Repos Git.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
  • Built Azure WebJobs and Functions for product management teams to connect to different APIs and sources, extract data, and load it into Azure Data Warehouse.
  • Installed Kibana using Salt scripts and built custom dashboards visualizing key data stored in Elasticsearch.
  • Performed data pre-processing and cleaning for feature engineering, and applied imputation techniques for missing values in the dataset using Python.
  • Performed data analysis and predictive data modeling.
  • Developed a detailed project plan and helped manage the data conversion and migration from the legacy system to the target Snowflake database.
  • Created HDInsight clusters and storage accounts with an end-to-end environment for running jobs.
  • Developed and maintained multiple Power BI dashboards/reports and content packs.
  • Enhanced the architectural design of the existing DWH for better performance and analysis.
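
A minimal sketch of the kind of Airflow DAG referenced above, assuming Airflow 2.x; the DAG id, schedule, and extract/load callables are hypothetical placeholders.

    # Minimal Airflow 2.x DAG sketch for scheduling a daily extract-then-load workflow.
    # DAG id, schedule, and the task callables are hypothetical placeholders.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_from_source(**context):
        # Placeholder: pull the day's data from the upstream feed.
        print("extracting for", context["ds"])

    def load_to_warehouse(**context):
        # Placeholder: write the transformed data to the warehouse.
        print("loading for", context["ds"])

    default_args = {"owner": "data-eng", "retries": 2, "retry_delay": timedelta(minutes=5)}

    with DAG(
        dag_id="daily_feed_to_dw",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_from_source)
        load = PythonOperator(task_id="load", python_callable=load_to_warehouse)
        extract >> load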

Environment: Azure, Azure Data Lake, Azure Databricks, HDInsight, Hadoop, Spark, Spark SQL, Scala, ELK, Python, ETL, Elasticsearch, Agile

Confidential, Dallas, TX

Azure Big Data Engineer

Responsibilities:

  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, Azure Synapse, Azure SQL Data Warehouse, and the write-back tool, and in the reverse direction.
  • Migrated on-prem ETLs from MS SQL Server to the Azure cloud using Azure Data Factory and Databricks.
  • Developed Spark applications using Scala and Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Played a major role in migrating DAGs from legacy Airflow to a managed Airflow platform.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure Data Lake Analytics, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Developed an automated process in the Azure cloud to ingest data daily from a web service and load it into Azure SQL DB.
  • Used Apache NiFi to automate data movement between Hadoop systems.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
  • Developed Python scripts to perform file validations in Databricks and automated the process using ADF.
  • Imported and exported data using Sqoop between HDFS and relational database systems.
  • Integrated HBase with Spark to import data into HBase and performed CRUD operations on HBase.
  • Developed streaming pipelines using Azure Event Hubs and Stream Analytics to analyze efficiency and open-table counts for data coming in from IoT-enabled poker and other pit tables.
  • Responsible for estimating the cluster size, monitoring and troubleshooting of the Hadoop cluster.
  • Used Zeppelin, Jupyter notebooks, and spark-shell to develop, test, and analyze Spark jobs before scheduling customized Spark jobs.
  • Undertook data analysis and collaborated with the downstream analytics team to shape the data according to their requirements.
  • Wrote UDFs in Scala and stored procedures to meet specific business requirements.
  • Deployed and tested developed code through CI/CD using Visual Studio Team Services (VSTS).
  • Conducted code reviews for team members to ensure proper test coverage and consistent code standards.
  • Responsible for documenting the process and cleanup of unwanted data.
  • Responsible for ingesting data from Blob Storage to Kusto and maintaining the PPE and PROD pipelines.
  • Developed JSON scripts for deploying the Azure Data Factory (ADF) pipeline that processes the data using the Cosmos activity.
  • Developed PowerShell scripts for automation purposes.
  • Built near-real-time pipelines using Kafka and PySpark (see the sketch after this list).
  • Used Bitbucket and Git repositories.
  • Used Snowflake to create and maintain tables and views.
  • Created build and release pipelines for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).
  • Used the ScalaTest FunSuite framework for developing unit test cases and integration testing.
  • Created data quality scripts using SQL and Hive to validate successful data loads and data quality. Created various types of data visualizations using Python and Tableau.
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS.
  • Migrated data from traditional database systems to Azure databases.
  • Ran Cosmos scripts in Visual Studio to check diagnostics.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks.
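
A minimal PySpark Structured Streaming sketch of the kind of near-real-time Kafka pipeline referenced above; the broker address, topic, event schema, and output paths are hypothetical placeholders.

    # Minimal sketch: read JSON events from Kafka and land them as Parquet in near real time.
    # Requires the spark-sql-kafka package on the cluster; broker, topic, schema, and paths are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("kafka-near-real-time").getOrCreate()

    event_schema = StructType([
        StructField("device_id", StringType()),
        StructField("reading", DoubleType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "device-events")
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers bytes; parse the value column as JSON into typed fields.
    events = (raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
              .select("e.*"))

    query = (events.writeStream
             .format("parquet")
             .option("path", "/data/curated/device_events")
             .option("checkpointLocation", "/data/checkpoints/device_events")
             .outputMode("append")
             .start())

    query.awaitTermination()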

Environment: Hadoop, MapReduce, Azure, Python, HDFS, Pig, Hive, Spark, Kafka, IntelliJ, Cosmos, Sbt, Zeppelin, YARN, Scala, Tableau, SQL, Git.

Confidential, Dallas, TX

Azure Data Engineer

Responsibilities:

  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support visualization of data. Understood the current production state of the application and determined the impact of new implementations on existing business processes.
  • Worked in Agile methodology and used Jira to maintain project stories.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Synapse, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including the write-back tool and the reverse direction.
  • Developed Python scripts to automate the ETL process using Apache Airflow, as well as cron scripts on Unix.
  • Developed real-time processes using Event Hubs and transformed the data using Scala notebooks.
  • Wrote Sqoop queries to import data into Hadoop from SQL Server tables.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.
  • Developed scripts in Hive to perform transformations on the data and load to target systems for reporting.
  • Used Azure Automation accounts to schedule PowerShell runbooks.
  • Prepared capacity and architecture plans to create the Azure cloud environment to host migrated IaaS VMs and PaaS role instances for refactored applications and databases.
  • Designed and built data quality frameworks covering aspects such as completeness, accuracy, and coverage using Kafka.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Responsible for estimating cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.
  • Built an Elasticsearch cluster integrated with Kibana to publish real-time dashboards for maintenance data.
  • Installed and configured big data tools Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Loaded and transformed large sets of structured data from router locations to the EDW using an Apache NiFi data pipeline flow.
  • As a big data implementation engineer, was responsible for developing, troubleshooting, and implementing programs.
  • Worked extensively with Dimensional modeling, Data migration, Data cleansing, ETL Processes for data warehouses.
  • Knowledge of creating repositories and version control using Git.
  • Performed performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Created Docker containers leveraging existing Linux containers and AMIs, in addition to creating Docker containers from scratch.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements (see the sketch after this list).
  • Developed Tableau workbooks from multiple data sources using data blending, creating Pareto charts, stacked bar graphs, histograms, and scatter plots.
  • Developed JSON scripts for deploying Azure Data Factory (ADF) pipelines that process data using the SQL activity.
  • Hands-on experience developing SQL scripts for automation purposes.
  • Created build and release pipelines for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).
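
A minimal sketch of the kind of PySpark UDF referenced above; the column names and the classification rule are hypothetical placeholders standing in for the real business requirement.

    # Minimal PySpark UDF sketch: classify customers into usage tiers with a hypothetical rule.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    @udf(returnType=StringType())
    def usage_tier(monthly_usage):
        # Hypothetical thresholds standing in for the actual business logic.
        if monthly_usage is None:
            return "unknown"
        if monthly_usage >= 1000:
            return "heavy"
        if monthly_usage >= 100:
            return "regular"
        return "light"

    df = spark.createDataFrame(
        [("c1", 1500.0), ("c2", 250.0), ("c3", None)],
        ["customer_id", "monthly_usage"],
    )

    # Apply the UDF as a derived column.
    df.withColumn("tier", usage_tier(col("monthly_usage"))).show()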

Environment: Hadoop, Azure, Python, Pyspark, Spark, ADF, Scala, SQL, Tableau.

Confidential

Azure Big Data Engineer

Responsibilities:

  • Designed and configured Azure cloud relational servers and databases, analyzing current and future business requirements.
  • Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
  • Followed agile methodology for the entire project.
  • Involved in setting up separate application and reporting data tiers across servers using geo-replication functionality.
  • Implemented Disaster Recovery and Failover servers in Cloud by replicating data across regions.
  • Created pipeline jobs, schedule triggers, and mapping data flows using Azure Data Factory (V2), using Key Vault to store credentials.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements.
  • Analyzed data from different sources on the Hadoop big data platform using Azure Data Factory, Azure Data Lake, Azure Synapse, Azure Data Lake Analytics, HDInsight, Hive, and Sqoop.
  • Developed Spark Streaming jobs, writing RDDs and building DataFrames with Spark SQL as needed.
  • Used Kusto Explorer for log analytics and better query response, and created alerts using the Kusto Query Language.
  • Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily data.
  • Worked on Azure Site Recovery and Azure Backup, configuring Azure Backup vaults to protect required VMs and take VM-level backups for Azure and on-premises environments.
  • Created tabular models in Azure Analysis Services to meet business reporting requirements.
  • Experienced in performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Performed ETL using Azure Databricks.
  • Worked with Azure Blob and Data Lake storage and loaded data into Azure Synapse Analytics (DW).
  • Loaded and transformed large sets of structured and semi-structured data from multiple data sources into the raw data zone (HDFS) using Sqoop imports and Spark jobs.
  • Implemented an ETL framework using Spark with Python and loaded standardized data into Hive and HBase tables (see the sketch after this list).
  • Collected and aggregated large amounts of log data using Flume, staging data in HDFS for further analysis.
  • Analyzed the SQL scripts and designed the solution to be implemented using PySpark.
  • Created correlated and non-correlated subqueries to resolve complex business queries involving multiple tables from different databases.
  • Developed business intelligence solutions using SQL Server Data Tools and loaded data into SQL and Azure cloud databases.
  • Performed data quality analyses and applied business rules in all layers of the extraction, transformation, and loading process.
  • Performed validation and verification of software across all testing phases, including functional, system integration, end-to-end, regression, sanity, user acceptance, smoke, disaster recovery, production acceptance, and pre-prod testing.
  • Worked on Tableau to build customized interactive reports, worksheets and dashboards.
  • Good experience logging defects in Jira and Azure DevOps.
  • Involved in planning the cutover strategy and go-live schedule, including the scheduled release dates of Portfolio Central data mart changes.
  • Automated tasks using PowerShell.
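
A minimal PySpark sketch of the kind of Spark-with-Python ETL step referenced above (standardizing raw records and loading them into a Hive table); the source path, column names, and target table are hypothetical placeholders.

    # Minimal PySpark ETL sketch: standardize raw records and load them into a Hive table.
    # Source path, columns, and target table are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, trim, upper, to_date

    spark = (SparkSession.builder
             .appName("standardize-and-load")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.json("/raw_zone/transactions/")  # semi-structured landing data

    standardized = (raw
        .withColumn("customer_id", trim(col("customer_id")))      # normalize whitespace
        .withColumn("country_code", upper(col("country_code")))   # standardize codes
        .withColumn("txn_date", to_date(col("txn_ts")))           # derive a partition column
        .dropDuplicates(["transaction_id"])
        .filter(col("amount").isNotNull()))

    # Append into a partitioned, Hive-managed table.
    (standardized.write
        .mode("append")
        .partitionBy("txn_date")
        .saveAsTable("curated.transactions"))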

Environment: Microsoft SQL Server 2012, Azure Synapse Analytics, Azure Data Lake & Blob, Azure SQL, Azure Data Factory, Azure Analysis Services, BIDS.

Confidential

Data Engineer

Responsibilities:

  • Worked with the project manager, business leaders, and technical teams to finalize requirements and create the solution design and architecture.
  • Analyzed the Hadoop cluster and different big data analytics tools, including Pig, Hive, Spark, Scala, and Sqoop.
  • Followed Agile methodology, including test-driven development and pair programming.
  • Designed and developed Spark code using Scala for high-speed data processing to meet critical business requirements.
  • Implemented RDD/Dataset/DataFrame transformations in Scala through SparkContext and HiveContext.
  • Implemented Oozie workflow engine to run multiple Hive and Python jobs.
  • Developed algorithms and scripts in Hadoop to import data from source systems and persist it in HDFS (Hadoop Distributed File System) for staging purposes.
  • Developed shell scripts to perform Hadoop ETL functions such as Sqoop imports, creating external/internal Hive tables, and initiating HQL scripts.
  • Replaced the existing MapReduce programs and Hive queries with Spark applications written in Scala.
  • Wrote Python scripts to parse XML documents and load the data into the database (see the sketch after this list).
  • Worked on all four stages: data ingest, data transform, data tabulate, and data export.
  • Maintained fully automated CI/CD pipelines for code deployment (GitLab/Jenkins).
  • Built code using Java, Spring Boot, Maven, and Jenkins to build and automate the data workflow.
  • Used Talend for big data integration with Spark and Hadoop.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Performed JUnit and functional tests to validate the code.
  • Developed and published reports and dashboards using Power BI and wrote effective DAX formulas and expressions.
  • Wrote Puppet manifests and modules to deploy, configure, and manage servers for internal DevOps process.
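
A minimal Python sketch of the kind of XML-to-database load referenced above; the XML layout, file path, and table schema are hypothetical placeholders, and SQLite stands in for the actual target database driver.

    # Minimal sketch: parse an XML document and load rows into a database table.
    # The XML structure, file path, and table are hypothetical; sqlite3 is a stand-in driver.
    import sqlite3
    import xml.etree.ElementTree as ET

    def parse_orders(xml_path):
        """Yield (order_id, customer, amount) tuples from a hypothetical orders XML file."""
        tree = ET.parse(xml_path)
        for order in tree.getroot().findall("order"):
            yield (
                order.get("id"),
                order.findtext("customer"),
                float(order.findtext("amount", default="0")),
            )

    def load(rows, db_path="orders.db"):
        conn = sqlite3.connect(db_path)
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount REAL)"
            )
            conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        conn.close()

    if __name__ == "__main__":
        load(parse_orders("orders.xml"))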

Environment: Cloudera Hadoop, HDFS, Yarn, Java, Maven, Jenkins, Gitlab, Git, Hive, PySpark, Spark SQL, Sqoop, MS SQL Server, Oracle, SQL/ NoSQL, Linux, Puppet, Tableau.
