Azure Data Engineer Resume

Minneapolis, MN

SUMMARY

  • Over 7 years of extensive professional experience across the full Software Development Life Cycle (SDLC) and Agile methodology, covering analysis, design, development, testing, implementation, and maintenance in Azure, Azure Databricks (Spark), Hadoop, data warehousing, Linux, and Scala/Python
  • Experience with big data infrastructure such as the Hadoop framework, Spark ecosystem, HDFS, MapReduce, Hive, Storm, Kafka, YARN, HBase, Oozie, ZooKeeper, Flume, and Sqoop
  • Strong experience with Hadoop distributions such as Cloudera and Hortonworks
  • Experience developing Spark jobs in Scala for faster real-time analytics, using Spark SQL for querying
  • Hands-on experience with Azure cloud services such as Blob Storage
  • Migrated data from Oracle and MS SQL Server into HDFS using Sqoop and imported flat files of various formats into HDFS
  • Experience importing and exporting data to and from HDFS and Hive using Sqoop
  • Strong knowledge of core Spark components including RDDs, the DataFrame and Dataset APIs, Spark Streaming, in-memory processing, DAG scheduling, data partitioning, and tuning
  • Performed various optimizations such as using the distributed cache for small datasets and partitioning and bucketing in Hive
  • Expertise in developing Spark applications using the PySpark and Spark Streaming APIs in Python and deploying them on YARN in client and cluster modes; used the Spark DataFrame API to ingest data from HDFS to AWS S3 (see the sketch at the end of this summary)
  • Involved in converting HBase/Hive/SQL queries into Spark transformations using RDDs in Python and Scala
  • Experience in writing Kusto queries
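
The Spark DataFrame ingestion mentioned in the summary can be illustrated with a minimal Spark/Scala sketch; the HDFS path, column names, and S3 bucket below are hypothetical placeholders rather than actual project values.

```scala
import org.apache.spark.sql.SparkSession

object HdfsToS3Ingest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hdfs-to-s3-ingest")
      .getOrCreate()

    // Read delimited flat files landed in HDFS (path is illustrative)
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/landing/orders")

    // Query with Spark SQL rather than the DataFrame DSL
    orders.createOrReplaceTempView("orders")
    val dailyTotals = spark.sql(
      """SELECT order_date, SUM(amount) AS total_amount
        |FROM orders
        |GROUP BY order_date""".stripMargin)

    // Persist the result to S3 as partitioned Parquet (bucket name is a placeholder)
    dailyTotals.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3a://example-analytics-bucket/curated/daily_totals")

    spark.stop()
  }
}
```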

TECHNICAL SKILLS

Cloud Technologies: Azure Ecosystem

Big data Tools: HDFS, MapReduce, YARN, Hive, Spark, Sqoop, Kafka, Flume, Hue, Impala.

Databases: Oracle, Teradata, Hadoop, SQL Server, DB2, Sybase, MS Access.

NoSQL Databases: Cosmos DB, HBase, Cassandra.

Hadoop Distributions: Cloudera, Hortonworks, Amazon EMR

Data modeling Tools: Dimensional Data Modeling (Star Schema, Snowflake Schema), ER Studio.

ETL tools: SQL*Loader, Informatica PowerCenter 8.1/7.1/6.2, DataStage, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS).

Programming Languages: Scala, Python, SQL, HiveQL, PL/SQL, HTML, XML, JavaScript.

Version Control Tools: GitHub

Methodologies: AGILE, Scrum

UI Tools: Jupyter Notebook, Zeppelin

BI Tools: QlikView, Tableau, QlikSense, SQL Server

Operating Systems: Windows, Linux, Mac, Unix.

PROFESSIONAL EXPERIENCE

Azure Data Engineer

Confidential, Minneapolis, MN

Responsibilities:

  • Designed and developed IT solutions using big data tools
  • Created pipelines in Azure Data Factory (ADF) using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back to the sources
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Hadoop cluster
  • Used Zeppelin, Jupyter notebooks, and spark-shell to develop, test, and analyze Spark jobs before scheduling customized Spark jobs
  • Used Kusto Explorer for log analytics and better query response times
  • Undertook data analysis and collaborated with the downstream analytics team to shape the data to their requirements; worked on Microsoft's internal tools such as Cosmos, Kusto, and iScope, which are known for performing ETL operations efficiently
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and appropriate memory settings
  • Wrote UDFs in Scala and stored procedures to meet specific business requirements (see the UDF sketch after this list)
  • Replaced existing MapReduce programs and Hive queries with Spark applications written in Scala
  • Deployed and tested developed code through CI/CD using Visual Studio Team Services (VSTS)
  • Conducted code reviews for team members to ensure proper test coverage and consistent code standards
  • Responsible for documenting the process and cleaning up unwanted data
  • Responsible for ingesting data from Blob Storage to Kusto and maintaining the PPE and PROD pipelines
  • Used LINQ extensively in view extensions to parse out unnecessary information and to make calls to the SQL database
  • Expertise in creating HDInsight clusters and storage accounts as an end-to-end environment for running jobs
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity
  • Responsible for monitoring, analyzing, and enhancing existing databases; prepared DML SQL plans for bug fixes
  • Worked with SQL Server Management Studio to modify existing databases and insert new data
  • Hands-on experience developing PowerShell scripts for automation
  • Created build and release definitions for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS)
  • Experience using the ScalaTest FunSuite framework for developing unit test cases and integration tests (see the test sketch after this list)
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS
  • Involved in running Cosmos scripts in Visual Studio 2017/2015 to check diagnostics
  • Worked in an Agile development environment with two-week sprint cycles, dividing and organizing tasks
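
The Scala UDF work noted above can be sketched as follows; the normalize_region rule and the region column are hypothetical examples standing in for the actual business logic.

```scala
import org.apache.spark.sql.SparkSession

object UdfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("scala-udf-example").getOrCreate()
    import spark.implicits._

    // Hypothetical business rule: normalize free-text region values
    spark.udf.register("normalize_region", (raw: String) =>
      Option(raw).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    val regions = Seq("mn ", "Wi", null).toDF("region")

    // Once registered, the UDF is callable from Spark SQL / selectExpr
    regions.selectExpr("region", "normalize_region(region) AS region_code").show()

    spark.stop()
  }
}
```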
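
Likewise, a minimal ScalaTest FunSuite test for a Spark transformation might look like the sketch below, assuming ScalaTest 3.x (AnyFunSuite) and a local SparkSession; the aggregation under test is an illustrative stand-in.

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class DailyTotalsSuite extends AnyFunSuite {

  // Local SparkSession is enough for unit-testing transformations
  private lazy val spark = SparkSession.builder()
    .master("local[2]")
    .appName("daily-totals-test")
    .getOrCreate()

  test("amounts are summed per order date") {
    import spark.implicits._

    val input = Seq(("2020-01-01", 10.0), ("2020-01-01", 5.0), ("2020-01-02", 3.0))
      .toDF("order_date", "amount")

    // The transformation under test: aggregate amounts by date
    val totals = input.groupBy("order_date").sum("amount")
      .collect()
      .map(r => r.getString(0) -> r.getDouble(1))
      .toMap

    assert(totals("2020-01-01") == 15.0)
    assert(totals("2020-01-02") == 3.0)
  }
}
```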

Environment: Azure Data Services, SQL Server, Databricks, Blob Storage, ADF, Azure SQL Server, HDFS, Pig, Hive, Spark, Kafka, IntelliJ, Cosmos, sbt, Zeppelin, YARN, Scala, SQL, Git

Data Engineer

Confidential, Indianapolis, IN

Responsibilities:

  • Developed an SSIS ETL package to load vehicle transactions captured through a Java web service
  • Designed and developed an SSAS multidimensional cube to store historical data and processed cube partitions through SSIS
  • Created cube partitions dynamically using XMLA and SSIS
  • Processed and refreshed the cube automatically through scheduled jobs
  • Designed logical and physical data models and created databases in third normal form for OLTP systems
  • Enabled Change Data Capture (CDC) in SQL Server to sync data between legacy and modern systems using Informatica
  • Monitored the NAS file share periodically using a SQL master job and triggered dependent jobs based on file feeds from various sources
  • Developed T-SQL queries, packages, stored procedures, functions, and triggers

Environment: SSIS, SQL, SSAS, MDX, Informatica, SSRS, PowerShell, C#/.NET, JavaScript.

Big Data Developer

Confidential, McLean, VA

Responsibilities:

  • Implemented a data pipeline using Spark/Scala
  • Wrote AWS Lambda functions to run the data pipeline in the cloud
  • Moved data from HDFS to Amazon S3 and the data lake
  • Developed simple to complex Spark jobs in Scala and implemented EMR steps to launch the EMR cluster and kick off the jobs
  • Built the CI/CD process from scratch
  • Built data pipeline frameworks to automate high-volume, real-time data delivery for our Hadoop and streaming data hub (see the streaming sketch after this list)
  • Worked on Teradata stored procedures and functions to confirm the data and load it into the table; created containers in Docker
  • Developed automation solutions leveraging shell, Perl, and Java scripts to increase operational efficiency
  • Implemented a data management strategy defining the direction of data organization and metadata management within the data lake
  • Built data APIs and data delivery services that support critical operational and analytical applications for our internal business operations, customers, and partners
  • Responsible for designing and deploying new ELK clusters (Elasticsearch, Logstash, Kibana, Beats, Kafka, ZooKeeper, etc.)
  • Expertise in machine learning, graph analytics, and text mining techniques such as classification, regression, clustering, feature engineering, label propagation, PageRank, information extraction, and topic modeling
  • Worked on optimizing and tuning Teradata views and SQL to improve batch performance and data response time for users
  • Designed, developed, and evaluated innovative predictive models
  • Transformed complex analytical models into scalable, production-ready solutions
  • Managed AWS EC2 instances utilizing Auto Scaling, Elastic Load Balancing, and Glacier for our QA and UAT environments as well as infrastructure servers for Git and Chef
  • Skilled in monitoring servers using Nagios, Datadog, CloudWatch, and the EFK stack (Elasticsearch, Fluentd, Kibana)
  • Involved in establishing an automated Hadoop integration testing system and implementing Oozie workflows
  • Skilled in using ZooKeeper and the Apache Oozie workflow engine for cluster coordination, automation, and workflow job scheduling
  • Developed applications from the ground up using a modern technology stack (Scala, Spark, NoSQL) for data migration and analytics
  • Worked directly with product owners and customers to deliver data products in a collaborative, agile environment
  • Wrote and maintained automated Salt scripts for Elasticsearch, Logstash, Kibana, and Beats
  • Created and ran example MapReduce jobs and performed advanced Hive and Impala queries
  • Leveraged reusable code modules to solve problems across the team and organization
  • Implemented Spring Boot microservices to process messages into the Kafka cluster
  • Used Kibana and Elasticsearch to identify Kafka message failure scenarios
  • Implemented Kafka producer and consumer applications on a Kafka cluster set up with the help of ZooKeeper (see the Kafka client sketch after this list)
  • Built robust systems with an eye on long-term maintenance and support of the application
  • Responsible for installing and configuring Jenkins to support various Java builds, plus Jenkins plugins to automate continuous builds and publish Docker images to the Nexus repository
  • Used CI/CD tools Jenkins, Git/GitLab, Jira, and the Docker registry/daemon, with configuration management and automation through Ansible
  • Moved all log/text files generated by various products into S3 locations for further analysis.
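
The real-time delivery into the streaming data hub could be sketched with Spark Structured Streaming as below; the broker address, topic, and S3 paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KafkaToDataLake {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-datalake")
      .getOrCreate()

    // Read a Kafka topic as a streaming DataFrame
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder brokers
      .option("subscribe", "vehicle-events")               // placeholder topic
      .load()
      .select(col("value").cast("string").as("payload"), col("timestamp"))

    // Land raw payloads in the S3 data lake as Parquet, tracking progress via a checkpoint
    val query = events.writeStream
      .format("parquet")
      .option("path", "s3a://example-data-lake/raw/vehicle_events")
      .option("checkpointLocation", "s3a://example-data-lake/checkpoints/vehicle_events")
      .start()

    query.awaitTermination()
  }
}
```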
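
A plain Kafka producer/consumer pair of the kind described above might look like the following Scala sketch using the kafka-clients API (Kafka 2.x, Scala 2.13); the broker address, topic, and payload are hypothetical.

```scala
import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}

object KafkaRoundTrip {
  def main(args: Array[String]): Unit = {
    val brokers = "localhost:9092" // placeholder broker list
    val topic   = "vehicle-events" // placeholder topic

    // Producer: publish a single JSON-style message
    val producerProps = new Properties()
    producerProps.put("bootstrap.servers", brokers)
    producerProps.put("key.serializer", classOf[StringSerializer].getName)
    producerProps.put("value.serializer", classOf[StringSerializer].getName)
    val producer = new KafkaProducer[String, String](producerProps)
    producer.send(new ProducerRecord(topic, "key-1", """{"vin":"ABC123","speed":42}"""))
    producer.close()

    // Consumer: read whatever is on the topic from the beginning
    val consumerProps = new Properties()
    consumerProps.put("bootstrap.servers", brokers)
    consumerProps.put("group.id", "example-consumer-group")
    consumerProps.put("auto.offset.reset", "earliest")
    consumerProps.put("key.deserializer", classOf[StringDeserializer].getName)
    consumerProps.put("value.deserializer", classOf[StringDeserializer].getName)
    val consumer = new KafkaConsumer[String, String](consumerProps)
    consumer.subscribe(java.util.Collections.singletonList(topic))
    consumer.poll(Duration.ofSeconds(5)).asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
    consumer.close()
  }
}
```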

Environment: Snowflake, Elasticsearch, machine learning, Teradata, Pig, AWS, Spark, Scala, data lake, Impala, Jenkins.
