
Azure Data Engineer Resume


Virginia Beach, Virginia

SUMMARY

  • Around 8 years of total IT experience, including over 5 years of Big Data/Hadoop experience and a background in the development and design of Java-based enterprise applications.
  • Extensive working experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive, Sqoop, Flume, Spark, Kafka, Oozie and ZooKeeper.
  • Implemented performance tuning techniques for Spark SQL queries.
  • Strong knowledge of Hadoop HDFS architecture and the MapReduce (MRv1) and YARN (MRv2) frameworks.
  • Strong hands-on experience publishing messages to various Kafka topics using Apache NiFi and consuming them into HBase using Spark and Python.
  • Experience designing and developing Spark applications using PySpark and Spark SQL in Databricks for data extraction, transformation and aggregation across multiple file formats, transforming the data to uncover insights into customer usage patterns (see the sketch at the end of this summary).
  • Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors and tasks.
  • Experience with MS SQL Server Integration Services (SSIS), T-SQL, stored procedures and triggers.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
  • Hands-on with Azure Data Factory (ADF), Integration Runtime (IR), file-system data ingestion and relational data ingestion.
  • Created Spark jobs that process the source files and performed various transformations on the source data using the Spark DataFrame and Spark SQL APIs.
  • Developed Sqoop scripts to migrate data from Teradata and Oracle to the big data environment.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Hands-on experience installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera distributions (CDH3, CDH4 and YARN-based CDH 5.x).
  • Implemented a real-time data streaming pipeline using AWS Kinesis, Lambda and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
  • Worked on large-scale data transfers across different Hadoop clusters and implemented new technology stacks on Hadoop clusters using Apache Spark.
  • Added support for AWS S3 and RDS to host static/media files and the database in the Amazon cloud.
  • Experience in project deployment using Heroku/Jenkins and AWS services such as EC2, S3, Auto Scaling, CloudWatch and SNS.
  • Performed data scrubbing and processing, using Oozie for workflow automation and coordination.
  • Hands-on experience analyzing log files for Hadoop and ecosystem services to find root causes.
  • Hands-on experience handling different file formats such as Avro, Parquet, SequenceFiles, MapFiles, CSV, XML, log, ORC and RC.
  • Experience with the NoSQL databases HBase, Cassandra and MongoDB.
  • Experience with AIX, Linux (RHEL), UNIX shell scripting and SQL Server 2008.
  • Worked with the search tool Elasticsearch and the data collection tool Logstash.
  • Strong knowledge of Hadoop cluster installation, capacity planning, performance tuning, benchmarking, disaster recovery planning and application deployment in production clusters.
  • Experience developing stored procedures and triggers using SQL and PL/SQL in relational databases such as MS SQL Server 2005/2008.
  • Exposure to Scrum, Agile and Waterfall methodologies.
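
The PySpark/Spark SQL bullet above refers to the following minimal sketch of a multi-format extraction, aggregation and load job with a couple of Spark SQL tuning touches. It is illustrative only: the paths, schema, table names and tuning values are assumptions rather than details from an actual engagement, and unionByName with allowMissingColumns assumes Spark 3.1+.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  # Illustrative only: paths, column names and tuning values are placeholders.
  spark = (
      SparkSession.builder
      .appName("usage-pattern-aggregation")
      # Example Spark SQL tuning knobs (values are assumptions, not recommendations).
      .config("spark.sql.shuffle.partitions", "200")
      .config("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)
      .getOrCreate()
  )

  # Extract: read the same logical data landed in multiple file formats.
  events_csv = spark.read.option("header", True).option("inferSchema", True).csv("/data/raw/events_csv/")
  events_parquet = spark.read.parquet("/data/raw/events_parquet/")
  events = events_csv.unionByName(events_parquet, allowMissingColumns=True)  # Spark 3.1+

  # Small dimension table; the broadcast hint avoids a shuffle join.
  customers = spark.read.parquet("/data/raw/customers/")
  joined = events.join(F.broadcast(customers), on="customer_id", how="left")

  # Transform/aggregate with Spark SQL to surface usage patterns.
  joined.createOrReplaceTempView("usage")
  daily_usage = spark.sql("""
      SELECT customer_id,
             to_date(event_ts)      AS event_date,
             count(*)               AS events,
             sum(duration_seconds)  AS total_duration
      FROM usage
      GROUP BY customer_id, to_date(event_ts)
  """)

  # Load: write partitioned Parquet for downstream consumption.
  daily_usage.write.mode("overwrite").partitionBy("event_date").parquet("/data/curated/daily_usage/")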

TECHNICAL SKILLS

Programming Languages: Java, Python, SQL, and C/C++

Big Data Ecosystem: Hadoop, MapReduce, Kafka, Spark, Pig, Hive, YARN, Flume, Sqoop, Oozie, ZooKeeper, Talend.

Hadoop Distributions: Cloudera Enterprise, Hortonworks, EMC Pivotal.

Databases: Oracle, SQL Server, PostgreSQL.

Web Technologies: HTML, XML, JQuery, Ajax, CSS, JavaScript, JSON.

Streaming Tools: Kafka

Testing: Hadoop Testing, Hive Testing, MRUnit.

Operating Systems: Linux Red Hat/Ubuntu/CentOS, Windows 10/8.1/7/XP.

Cloud: AWS EMR, Glue, RDS, CloudWatch, S3, Redshift Cluster, Kinesis, DynamoDB.

Technologies and Tools: Servlets, JSP, Spring (Boot, MVC, Batch, Security), Web Services, Hibernate, Maven, GitHub, Bamboo.

PROFESSIONAL EXPERIENCE

Confidential, Virginia Beach, Virginia

Azure Data Engineer

Responsibilities:

  • Build data pipeline architecture on the Azure cloud platform using NiFi, Azure Data Lake Storage, Azure HDInsight, Airflow and data engineering tools.
  • Designed and developed scalable and cost-effective architecture in Azure Big Data services for the data life cycle of collection, ingestion, storage, processing and visualization.
  • Design and implement database solutions in Azure SQL Data Warehouse and Azure SQL Database.
  • Architect and implement medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB); a simplified sketch follows this list.
  • Design and implement migration strategies for moving traditional systems to Azure (lift and shift, Azure Migrate, and other third-party tools).
  • Engage with business users to gather requirements, design visualizations and provide training to use self-service BI tools.
  • Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle and Azure SQL.
  • Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure.
  • Develop conceptual solutions & create proof-of-concepts to demonstrate viability of solutions.
  • Technically guide projects through to completion within target timeframes.
  • Collaborate with application architects and DevOps.
  • Identify and implement best practices, tools and standards.
  • Design, set up, maintain and administer Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse and Azure Data Factory.
  • Build complex distributed systems involving large-scale data handling, metrics collection, data pipeline construction and analytics.
  • Design and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability and tuning performance.
  • Implement data quality and content validation using tools such as Spark, Scala, Hive and NiFi.
  • Involved in creating an end-to-end data pipeline in a distributed environment using big data tools, the Spark framework and Power BI for data visualization.
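
The BI-solution bullet above points to the following simplified PySpark sketch of one such pipeline: land files in Azure Data Lake Storage Gen2, curate them in Spark, and load the result into Azure SQL Data Warehouse. The storage account, container, columns, JDBC endpoint, table and credentials are placeholder assumptions; secrets would normally come from a key vault; the runtime fs.azure config is the Databricks-style pattern; and the generic Spark JDBC writer is used here (a dedicated SQL DW/Synapse connector could be used instead), assuming the SQL Server JDBC driver is on the cluster.

  from pyspark.sql import SparkSession, functions as F

  # Illustrative sketch only: account, container, table and credential names are placeholders.
  spark = SparkSession.builder.appName("adls-to-sqldw").getOrCreate()

  # Authenticate to ADLS Gen2 with an account key (placeholder value; use a vault in practice).
  spark.conf.set(
      "fs.azure.account.key.examplestorageacct.dfs.core.windows.net",
      "<storage-account-key>",
  )

  # Ingest raw files landed in the data lake.
  raw = spark.read.parquet(
      "abfss://raw@examplestorageacct.dfs.core.windows.net/sales/2020/"
  )

  # Basic cleansing/derivation before loading the warehouse.
  curated = (
      raw.dropDuplicates(["order_id"])
         .withColumn("order_date", F.to_date("order_ts"))
         .filter(F.col("amount") > 0)
  )

  # Load into Azure SQL Data Warehouse via the generic Spark JDBC writer.
  (curated.write
      .format("jdbc")
      .option("url", "jdbc:sqlserver://example-dw.database.windows.net:1433;database=edw")
      .option("dbtable", "dbo.curated_sales")
      .option("user", "<sql-user>")
      .option("password", "<sql-password>")
      .mode("append")
      .save())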

Confidential, Valley Forge, PA

Big Data Developer

Responsibilities:

  • Implemented a proof of concept (POC) for migrating ETL Ab Initio graphs into Spark using Scala and Python (PySpark).
  • Developed data pipelines using Sqoop, Spark, MapReduce and Hive to ingest, transform and analyze customer behavior data.
  • Developed a data pipeline using Kafka, Spark Streaming and Hive to ingest data from data lakes into the Hadoop Distributed File System (HDFS).
  • Implemented Spark jobs using Python and Spark SQL for faster data processing, along with algorithms for real-time analysis in Spark.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for large data volumes.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the transformed data back into HDFS.
  • Extracted files from an RDBMS (DB2) into the Hadoop file system (HDFS) using Sqoop to feed the processing workflow.
  • Implemented partitioning and bucketing for faster query processing in Hive Query Language (HQL).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Datasets and user-defined functions (UDFs); a simplified sketch follows this list.
  • Designed Hive queries and Pig scripts to perform data analysis, data transfer and table design.
  • Evaluated data between the ETL system and Hadoop to ensure data quality.
  • Responsible for creating mappings and workflows to extract and load data from relational databases, flat-file sources and legacy systems.
  • Tested the Apache Tez and Hadoop MapReduce frameworks for building high-performance batch and interactive data processing applications.
  • Reconciled data daily between the ETL and Hive tables using a compare tool implemented in the Spark framework with PySpark.
  • Fine-tuned Hadoop applications for high performance and throughput, and troubleshot and debugged Hadoop ecosystem runtime issues.
  • Performed data validation between ETL and Apache Hive tables.
  • Developed Linux shell scripts for deploying and running the migrated Hadoop applications on production servers.
  • Developed workflows for scheduling and orchestrating the Hadoop processes.
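
As referenced above, the following is a minimal PySpark sketch of converting a HiveQL aggregate into DataFrame transformations with a UDF and persisting the result as a partitioned, bucketed Hive table. The database, table and column names, the bucket count and the UDF logic are illustrative assumptions, not details of the actual migration.

  from pyspark.sql import SparkSession, functions as F
  from pyspark.sql.types import StringType

  # Illustrative only: database, table and column names are placeholders.
  spark = (
      SparkSession.builder
      .appName("hql-to-spark")
      .enableHiveSupport()          # read/write Hive metastore tables
      .getOrCreate()
  )

  # A small UDF standing in for the data-quality UDFs mentioned above.
  @F.udf(returnType=StringType())
  def normalize_state(code):
      return code.strip().upper() if code else "UNKNOWN"

  # Equivalent of a HiveQL aggregate, expressed with the DataFrame API.
  orders = spark.table("staging.customer_orders")
  summary = (
      orders.withColumn("state", normalize_state(F.col("state_code")))
            .groupBy("state", "order_date")
            .agg(F.count(F.lit(1)).alias("order_count"),
                 F.sum("order_amount").alias("total_amount"))
  )

  # Persist as a partitioned, bucketed Hive table for faster HQL queries.
  (summary.write
      .mode("overwrite")
      .partitionBy("order_date")
      .bucketBy(32, "state")
      .sortBy("state")
      .saveAsTable("curated.order_summary"))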

Confidential

SQL Developer

Responsibilities:

  • Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
  • Extensively used the Spark stack to develop preprocessing jobs that use the RDD, Dataset and DataFrame APIs to transform data for upstream consumption.
  • Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming jobs consuming from streaming sources such as Kafka, Flume and JMS.
  • Replaced existing MapReduce programs with Spark applications written in Scala.
  • Built on-premises data pipelines using Kafka and Spark Streaming fed by an API streaming gateway REST service; a simplified sketch follows this list.
  • Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
  • Experienced in writing Sqoop scripts to import data from RDBMSs into Hive/HDFS.
  • Good knowledge of the Kafka Streams API for data transformation.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Used Talend to create workflows for processing data from multiple source systems.
  • Created sample flows in Talend and StreamSets with custom-coded JARs, and analyzed the performance of StreamSets and Kafka Streams.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
  • Developed Hive queries to analyze data in HDFS to identify issues and behavioral patterns.
  • Involved in writing optimized Pig scripts, along with developing and testing Pig Latin scripts.
  • Deployed applications using Jenkins integrated with Git version control.
  • Participated in production support on a regular basis for the analytics platform.
  • Used Rally for task/bug tracking.
  • Used GIT for version control.
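
The Kafka/Spark Streaming bullet above refers to the sketch below. It shows the Structured Streaming flavour of such a pipeline (the original work may equally have used DStreams); the broker addresses, topic, message schema and HDFS paths are placeholder assumptions, and it assumes the Spark Kafka connector (spark-sql-kafka) is on the classpath.

  from pyspark.sql import SparkSession, functions as F
  from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

  # Illustrative only: brokers, topic, schema and paths are placeholders.
  spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

  event_schema = StructType([
      StructField("event_id", StringType()),
      StructField("event_ts", TimestampType()),
      StructField("payload", StringType()),
      StructField("value", DoubleType()),
  ])

  # Consume JSON messages from the gateway-fed Kafka topic.
  raw = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "api-gateway-events")
      .option("startingOffsets", "latest")
      .load())

  # Parse the Kafka value column and derive a date partition column.
  events = (raw
      .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
      .select("e.*")
      .withColumn("dt", F.to_date("event_ts")))

  # Land the parsed stream on HDFS as date-partitioned Parquet.
  query = (events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streams/api_events/")
      .option("checkpointLocation", "hdfs:///checkpoints/api_events/")
      .partitionBy("dt")
      .outputMode("append")
      .start())

  query.awaitTermination()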
