Azure Data Engineer Resume
Virginia Beach, Virginia
SUMMARY
- Around 8 years of total IT experience, including over 5 years in Big Data/Hadoop and in the development and design of Java-based enterprise applications.
- Extensive working experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive, Sqoop, Flume, Spark, Kafka, Oozie and ZooKeeper.
- Implemented performance tuning techniques for Spark SQL queries.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce (MRv1) and YARN (MRv2) frameworks.
- Strong hands-on experience publishing messages to Kafka topics using Apache NiFi and consuming them into HBase using Spark and Python.
- Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation and aggregation across multiple file formats to uncover insights into customer usage patterns.
- Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors and tasks.
- Experience with MS SQL Server Integration Services (SSIS), T-SQL, stored procedures and triggers.
- Designed and developed Spark applications using PySpark and Spark SQL for data extraction, transformation and aggregation from multiple file formats (a brief illustrative sketch appears at the end of this summary).
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
- Experience with Azure Data Factory (ADF), Integration Runtime (IR), file system data ingestion and relational data ingestion.
- Created Spark jobs that process the source files and performed various transformations on the source data using the Spark DataFrame and Spark SQL APIs.
- Developed Sqoop scripts to migrate data from Teradata and Oracle to the big data environment.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Hands-on experience installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4, and YARN-based CDH 5.x) distributions.
- Implemented a real-time data streaming pipeline using AWS Kinesis, Lambda and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
- Worked on large-scale data transfer across different Hadoop clusters and implemented new technology stacks on Hadoop clusters using Apache Spark.
- Added support for AWS S3 and RDS to host static/media files and the database in the Amazon cloud.
- Experience in project deployment using Heroku/Jenkins and Amazon Web Services (AWS) such as EC2, S3, Auto Scaling, CloudWatch and SNS.
- Performed data scrubbing and processing, using Oozie for workflow automation and coordination.
- Hands-on experience analyzing log files for Hadoop and ecosystem services and finding root causes.
- Hands-on experience handling different file formats such as Avro, Parquet, SequenceFiles, MapFiles, CSV, XML, log, ORC and RC.
- Experience with the NoSQL databases HBase, Cassandra and MongoDB.
- Experience with AIX, Linux (RHEL), UNIX shell scripting and SQL Server 2008.
- Worked with the data search tool Elasticsearch and the data collection tool Logstash.
- Strong knowledge of Hadoop cluster installation, capacity planning, performance tuning, benchmarking, disaster recovery planning and application deployment in production clusters.
- Experience developing stored procedures and triggers using SQL and PL/SQL in relational databases such as MS SQL Server 2005/2008.
- Exposure to Scrum, Agile and Waterfall methodologies.
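
A minimal, hypothetical PySpark/Spark SQL sketch of the multi-format extraction and usage-pattern aggregation described above; the paths, formats and column names are illustrative assumptions, not production code.

```python
# Illustrative sketch only: paths, formats and column names are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation-sketch").getOrCreate()

# Extract from multiple file formats (CSV and Parquet assumed here, same schema).
usage_csv = spark.read.option("header", True).csv("/data/usage_csv/")
usage_parquet = spark.read.parquet("/data/usage_parquet/")

# Transform and aggregate to surface customer usage patterns.
usage = usage_csv.unionByName(usage_parquet)
daily = (usage
         .withColumn("event_date", F.to_date("event_ts"))
         .groupBy("customer_id", "event_date")
         .agg(F.count("*").alias("events"),
              F.sum("duration_sec").alias("total_duration_sec")))

# Spark SQL view for downstream ad hoc analysis.
daily.createOrReplaceTempView("daily_usage")
top_users = spark.sql("""
    SELECT customer_id, SUM(events) AS total_events
    FROM daily_usage
    GROUP BY customer_id
    ORDER BY total_events DESC
    LIMIT 10
""")
top_users.show()
```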
TECHNICAL SKILLS
Programming Languages: Java, Python, SQL, and C/C++
Big Data Ecosystem: Hadoop, MapReduce, Kafka, Spark, Pig, Hive, YARN, Flume, Sqoop, Oozie, Zookeeper, Talend.
Hadoop Distributions: Cloudera Enterprise, Hortonworks, EMC Pivotal.
Databases: Oracle, SQL Server, PostgreSQL.
Web Technologies: HTML, XML, jQuery, Ajax, CSS, JavaScript, JSON.
Streaming Tools: Kafka
Testing: Hadoop Testing, Hive Testing, MRUnit.
Operating Systems: Linux Red Hat/Ubuntu/CentOS, Windows 10/8.1/7/XP.
Cloud: AWS EMR, Glue, RDS, CloudWatch, S3, Redshift, Kinesis, DynamoDB.
Technologies and Tools: Servlets, JSP, Spring (Boot, MVC, Batch, Security), Web Services, Hibernate, Maven, GitHub, Bamboo.
PROFESSIONAL EXPERIENCE
Confidential, Virginia Beach, Virginia
Azure Data Engineer
Responsibilities:
- Build data pipeline architecture on the Azure cloud platform using NiFi, Azure Data Lake Storage, Azure HDInsight, Airflow and data engineering tools.
- Design and develop scalable, cost-effective architectures with Azure big data services covering the data life cycle of collection, ingestion, storage, processing and visualization.
- Design and implement database solutions in Azure SQL Data Warehouse and Azure SQL Database.
- Architect and implement medium to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Design and implement migration strategies for moving traditional systems to Azure (lift and shift, Azure Migrate, and other third-party tools).
- Engage with business users to gather requirements, design visualizations and provide training to use self-service BI tools.
- Pull data into Power BI from various sources such as SQL Server, Excel, Oracle and Azure SQL.
- Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure.
- Develop conceptual solutions and create proofs of concept to demonstrate the viability of solutions.
- Technically guide projects through to completion within target timeframes.
- Collaborate with application architects and DevOps.
- Identify and implement best practices, tools and standards.
- Design, set up, maintain and administer Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse and Azure Data Factory.
- Build complex distributed systems involving large-scale data handling, metrics collection, data pipeline construction and analytics.
- Design and implement internal process improvements: automating manual processes, optimizing data delivery, redesigning infrastructure for greater scalability, and performance tuning.
- Implement data quality and content validation using tools such as Spark, Scala, Hive and NiFi (see the sketch after this list).
- Create end-to-end data pipelines in a distributed environment using big data tools, the Spark framework and Power BI for data visualization.
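
Below is a minimal sketch, under assumed storage-account, container and column names, of the kind of Spark-based data-quality/content-validation step referenced above; it is illustrative only, not the production pipeline.

```python
# Illustrative sketch: the ADLS Gen2 paths and the claim_id/amount columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-data-quality-sketch").getOrCreate()

src = "abfss://raw@examplelake.dfs.core.windows.net/claims/"      # hypothetical ADLS Gen2 source
dst = "abfss://curated@examplelake.dfs.core.windows.net/claims/"  # hypothetical curated zone

claims = spark.read.parquet(src)

# Basic content validation: reject rows with missing keys or non-positive amounts.
valid = claims.filter(F.col("claim_id").isNotNull() & (F.col("amount") > 0))
rejected = claims.subtract(valid)

valid.write.mode("overwrite").parquet(dst)
rejected.write.mode("overwrite").parquet(dst.rstrip("/") + "_rejected/")
```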
Confidential, Valley Forge, PA
Big Data Developer
Responsibilities:
- Implemented a proof of concept (POC) for migrating Ab Initio ETL graphs into Spark using Scala and Python (PySpark).
- Developed data pipelines using Sqoop, Spark, MapReduce and Hive to ingest, transform and analyze customer behavior data.
- Developed a data pipeline using Kafka, Spark Streaming and Hive to ingest data from data lakes into the Hadoop Distributed File System (see the sketch after this list).
- Implemented Spark jobs using Python and Spark SQL for faster data processing and for real-time analysis algorithms in Spark.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases handling large data volumes.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results back into HDFS.
- Extracted files from an RDBMS (DB2) into the Hadoop file system (HDFS) using Sqoop for workflow processing.
- Implemented partitioning and bucketing for faster query processing in Hive Query Language (HQL).
- Converted Hive/SQL queries into Spark transformations using Spark DataFrames, Datasets and user-defined functions (UDFs).
- Designed Hive queries and Pig scripts to perform data analysis, data transfer and table design.
- Evaluated data between the ETL and Hadoop environments to ensure data quality.
- Created mappings and workflows to extract and load data from relational databases, flat-file sources and legacy systems.
- Tested the Apache Tez and Hadoop MapReduce frameworks for building high-performance batch and interactive data processing applications.
- Reconciled data daily between the ETL and Hive tables using a compare tool implemented on the Spark framework with PySpark.
- Fine-tuned Hadoop applications for high performance and throughput, and troubleshot and debugged Hadoop ecosystem runtime issues.
- Performed data validation between ETL and Apache Hive tables.
- Developed Linux shell scripts for deploying and running the migrated Hadoop applications on production servers.
- Developed workflows for scheduling and orchestrating Hadoop processes.
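
A minimal sketch of the Kafka-to-HDFS ingestion pattern referenced above, written here with Spark Structured Streaming; the broker, topic, schema and paths are assumptions and the actual pipeline may have differed.

```python
# Illustrative sketch: requires the spark-sql-kafka connector on the classpath;
# broker, topic, schema and HDFS paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
          .option("subscribe", "customer-events")              # hypothetical topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Land the parsed events on HDFS as Parquet, partitioned by event date.
query = (events
         .withColumn("event_date", col("event_ts").cast("date"))
         .writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/customer_events")               # hypothetical path
         .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
         .partitionBy("event_date")
         .start())
query.awaitTermination()
```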
Confidential
SQL Developer
Responsibilities:
- Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
- Extensively used the Spark stack to develop preprocessing jobs that use the RDD, Dataset and DataFrame APIs to transform data for upstream consumption (see the sketch after this list).
- Developed real-time data processing applications using Scala and Python and implemented Apache Spark Streaming against streaming sources such as Kafka, Flume and JMS.
- Replaced existing MapReduce programs with Spark applications written in Scala.
- Built on-premises data pipelines using Kafka and Spark Streaming, consuming the feed from an API streaming gateway REST service.
- Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
- Wrote Sqoop scripts to import data into Hive/HDFS from RDBMS sources.
- Gained good knowledge of the Kafka Streams API for data transformation.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Used Talend to create workflows for processing data from multiple source systems.
- Created sample flows in Talend and StreamSets with custom-coded JARs and analyzed the performance of StreamSets and Kafka Streams.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Optimized HiveQL/Pig scripts by using execution engines such as Tez and Spark.
- Developed Hive queries to analyze data in HDFS and identify issues and behavioral patterns.
- Wrote optimized Pig scripts and developed and tested Pig Latin scripts.
- Deployed applications using the Jenkins framework integrated with Git version control.
- Participated in production support on a regular basis to support the analytics platform.
- Used Rally for task/bug tracking.
- Used Git for version control.
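
A minimal sketch, under assumed input layout and paths, of the kind of Spark preprocessing job referenced above: RDD-level parsing feeding a DataFrame that is filtered and written out for upstream consumption.

```python
# Illustrative sketch: the pipe-delimited layout (id|name|amount) and paths are assumptions.
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("preprocess-sketch").getOrCreate()

raw = spark.sparkContext.textFile("hdfs:///data/raw/transactions")   # hypothetical input

# Parse raw lines with the RDD API, skipping malformed records; assumes numeric amounts.
parsed_rdd = (raw.map(lambda line: line.split("|"))
                 .filter(lambda f: len(f) == 3)
                 .map(lambda f: Row(id=f[0], name=f[1], amount=float(f[2]))))

parsed = spark.createDataFrame(parsed_rdd)

# Simple data-quality filter before handing off to downstream consumers.
clean = parsed.filter("id IS NOT NULL AND amount >= 0")
clean.write.mode("overwrite").parquet("hdfs:///data/curated/transactions")  # hypothetical output
```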