Azure Data Engineer Resume
Tampa, FL
SUMMARY
- Around 8 years of total IT experience, including over 5 years in Big Data/Hadoop development and in the design and development of Java-based enterprise applications.
- Extensive working experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive, Sqoop, Flume, Spark, Kafka, Oozie, and ZooKeeper.
- Implemented performance tuning techniques for Spark-SQL queries.
- Strong knowledge of HDFS architecture and the MapReduce (MRv1) and YARN (MRv2) frameworks.
- Strong hands-on experience publishing messages to various Kafka topics using Apache NiFi and consuming them into HBase using Spark and Python.
- Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats to uncover insights into customer usage patterns.
- Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
- Experience with MS SQL Server Integration Services (SSIS), T-SQL, stored procedures, and triggers.
- Design and develop Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats (see the illustrative sketch at the end of this summary).
- Design and implement database solutions in Azure SQL Data Warehouse and Azure SQL Database.
- Azure Data Factory (ADF), Integration Runtime (IR), file system data ingestion, relational data ingestion.
- Created Spark jobs that process source files and performed various transformations on the source data using the Spark DataFrame and Spark SQL APIs.
- Developed Sqoop scripts to migrate data from Teradata and Oracle into the big data environment.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4, CDH 5.x/YARN) distributions.
- Implemented a real-time data streaming pipeline using AWS Kinesis, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
- Worked on large-scale data transfer across different Hadoop clusters and implemented new technology stacks on Hadoop clusters using Apache Spark.
- Added support for AWS S3 and RDS to host static/media files and the database in the Amazon cloud.
- Experience in project deployment using Heroku/Jenkins and AWS services such as EC2, S3, Auto Scaling, CloudWatch, and SNS.
- Performed data scrubbing and processing with Oozie for workflow automation and coordination.
- Hands-on experience analyzing log files for Hadoop and ecosystem services and finding root causes.
- Hands-on experience handling different file formats such as Avro, Parquet, SequenceFile, MapFile, CSV, XML, log, ORC, and RC.
- Experience with NoSQL databases HBase, Cassandra, and MongoDB.
- Experience with AIX/Linux RHEL, Unix shell scripting, and SQL Server 2008.
- Worked with the Elasticsearch search engine and the Logstash data collection tool.
- Strong knowledge of Hadoop cluster installation, capacity planning, performance tuning, benchmarking, disaster recovery planning, and application deployment in production clusters.
- Experience developing stored procedures and triggers using SQL and PL/SQL in relational databases such as MS SQL Server 2005/2008.
- Exposure to Scrum, Agile, and Waterfall methodologies.
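Illustrative sketch for the PySpark/Spark SQL item above: a minimal example of extraction, transformation, and aggregation across multiple file formats. Paths, columns, and output locations are assumptions for illustration only.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("usage-insights").getOrCreate()

  # Extract from multiple file formats (paths and columns are hypothetical).
  events_csv = spark.read.option("header", "true").csv("/data/raw/events_csv/")
  events_parquet = spark.read.parquet("/data/raw/events_parquet/")

  # Transform: align columns and combine both sources.
  events = events_csv.select("user_id", "event_type", "event_ts") \
      .unionByName(events_parquet.select("user_id", "event_type", "event_ts"))

  # Aggregate with Spark SQL to surface usage patterns.
  events.createOrReplaceTempView("events")
  usage = spark.sql("""
      SELECT user_id, event_type, COUNT(*) AS event_count
      FROM events
      GROUP BY user_id, event_type
  """)
  usage.write.mode("overwrite").parquet("/data/curated/usage_patterns/")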
TECHNICAL SKILLS
Programming Languages: Java, Python, SQL, and C/C++
Big Data Ecosystem: Hadoop, MapReduce, Kafka, Spark, Pig, Hive, YARN, Flume, Sqoop, Oozie, Zookeeper, Talend.
Hadoop Distributions: Cloudera Enterprise, Hortonworks, EMC Pivotal.
Databases: Oracle, SQL Server, PostgreSQL.
Web Technologies: HTML, XML, jQuery, Ajax, CSS, JavaScript, JSON.
Streaming Tools: Kafka
Testing: Hadoop Testing, Hive Testing, MRUnit.
Operating Systems: Linux Red Hat/Ubuntu/CentOS, Windows 10/8.1/7/XP.
Cloud: AWS EMR, Glue, RDS, CloudWatch, S3, Redshift Cluster, Kinesis, DynamoDB.
Technologies and Tools: Servlets, JSP, Spring (Boot, MVC, Batch, Security), Web Services, Hibernate, Maven, GitHub, Bamboo.
Application Servers: Tomcat, JBoss.
IDEs: Eclipse, NetBeans, IntelliJ.
PROFESSIONAL EXPERIENCE
Confidential, Tampa, FL
Azure Data Engineer
Responsibilities:
- Built data pipeline architecture on the Azure cloud platform using NiFi, Azure Data Lake Storage, Azure HDInsight, Airflow, and data engineering tools.
- Designed and developed scalable and cost-effective architecture in Azure Big Data services for data life cycle of collection, ingestion, storage, processing, and visualization.
- Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.
- Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Design and implement migration strategies for traditional systems on Azure (lift and shift, Azure Migrate, and other third-party tools).
- Engage with business users to gather requirements, design visualizations and provide training to use self-service BI tools.
- Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle, and Azure SQL.
- Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure.
- Develop conceptual solutions & create proof-of-concepts to demonstrate viability of solutions.
- Technically guide projects through to completion within target timeframes.
- Collaborate with application architects and DevOps.
- Identify and implement best practices, tools and standards.
- Design, set up, maintain, and administer Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
- Build complex distributed systems involving large-scale data handling, metrics collection, data pipeline construction, and analytics.
- Design and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, and performance tuning.
- Implement data quality and content validation using tools such as Spark, Scala, Hive, and NiFi (see the illustrative sketch at the end of this section).
- Involved in creating an end-to-end data pipeline in a distributed environment using big data tools, the Spark framework, and Power BI for data visualization.
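Illustrative sketch for the data quality and content validation item above: a minimal PySpark check, assuming hypothetical Hive table names, columns, and rules.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("dq-validation").enableHiveSupport().getOrCreate()

  source = spark.table("staging.customer_raw")   # assumed staging table
  target = spark.table("curated.customer")       # assumed curated table

  # Row-count reconciliation between source and target.
  src_count, tgt_count = source.count(), target.count()
  if src_count != tgt_count:
      raise ValueError(f"Row count mismatch: {src_count} vs {tgt_count}")

  # Content validation: key columns must be populated and within expected values.
  null_keys = target.filter(F.col("customer_id").isNull()).count()
  bad_status = target.filter(~F.col("status").isin("ACTIVE", "INACTIVE")).count()
  if null_keys > 0 or bad_status > 0:
      raise ValueError(f"DQ failure: {null_keys} null keys, {bad_status} invalid statuses")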
Confidential, Washington.
Big Data Developer
Responsibilities:
- Implemented a proof of concept (POC) for migrating Ab Initio ETL graphs to Spark using Scala and Python (PySpark).
- Developed data pipelines using Sqoop, Spark, MapReduce, and Hive to ingest, transform, and analyze customer behavior data.
- Developed a data pipeline using Kafka, Spark Streaming and Hive to ingest the data from data lakes to Hadoop distributed file system.
- Implemented Spark jobs using Python and Spark SQL for faster data processing and for real-time analysis algorithms in Spark.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Handled importing data from different sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and loading the transformed data back into HDFS.
- Extracted files from an RDBMS (DB2) into the Hadoop Distributed File System (HDFS) using Sqoop for workflow processing.
- Implemented partitioning and bucketing for faster query processing in Hive Query Language (HQL).
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Datasets, and user-defined functions (UDFs); see the illustrative sketch at the end of this section.
- Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design.
- Evaluated data between the ETL system and Hadoop to ensure data quality.
- Responsible for creating mappings and workflows to extract and load data from relational databases, flat file sources, and legacy systems.
- Tested the Apache Tez and Hadoop MapReduce frameworks for building high-performance batch and interactive data processing applications.
- Reconciled data daily between the ETL and Hive tables using a compare tool implemented in the Spark framework with PySpark.
- Fine-tuned Hadoop applications for high performance and throughput; troubleshot and debugged Hadoop ecosystem runtime issues.
- Performed data validation between ETL and Apache Hive tables.
- Developed Linux shell scripts for deploying and running the migrated Hadoop applications on production servers.
- Developed workflows for scheduling and orchestrating Hadoop processes.
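Illustrative sketch for the Hive-to-Spark conversion item above: a hypothetical HiveQL aggregation rewritten as equivalent Spark DataFrame transformations (table and column names are assumptions).

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

  # Original HiveQL, for reference:
  #   SELECT account_id, SUM(amount) AS total_amount
  #   FROM transactions WHERE txn_date >= '2020-01-01'
  #   GROUP BY account_id;
  transactions = spark.table("default.transactions")   # assumed Hive table

  totals = (transactions
            .filter(F.col("txn_date") >= "2020-01-01")
            .groupBy("account_id")
            .agg(F.sum("amount").alias("total_amount")))

  totals.write.mode("overwrite").saveAsTable("analytics.account_totals")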
Confidential, Greenwood Village, CO.
Hadoop Developer
Responsibilities:
- Designed and developed scalable and cost-effective architecture in AWS Big Data services for data life cycle of collection, ingestion, storage, processing, and visualization.
- Involved in creating End-to-End data pipeline within distributed environment using the Big data tools, Spark framework and Tableau for data visualization.
- Ensure that application continues to function normally through software maintenance and testing in production environment.
- Leveraged Spark features such as in-memory processing, distributed cache, broadcast variables, accumulators, and map-side joins to implement data preprocessing pipelines with minimal latency (see the illustrative sketch at the end of this section).
- Implemented real-time solutions for money movement and transactional data using Kafka, Spark Streaming, and HBase.
- The project also included a range of big data tools and programming languages such as Sqoop, Python, and Oozie.
- Worked on scheduling Oozie workflow engine to run multiple jobs.
- Experience creating a Python topology script to generate a CloudFormation template for creating the EMR cluster in AWS.
- Good knowledge of AWS services such as EC2, EMR, S3, Service Catalog, and CloudWatch.
- Experience using Spark SQL to handle structured data from Hive on the AWS EMR platform (m4.xlarge and m5.12xlarge clusters).
- Explored Spark and improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Experienced in handling large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcast variables, effective and efficient joins, and transformations.
- Experienced in optimizing Hive queries and joins to handle different data sets.
- Involved in creating Hive tables (managed and external tables), and loading and analyzing data using Hive queries.
- Actively involved in code reviews and bug fixing to improve performance.
- Good experience handling data manipulation using Python scripts.
- Involved in developing, building, testing, and deploying to the Hadoop cluster in distributed mode.
- Created a Splunk dashboard to capture logs for the end-to-end data ingestion process.
- Wrote unit test cases for PySpark code as part of the CI/CD process.
- Good knowledge of configuration management and CI/CD tools such as Bitbucket/GitHub and Bamboo.
Environment: Hadoop, Cloudera, Flume, HBase, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, agile methodologies, UNIX
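Illustrative sketch for the broadcast/map-side join item above: broadcasting a small dimension dataset to avoid shuffling the large side (dataset names, join key, and paths are assumptions).

  from pyspark.sql import SparkSession
  from pyspark.sql.functions import broadcast

  spark = SparkSession.builder.appName("broadcast-join").getOrCreate()

  events = spark.read.parquet("/data/events/")          # large fact dataset
  dim_product = spark.read.parquet("/data/products/")   # small dimension dataset

  # Broadcasting the small table turns this into a map-side join (no shuffle of events).
  enriched = events.join(broadcast(dim_product), on="product_id", how="left")
  enriched.write.mode("overwrite").parquet("/data/enriched_events/")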
Confidential, New York.
Hadoop Developer
Responsibilities:
- Developed data pipelines using Sqoop, Spark, MapReduce, and Hive to ingest, transform, and analyze customer behavioral data.
- Implemented Spark jobs using Python and Spark SQL for faster data processing and for real-time analysis algorithms in Spark.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes.
- Used the Spark-Cassandra Connector to load data to and from Cassandra, and streamed data in real time using Spark with Kafka (see the illustrative sketch at the end of this section).
- Developed Kafka producers and consumers in Java, integrated them with Apache Storm, and ingested data into HDFS and HBase by implementing the rules in Storm.
- Developed efficient MapReduce programs in Python to run batch processes on huge unstructured datasets.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Handled importing data from different sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and loading the transformed data back into HDFS.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Analyzed the data by performing Hive queries (Hive QL) and running Pig scripts (Pig Latin) to study customer behavior.
- Created HBase tables and column families to store the user event data and wrote automated HBase test cases for data quality checks using HBase command line tools.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs and created UDF's to store specialized data structures in HBase and Cassandra.
- Developed a NiFi workflow to pick up multiple retail files from an FTP location and move them to HDFS on a daily basis.
- Worked with developer teams on a NiFi workflow to pick up data from a REST API server, the data lake, and an SFTP server and send it to the Kafka broker.
- Evaluated Hortonworks NiFi (HDF 2.0) and recommended a solution to ingest data from multiple sources into HDFS and Hive using NiFi, including importing data from Linux servers.
- Developed product profiles using Pig and commodity UDFs, and developed Hive scripts in HiveQL to de-normalize and aggregate the data.
- Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Tuned Spark/Python code to improve the performance of machine learning algorithms for data analysis.
- Performed data validation on the data ingested using MapReduce by building a custom model to filter all the invalid data and cleanse the data.
- Developed interactive shell scripts for scheduling various data cleansing and data loading process.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Used Teradata and DBMS concepts for the early instance creation.
Environment: Hadoop, MapReduce, YARN, Spark, Hive, Pig, Kafka, HBase, Oozie, Sqoop, Python, Bash/Shell Scripting, Flume, Cassandra, Oracle, Core Java, Storm, HDFS, Unix, Teradata, NiFi, Eclipse
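Illustrative sketch for the Spark-with-Kafka streaming item above: a minimal Structured Streaming read from Kafka landing data in HDFS. Broker, topic, and paths are assumptions; the original work may have used the older DStream API, and the spark-sql-kafka connector package is required.

  from pyspark.sql import SparkSession
  from pyspark.sql.functions import col

  spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

  # Read the Kafka topic as a streaming DataFrame and decode key/value as strings.
  stream = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")
            .option("subscribe", "transactions")
            .load()
            .select(col("key").cast("string"), col("value").cast("string")))

  # Continuously land the stream as Parquet files with checkpointing.
  query = (stream.writeStream
           .format("parquet")
           .option("path", "/data/streams/transactions/")
           .option("checkpointLocation", "/data/checkpoints/transactions/")
           .start())
  query.awaitTermination()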
Confidential
SQL Developer
Responsibilities:
- Research and recommend suitable technology stack for Hadoop migration considering current enterprise architecture.
- Extensively used the Spark stack to develop preprocessing jobs that use the RDD, Dataset, and DataFrame APIs to transform data for upstream consumption.
- Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming from streaming sources such as Kafka, Flume, and JMS.
- Worked on extracting and enriching HBase data across multiple tables using joins in Spark.
- Worked on writing APIs to load the processed data to HBase tables.
- Replaced existing MapReduce programs with Spark applications written in Scala.
- Built on-premises data pipelines using Kafka and Spark Streaming, consuming the feed from the API streaming gateway REST service.
- Developed Hive UDFs to handle data quality and create filtered datasets for further processing (see the illustrative sketch at the end of this section).
- Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS.
- Good knowledge of the Kafka Streams API for data transformation.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Used the Talend tool to create workflows for processing data from multiple source systems.
- Created sample flows in Talend and StreamSets with custom-coded JARs and analyzed the performance of StreamSets and Kafka Streams.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Optimized HiveQL/Pig scripts by using execution engines such as Tez and Spark.
- Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns.
- Involved in writing, developing, and testing optimized Pig Latin scripts.
- Deployed applications using the Jenkins framework integrated with Git version control.
- Participated in production support on a regular basis for the analytics platform.
- Used Rally for task/bug tracking.
- Used GIT for version control.
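Illustrative sketch for the data-quality UDF item above: the role used Hive UDFs, so this shows an equivalent rule registered as a PySpark UDF for use in Spark SQL (function, table, and column names are assumptions).

  import re
  from pyspark.sql import SparkSession
  from pyspark.sql.types import BooleanType

  spark = SparkSession.builder.appName("dq-udf").enableHiveSupport().getOrCreate()

  def is_valid_record(account_id, email):
      # Basic data-quality rule: non-empty id and a well-formed email address.
      if account_id is None or str(account_id).strip() == "":
          return False
      return bool(email) and re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", email) is not None

  spark.udf.register("is_valid_record", is_valid_record, BooleanType())

  # Create a filtered dataset for further processing.
  clean = spark.sql("SELECT * FROM staging.customers WHERE is_valid_record(account_id, email)")
  clean.write.mode("overwrite").saveAsTable("curated.customers_clean")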