
Hadoop Developer Resume


Illinois

SUMMARY

  • Over 5 years of IT industry and software development experience, with a focus on Hadoop development
  • Configured Spark Streaming to receive real-time data from Kafka and stored the stream data to HDFS using Scala and Python (see the Scala sketch after this summary)
  • Experienced in importing and exporting data through stream processing with Flume, Kafka and Python
  • Wrote Hive UDFs as required and executed complex HQL queries to extract data from Hive tables
  • Used partitioning and bucketing in Hive and designed both managed and external tables for performance optimization
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala
  • Experienced in using Python to manipulate data for loading and extraction; worked with Python libraries such as Matplotlib, NumPy, SciPy and Pandas for data analysis
  • Automated recurring reports using SQL and Python and visualized them on a BI platform
  • Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data
  • Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra
  • Experienced in workflow scheduling and locking tools/services like Oozie and Zookeeper
  • Practiced ETL methods in enterprise-wide solutions, data warehousing, reporting and data analysis
  • Experienced in working with AWS, using EMR and EC2 for compute and S3 as the storage mechanism, alongside Spark, Oozie, Zookeeper, Kafka and Flume
  • Developed Impala scripts for extraction, transformation and loading of data into the data warehouse
  • Good knowledge of using Apache NiFi to automate data movement between Hadoop systems
  • Imported and exported data with Sqoop between HDFS and RDBMS such as Oracle and MySQL
  • Good knowledge of UNIX shell scripting for automating deployments and other routine tasks
  • Experienced in using IDEs such as Eclipse, NetBeans and IntelliJ
  • Used JIRA and Rally for bug tracking, and GitHub and SVN for version control, code reviews and unit testing
  • Experienced in working in all phases of the SDLC in both Agile and Waterfall methodologies
  • Good understanding of Agile Scrum methodology, Test-Driven Development and CI/CD
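
For illustration, a minimal Scala sketch of the kind of Spark Structured Streaming job described in this summary, reading from Kafka and writing to HDFS. Broker addresses, topic names and HDFS paths are placeholders, not details from the original projects, and the same pipeline could equally be written in PySpark.

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("KafkaToHdfs")
          .getOrCreate()

        // Subscribe to a Kafka topic (broker and topic names are placeholders);
        // requires the spark-sql-kafka connector on the classpath
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        // Persist the stream to HDFS as Parquet, with checkpointing for fault tolerance
        events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/events")
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .start()
          .awaitTermination()
      }
    }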

PROFESSIONAL EXPERIENCE

HADOOP DEVELOPER

Confidential, Illinois

Responsibilities:

  • Developed architecture documents, process documentation, server diagrams and requisition documents
  • Developed stream pipelines and consumed real-time events from Kafka using the Kafka Streams API and Kafka clients.
  • Configured Spark Streaming to consume incoming messages from Kafka topics and store the stream data into HDFS.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Worked with a senior engineer on configuring Kafka for streaming data.
  • Developed Spark jobs using Scala as per the requirements.
  • Performed processing on large sets of structured, unstructured and semi-structured data.
  • Created Kafka applications that monitor consumer lag within Apache Kafka clusters.
  • Handled importing of data from various data sources using Sqoop, performed transformations using Spark and loaded data into DynamoDB.
  • Analyzed the data with Hive queries and used visualization tools to generate insights and analyze customer behavior.
  • Worked with the Spark ecosystem using Scala and Spark SQL queries on different data formats such as text files and Parquet.
  • Used Hive UDFs to implement business logic in Hadoop.
  • Implemented business logic by writing UDFs in Python and applied various existing UDFs.
  • Responsible for migrating workloads from Hadoop MapReduce to Spark, using in-memory distributed computing for real-time fraud detection.
  • Used Spark to cache data in memory.
  • Implemented batch processing of data sources using Apache Spark.
  • Responsible for data cleaning, feature scaling and feature engineering using NumPy and Pandas in Python.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with HiveQL.
  • Involved in creating Hive tables, loading data and writing Hive queries, which run internally as MapReduce jobs.
  • Developed predictive analytics using the Apache Spark Scala APIs.
  • Provided cluster coordination services through ZooKeeper.
  • Used Apache Kafka for collecting, aggregating, and moving large amounts of data from application servers.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data (a representative sketch follows this list).
  • As part of a POC, set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
  • Used Docker as part of CI/CD to build and deploy applications using ECS in AWS.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
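
As an illustration of the Scala and Spark SQL work listed above, a minimal sketch that reads a Hive table, applies DataFrame transformations equivalent to a Hive aggregation, and writes the result as Parquet. The table, column and path names are hypothetical, not taken from the project.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveToSparkJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkJob")
          .enableHiveSupport()
          .getOrCreate()

        // DataFrame equivalent of:
        //   SELECT customer_id, SUM(amount) AS total
        //   FROM transactions WHERE status = 'COMPLETE' GROUP BY customer_id
        val totals = spark.table("transactions")
          .filter(col("status") === "COMPLETE")
          .groupBy("customer_id")
          .agg(sum("amount").alias("total"))

        // Store the result as Parquet for downstream reporting
        totals.write.mode("overwrite").parquet("hdfs:///warehouse/customer_totals")
      }
    }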

Environment: Hadoop, MapReduce, HDFS, Hive, Java 6, Oozie, Linux, XML, Eclipse, Oracle 10g, PL/SQL, YARN, Spark, Pig, Sqoop, DB2, UNIX, HCatalog.

HADOOP DEVELOPER

Confidential, New York City, New York

Responsibilities:

  • Worked extensively on Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and MapReduce programming.
  • Involved in importing data from various data sources into HDFS using Sqoop, applying transformations using Hive and Apache Spark, and then loading the data into Hive tables or AWS S3 buckets.
  • Involved in moving data from various DB2 tables to AWS S3 buckets using the Sqoop process.
  • Configured Splunk alerts to capture log files during execution and store them to an S3 bucket location while the cluster is running.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (PySpark).
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to DynamoDB using Scala.
  • Experienced in creating EMR clusters and deploying code, stored in S3 buckets, to the cluster.
  • Experienced in using NoMachine and PuTTY to SSH into the EMR cluster and run spark-submit.
  • Developed Apache Spark applications using Scala and Python, and implemented an Apache Spark data processing module to handle data from various RDBMS and streaming sources.
  • Experienced in developing and scheduling various Spark Streaming and batch jobs using Python (PySpark) and Scala.
  • Developed Spark code using PySpark to apply various transformations and actions for faster data processing.
  • Achieved high-throughput, scalable, fault-tolerant stream processing of live data streams using Apache Spark Streaming
  • Used Spark stream processing with Scala to bring data into memory, created RDDs and DataFrames, and applied transformations and actions.
  • Involved in using various Python libraries with Spark to create DataFrames and store them to Hive.
  • Sqoop jobs and Hive queries were created for data ingestion from relational databases to analyze historical data.
  • Experienced in working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Executed Hadoop/Spark jobs on AWS EMR using programs stored in S3 buckets.
  • Knowledge of creating user-defined functions (UDFs) in Hive.
  • Worked with different file formats such as Avro and Parquet for Hive querying and processing based on business logic.
  • Involved in pulling data from AWS S3 buckets into the data lake, building Hive tables on top of it and creating DataFrames in Spark for further analysis (see the sketch after this list).
  • Worked on sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Implemented Hive UDFs to encapsulate business logic and was responsible for performing extensive data validation using Hive.
  • Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
  • Involved in developing code, generating various DataFrames based on the business requirements and creating temporary tables in Hive.
  • Experienced in writing build scripts using SBT and working with continuous integration systems such as Bamboo.
  • Used JIRA for creating user stories and created branches in the Bitbucket repositories based on each story.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Used Bitbucket as the code repository and integrated it with Bamboo for continuous integration.
  • Involved in Test-Driven Development, writing unit and integration test cases for the code.
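
A minimal Scala sketch of the S3-to-Hive pattern described in this list: reading Parquet files from an S3 bucket into a DataFrame and writing them out as a partitioned Hive table for further analysis. The bucket, table, and column names are hypothetical; on EMR the s3:// path is handled by EMRFS.

    import org.apache.spark.sql.SparkSession

    object S3ToHiveTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("S3ToHiveTable")
          .enableHiveSupport()
          .getOrCreate()

        // Pull raw Parquet data from an S3 bucket into a DataFrame (path is a placeholder)
        val raw = spark.read.parquet("s3://example-bucket/raw/orders/")

        // Write it as a Hive table partitioned by load date for faster queries
        raw.write
          .mode("overwrite")
          .partitionBy("load_date")
          .saveAsTable("datalake.orders")

        // Further analysis can then run directly against the Hive table
        spark.table("datalake.orders").groupBy("load_date").count().show()
      }
    }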

Environment: Hadoop, Cloudera Hadoop, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Java, JSON, Spark, HDFS, YARN, Oozie Scheduler, ZooKeeper, Mahout, Linux, UNIX, ETL, MySQL.

HADOOP DEVELOPER

Confidential, New York, NY

Responsibilities:

  • Developed stream pipelines and consumed real-time events from Kafka using the Kafka Streams API and Kafka clients.
  • Configured Spark Streaming to consume incoming messages from Kafka topics and store the stream data into HDFS.
  • Responsible for building scalable distributed data solutions on the Cloudera distribution of Hadoop.
  • Involved in using Spark Streaming and Spark jobs for ongoing customer transactions, and Spark SQL to handle structured data in Hive.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generating visualizations using Tableau.
  • Wrote PySpark code to calculate aggregates such as mean, covariance and standard deviation.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Wrote UDFs in Scala and used them for sampling large data sets (a representative sketch follows this list).
  • Used a variety of data formats while loading data into HDFS.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Worked on NiFi workflow development for data ingestion from multiple sources. Involved in architecture and design discussions with the technical team and interfaced with other teams to create efficient and consistent solutions.
  • Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move data in and out of HDFS.
  • Created files and tuned SQL queries in Hive using Hue.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Experienced in working with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data, and responsible for managing data from different sources.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
  • Responsible for ingesting data into HBase using the HBase shell as well as the HBase client API.
  • Used Kafka to rebuild a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
  • Developed workflows in Oozie to automate jobs.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved the review process and resolved technical problems.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
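
A small Scala sketch of the sampling UDF mentioned in this list: a deterministic, hash-based filter that keeps roughly one percent of rows. The table and column names are hypothetical, and the one-percent rate is only an example.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object SamplingUdfJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SamplingUdfJob")
          .enableHiveSupport()
          .getOrCreate()

        // UDF that deterministically keeps ~1% of rows based on an id column
        val inSample = udf((id: String) => id != null && math.abs(id.hashCode % 100) == 0)

        // Table and column names are placeholders for illustration only
        val sampled = spark.table("transactions")
          .filter(inSample(col("transaction_id")))

        sampled.write.mode("overwrite").saveAsTable("transactions_sample")
      }
    }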
