Hadoop Developer Resume

Nashville, TN

SUMMARY:

  • Around 5 years of professional IT experience in Big Data environments and the Hadoop ecosystem, with solid experience in Spark, SQL and Java development.
  • Hands-on experience across the Hadoop ecosystem, including extensive experience with Big Data technologies like HDFS, MapReduce, YARN, Spark, Sqoop, Hive, Pig, Impala, Oozie, Oozie Coordinator, ZooKeeper, Apache Cassandra and HBase.
  • Experience in using various tools like Sqoop, Flume, Kafka, NiFi, Pig to ingest structured, semi-structured and unstructured data into the cluster.
  • Designed both time-driven and data-driven automated workflows using Oozie and used ZooKeeper for cluster coordination.
  • Experience with Hadoop clusters using Cloudera's CDH and Hortonworks HDP.
  • Experience in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
  • Expertise in writing MapReduce jobs in Java and Python to process large structured, semi-structured and unstructured data sets and store them in HDFS.
  • Experience working with Python, UNIX and shell scripting.
  • Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like flat files and databases.
  • Good knowledge of cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2 and Redshift, and with Microsoft Azure.
  • Experience with the complete Software Development Life Cycle (SDLC) process, which includes Requirement Gathering, Analysis, Designing, Developing, Testing, Implementing and Documenting.
  • Worked with Waterfall and Agile methodologies.
  • Good team player with excellent communication skills and a strong attitude towards learning new technologies.

TECHNICAL SKILLS:

HADOOP: HDFS, MapReduce, Hive, Beeline, Sqoop, Flume, Oozie, Impala, Pig, Kafka, ZooKeeper, NiFi, Cloudera Manager, Hortonworks

Spark Components: Spark Core, Spark SQL (DataFrames and Datasets), Scala, Python.

Programming Languages: Core Java, Scala, Shell, HiveQL, Python

Web Technologies: HTML, jQuery, Ajax, CSS, JSON, JavaScript.

Operating Systems: Linux, Ubuntu, Windows 10/8/7

Databases: Oracle, MySQL, SQL Server

NoSQL Databases: HBase, Cassandra, MongoDB

Cloud: AWS CloudFormation, Azure

Version Control and Build Tools: Git, Maven, SBT, CBT

Methodologies: Agile, Waterfall

IDEs & Command Line Tools: Eclipse, NetBeans, IntelliJ

PROFESSIONAL EXPERIENCE:

Hadoop Developer

Confidential, Nashville, TN

Responsibilities:

  • Worked with product owners, designers, QA and other engineers in an Agile development environment to deliver timely solutions as per customer requirements.
  • Transferred data from different data sources into HDFS using Kafka producers, consumers and brokers.
  • Used Oozie for automating the end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
  • Involved in creating Hive tables, loading data, and writing Hive queries and views using HiveQL.
  • Optimized Hive queries using map-side joins, dynamic partitions and bucketing.
  • Applied Hive queries to perform data analysis on HBase using SerDe tables to meet the data requirements of the downstream applications.
  • Responsible for executing Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
  • Implemented MapReduce secondary sorting to get better performance for sorting results in MapReduce programs.
  • Loaded and transformed large sets of structured and semi-structured data, including Avro and sequence files.
  • Worked on migrating existing jobs to Spark to improve performance and reduce execution time.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch.
  • Experience with the ELK Stack in building quick search and visualization capability for data.
  • Experience with different data formats like JSON, Avro, Parquet and ORC, and compression codecs like Snappy and bzip2.
  • Coordinated with the testing team for bug fixes and created documentation for recorded data, agent usage and release cycle notes.

Environment: Hadoop, Big Data, HDFS, Scala, Python, Oozie, Hive, HBase, NiFi, Impala, Spark, AWS, Linux.

Hadoop Developer

Confidential, Hudson, Ohio

Responsibilities:

  • Developed a cloud-based EDW and data lake solution that supports data asset management, data integration, and continuous data analytic discovery workloads.
  • Developed and implemented real-time data pipelines with Spark Streaming, Kafka, and Cassandra to replace existing lambda architecture without losing the fault-tolerant capabilities of the existing architecture.
  • Created a Spark Streaming application to consume real-time data from Kafka sources and applied real-time data analysis models that can be updated as new data arrives in the stream.
  • Worked on importing and transforming large sets of structured, semi-structured and unstructured data.
  • Used Spark Structured Streaming to read data from Kafka in real time, apply the necessary transformations and data model, and persist the results into HDFS.
  • Implemented workflows using the Apache Oozie framework to automate tasks. Used ZooKeeper to coordinate cluster services.
  • Created various Hive external tables and staging tables and joined the tables as per the requirement.
  • Implemented static partitioning, dynamic partitioning and bucketing in Hive using internal and external tables. Used map-side joins and parallel execution to optimize the Hive queries.
  • Developed and implemented custom Hive and Spark UDFs for date transformations such as date formatting and age calculations as per business requirements.
  • Wrote programs in Spark using Scala and Python for data quality checks.
  • Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to access Hive tables from Spark for faster data processing.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Used Spark optimization techniques like caching/refreshing tables, broadcast variables, coalesce/repartition, increasing memory overhead limits, tuning parallelism and modifying the default Spark configuration variables for performance tuning.
  • Performed various benchmarking steps to optimize the performance of Spark jobs and thus improve the overall processing.
  • Worked in an Agile environment, delivering the agreed user stories within the sprint time.

Environment: Hadoop, HDFS, Hive, Sqoop, Oozie, Spark, Scala, Kafka, Python, Cloudera, Linux.

Hadoop Developer

Confidential, Bowie, Maryland

Responsibilities:

  • Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
  • Used Sqoop to load the data from relational databases.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Worked with CSV, JSON, Avro and Parquet file formats.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Worked on Kafka to collect and load the data on Hadoop file systems.
  • Used Hive to form an abstraction on top of structured data residing in HDFS and implemented partitions and buckets on Hive tables.
  • Developed and implemented real-time data pipelines with Spark Streaming.
  • Designed and developed data integration programs in a Hadoop environment with the NoSQL data store HBase for data access and analysis.
  • Worked with Python to develop analytical jobs using the PySpark API of Spark.
  • Used the Apache Oozie job scheduler to execute workflows.
  • Used Ambari to monitor node health and job status and to run analytics jobs on Hadoop clusters.
  • Used PySpark to access Spark libraries from Python scripts for data analysis.
  • Worked on Tableau to build customized interactive reports, worksheets, and dashboards.
  • Involved in performance tuning of Spark jobs using caching and by taking full advantage of the cluster environment.

Environment: Hadoop, Spark, Scala, Python, Kafka, Hive, Sqoop, PySpark, Ambari, Oozie, HBase, Tableau, Jenkins, Hortonworks.

Jr Java Developer

Confidential

Responsibilities:

  • Involved in different SDLC phases, including Requirement Gathering, Design and Analysis, Development and Customization of the application.
  • Designed new pages using HTML, CSS, jQuery, and JavaScript.
  • Wrote database queries using SQL and PL/SQL for accessing, manipulating and updating the Oracle database.
  • Created database designs for new tables and forms with the help of the Technical Architect.
  • Worked with managers to identify user needs and troubleshoot issues as they arose.
  • Performed unit testing once the basic implementation was done.

Environment: Java, J2EE, Eclipse IDE, JavaScript, JSON, MySQL, PL/SQL, Web service
