Hadoop Developer Resume Nashville, TN - Hire IT People

SUMMARY:

Around 5 years of professional IT experience in Big Data environments and the Hadoop ecosystem, with good experience in Spark, SQL and Java development.
Hands-on experience across the Hadoop ecosystem, including extensive experience in Big Data technologies like HDFS, MapReduce, YARN, Spark, Sqoop, Hive, Pig, Impala, Oozie, Oozie Coordinator, ZooKeeper, Apache Cassandra and HBase.
Experience in using various tools like Sqoop, Flume, Kafka, NiFi, Pig to ingest structured, semi-structured and unstructured data into the cluster.
Designed both time-driven and data-driven automated workflows using Oozie and used ZooKeeper for cluster coordination.
Experience with Hadoop clusters using Cloudera's CDH and Hortonworks HDP.
Experience in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
Expertise in writing MapReduce jobs in Java and Python for processing large structured, semi-structured and unstructured data sets and storing them in HDFS.
Experience working with Python, UNIX and shell scripting.
Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like flat files and databases.
Good knowledge of cloud integration with AWS, using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2 and Redshift, and with Microsoft Azure.
Experience with the complete Software Development Life Cycle (SDLC) process, which includes Requirement Gathering, Analysis, Designing, Developing, Testing, Implementing and Documenting.
Worked with Waterfall and Agile methodologies.
Good team player with excellent communication skills and a strong attitude towards learning new technologies.

TECHNICAL SKILLS:

HADOOP: HDFS, MapReduce, Hive, Beeline, Sqoop, Flume, Oozie, Impala, Pig, Kafka, ZooKeeper, NiFi, Cloudera Manager, Hortonworks

Spark Components: Spark Core, Spark SQL (DataFrames and Datasets), Scala, Python.

Programming Languages: Core Java, Scala, Shell, HiveQL, Python

Web Technologies: HTML, jQuery, Ajax, CSS, JSON, JavaScript.

Operating Systems: Linux, Ubuntu, Windows 10/8/7

Databases: Oracle, MySQL, SQL Server

NoSQL Databases: HBase, Cassandra, MongoDB

Cloud: AWS CloudFormation, Azure

Version Control and Tools: Git, Maven, SBT, CBT

Methodologies: Agile, Waterfall

IDEs & Command Line Tools: Eclipse, NetBeans, IntelliJ

PROFESSIONAL EXPERIENCE:

Hadoop Developer

Confidential, Nashville, TN

Responsibilities:

Worked with product owners, designers, QA and other engineers in an Agile development environment to deliver timely solutions as per customer requirements.
Transferred data from different data sources into HDFS using Kafka producers, consumers and brokers.
Used Oozie for automating the end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
Involved in creating Hive tables, loading data and writing Hive queries and views using HiveQL.
Optimized Hive queries using map-side joins, dynamic partitions and bucketing.
Applied Hive queries to perform data analysis on HBase using SerDe tables to meet the data requirements of the downstream applications.
Responsible for executing Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
Implemented MapReduce secondary sorting to improve the performance of sorting results in MapReduce programs.
Loaded and transformed large sets of structured and semi-structured data, including Avro and sequence files.
Worked on migrating all existing jobs to Spark to improve performance and decrease execution time.
Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch.
Experience with ELK Stack in building quick search and visualization capability for data.
Experience with different data formats like JSON, Avro, Parquet and ORC, and compression codecs like Snappy and bzip.
Coordinated with the testing team for bug fixes and created documentation for recorded data, agent usage and release cycle notes.

Environment: Hadoop, Big Data, HDFS, Scala, Python, Oozie, Hive, HBase, NiFi, Impala, Spark, AWS, Linux.

Hadoop Developer

Confidential, Hudson, Ohio

Responsibilities:

Developed a cloud-based EDW and Data Lake solution that supports data asset management, data integration, and continuous data analytics discovery workloads.
Developed and implemented real-time data pipelines with Spark Streaming, Kafka, and Cassandra to replace existing lambda architecture without losing the fault-tolerant capabilities of the existing architecture.
Created a Spark Streaming application to consume real-time data from Kafka sources and applied real-time data analysis models that can be updated on new data as it arrives in the stream.
Worked on importing and transforming large sets of structured, semi-structured and unstructured data.
Used Spark Structured Streaming to perform the necessary transformations and data modeling on data received from Kafka in real time and persist it into HDFS.
Implemented workflows using the Apache Oozie framework to automate tasks. Used ZooKeeper to coordinate cluster services.
Created various Hive external tables and staging tables and joined the tables as per the requirement.
Implemented static partitioning, dynamic partitioning and bucketing in Hive using internal and external tables. Used map-side joins and parallel execution to optimize Hive queries.
Developed and implemented custom Hive and Spark UDFs involving date transformations such as date formatting and age calculations as per business requirements.
Wrote programs in Spark using Scala and Python for data quality checks.
Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to access Hive tables in Spark for faster data processing.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
Used Spark optimization techniques such as caching and refreshing tables, broadcast variables, coalescing and repartitioning, increasing memory overhead limits, tuning parallelism and modifying the Spark default configuration variables for performance tuning.
Performed various benchmarking steps to optimize the performance of Spark jobs and thus improve the overall processing.
Worked in an Agile environment, delivering the agreed user stories within the sprint time.

Environment: Hadoop, HDFS, Hive, Sqoop, Oozie, Spark, Scala, Kafka, Python, Cloudera, Linux.

Hadoop Developer

Confidential, Bowie, Maryland

Responsibilities:

Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
Used Sqoop to load the data from relational databases.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
Worked with CSV, JSON, Avro and Parquet file formats.
Implemented usage of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
Worked on Kafka to collect and load the data on Hadoop file systems.
Used Hive to form an abstraction on top of structured data residing in HDFS and implemented partitions and buckets on Hive tables.
Developed and implemented real-time data pipelines with Spark Streaming.
Designed and developed data integration programs in a Hadoop environment with the NoSQL data store HBase for data access and analysis.
Worked with Python to develop analytical jobs using the PySpark API of Spark.
Used the job management scheduler Apache Oozie to execute workflows.
Used Ambari to monitor node health and job status and to run analytics jobs in Hadoop clusters.
Experience with PySpark, using Spark libraries through Python scripting for data analysis.
Worked on Tableau to build customized interactive reports, worksheets, and dashboards.
Involved in performance tuning of Spark jobs using caching and by taking full advantage of the cluster environment.

Environment: Hadoop, Spark, Scala, Python, Kafka, Hive, Sqoop, PySpark, Ambari, Oozie, HBase, Tableau, Jenkins, Hortonworks.

Jr Java Developer

Confidential

Responsibilities:

Involved in different SDLC phases involving Requirement Gathering, Design and Analysis, Development and Customization of the application.
Designed new pages using HTML, CSS, jQuery, and JavaScript.
Wrote database queries using SQL and PL/SQL for accessing, manipulating and updating Oracle database.
Created database design for new tables and forms with the help of Technical Architect.
Worked with managers to identify user needs and troubleshoot issues as they arise.
Performed unit testing once the basic implementation was done.

Environment: Java, J2EE, Eclipse IDE, JavaScript, JSON, MySQL, PL/SQL, Web service
