
Senior Hadoop Developer Resume

Iowa City, IA


  • 7½ years of IT industry experience, including 5 years working with Apache Hadoop ecosystem components and related technologies such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, Zookeeper, HBase, Cassandra, MongoDB, and Amazon Web Services.
  • 3 years of experience in the application development and maintenance of SDLC projects using Java technologies.
  • Good experience working with the Hortonworks, Cloudera, and MapR distributions.
  • Very good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Developed applications for distributed environments using Hadoop, MapReduce, and Python.
  • Experience in data extraction and transformation using MapReduce jobs.
  • Proficient in working with Hadoop, HDFS, writing PIG scripts and Sqoop scripts.
  • Performed data analysis using Hive and Pig.
  • Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
  • Strong understanding of Spark real time streaming and SparkSQL and experience in loading data from external data sources like MySQL and Cassandra for Spark applications.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Well versed with job workflow scheduling and monitoring tools like Oozie
  • Developed MapReduce jobs to automate transfer of data from HBase.
  • Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
  • Loaded streaming log data from various webservers into HDFS using Flume.
  • Experience in using Sqoop, Oozie and Cloudera Manager.
  • Hands on experience in application development using RDBMS, and Linux shell scripting.
  • Experience working with Amazon EMR and EC2 Spot Instances.
  • Experience in integrating Hadoop with Ganglia and have good understanding of Hadoop metrics and visualization using Ganglia.
  • Support development, testing, and operations teams during new system deployments.
  • Solid understanding of relational database concepts.
  • Extensively worked with Unified Modeling Language (UML), designing use cases, activity flow diagrams, class diagrams, and sequence and object diagrams using Rational Rose and MS Visio.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Good Knowledge on Hadoop Cluster architecture and monitoring the cluster.
  • Good team player and can work efficiently in multiple team environments and multiple products. Easily adaptable to the new systems and environments.
  • Possess excellent communication and analytical skills along with a can-do attitude.


Programming languages: C, C++, Java, Python, Scala, R

HADOOP/BIG DATA: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, Yarn, Oozie, Zookeeper, ElasticSearch

Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.

Operating Systems: Windows, Unix, Linux, Ubuntu.

Web Development: HTML, JSP, JavaScript, jQuery, CSS, XML, AJAX.

Web/Application Servers: Apache Tomcat, Sun Java Application Server

Tools: IntelliJ, Eclipse, NetBeans, Nagios, Ganglia, Maven

Scripting: Bash, JavaScript

Version Control: Git, SVN


Senior Hadoop Developer

Confidential, Iowa City, IA


  • Strong understanding and practical experience in developing Spark applications with Scala.
  • Developed Spark scripts by using Spark shell commands as per the requirement.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Experience in developing SparkSQL applications both using SQL and DSL
  • Extensively worked with the Parquet file format and gained practical knowledge writing Spark and Hive applications against Parquet data.
  • Experience in using various compression techniques along with Parquet file format.
  • Experience managing extensive retail datasets from Kroger, with good experience creating test datasets for development purposes.
  • Experience in building dimensional and fact tables using Spark Scala applications
  • Practical knowledge on writing applications in Scala to interact with the Hive through the Spark application.
  • Extensively used Hive partitioned tables, map join, bucketing and gained good understanding of dynamic partitioning.
  • Performed a POC writing Spark applications in Scala, Python, and R.
  • Good hands-on experience with Hive to perform data queries and analysis as part of QA.
  • Practical experience using Pig to perform QA by calculating statistics on the final output.
  • Experience in designing both time driven and data driven automated workflows using Oozie
  • Experience in writing Sqoop scripts to import data from Exadata to HDFS.
  • Good exposure to MongoDB, its functionality, and its use cases.
  • Gained good exposure to Hue interface for monitoring the job status, managing the HDFS files, tracking the scheduled jobs and managing the Oozie workflows
  • Performed optimizations and performance tuning in Spark and Hive
  • Developed Unix script to automate data load into HDFS
  • Strong knowledge of HDFS commands for managing files, and good understanding of managing the file system through Spark Scala applications.
  • Extensive use of aliases for Oozie and HDFS commands.
  • Experienced in managing and reviewing Hadoop log files.
  • Experience controlling logging for Spark applications, with extensive use of Log4j to log the respective phases of the application.
  • Good knowledge on GIT commands, version tagging and pull requests
  • Performed unit and integration testing after development and participated in code reviews.
  • Experience in writing JUnit test cases for testing Spark and SparkSQL applications.
  • Practical experience with developing applications in IntelliJ and Maven
  • Good exposure to Agile environment. Participated in daily standups, Big Room Planning, Sprint meetings and Team Retrospectives
  • Interact with business analysts to understand the business requirements and translate them to technical requirements
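The Hive partitioning and Parquet work described above can be sketched as follows; the table, column names, and staging source are illustrative assumptions, not details from the actual project:

```sql
-- Hypothetical Parquet-backed fact table, partitioned by load date
CREATE TABLE sales_fact (
  store_id   INT,
  product_id INT,
  amount     DECIMAL(10,2)
)
PARTITIONED BY (load_date STRING)
STORED AS PARQUET;

-- Enable dynamic partitioning so the INSERT derives partitions from the data
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE sales_fact PARTITION (load_date)
SELECT store_id, product_id, amount, load_date
FROM sales_staging;
```

With dynamic partitioning in nonstrict mode, Hive creates one partition directory per distinct load_date value, which is what makes partition pruning effective in later queries.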

Environment: Hadoop 2.6.0-cdh5.7.0, Java 1.8.0_92, Spark 1.6.0, SparkSQL, R, Python, Scala 2.10.5, MongoDB, Apache Pig 0.12.0, Apache Hive 1.1.0, HDFS, Sqoop, Oozie, Maven, IntelliJ, Git, UNIX shell scripting, Oracle 11g/10g, Log4j, Linux, Agile development

Senior Hadoop Developer

Confidential, Pasadena, CA


  • Involved in the review of functional and non-functional requirements.
  • Practical experience in developing Spark applications in Eclipse with Maven.
  • Strong understanding of Spark real time streaming and SparkSQL.
  • Loading data from external data sources like MySQL and Cassandra for Spark applications.
  • Firm understanding of optimizations and performance-tuning practices while working with Spark.
  • Good knowledge on compression and serialization to improve performance in Spark applications
  • Performed interactive querying using SparkSQL.
  • Practical knowledge on Apache Sqoop to import datasets from MySQL to HDFS and vice-versa.
  • Good knowledge on building predictive models focusing on customer service using R programming.
  • Practical knowledge of implementing Internet of Things (IoT) solutions.
  • Experience in reviewing and managing Hadoop log files.
  • Used libraries built on MLlib to perform data cleaning, and used R for reorganizing datasets.
  • Used Cassandra Query Language to design Cassandra database and tables with various configuration options.
  • Debug CQL queries and implement performance enhancement practices.
  • Strong knowledge on Apache Oozie for scheduling the tasks.
  • Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
  • Experience in configuring Kafka brokers, consumers and producers for optimal performance.
  • Knowledge of creating Apache Kafka consumers and producers in Java.
  • Developed Pig UDFs for manipulating data according to business requirements, and worked on developing custom Pig loaders.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Practical knowledge of monitoring a Hadoop cluster using Nagios and Ganglia.
  • Experience in integrating Hadoop with Ganglia and have good understanding of Hadoop metrics and visualization using Ganglia.
  • Experience with GIT for version control system.
  • Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
  • Understanding technical specifications and documenting technical design documents.
  • Strong skills in Agile development and Test-Driven development.
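The kind of CQL table design mentioned above might look like the following; the keyspace, table, and option values are invented for illustration and are not taken from the original project:

```sql
-- Hypothetical keyspace; replication settings are example values only
CREATE KEYSPACE IF NOT EXISTS retail
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Partition by customer, cluster by event time for recent-first reads
CREATE TABLE retail.customer_events (
  customer_id uuid,
  event_time  timestamp,
  event_type  text,
  payload     text,
  PRIMARY KEY (customer_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
  AND default_time_to_live = 2592000;  -- 30-day TTL, an example option
```

Choosing the partition key (customer_id) and clustering column (event_time) up front is the core of Cassandra table design, since queries must follow that layout.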

Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop 2.5.2, Spark, SparkSQL, MLlib, R, Scala, Cassandra, IoT, MapReduce, Apache Pig 0.14.0, Apache Hive 1.0.0, HDFS, Sqoop, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, Zookeeper, Git, UNIX shell scripting, Oracle 11g/10g, Linux, Agile development.

Senior Hadoop Developer

Confidential, Austin, TX


  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Good knowledge of implementing image processing with Spark.
  • Experience in building batch and streaming applications with Apache Spark and Python.
  • Experience in tackling parallel computing to support the Spark Machine Learning Applications.
  • Experience in deploying machine learning algorithms and models and scale them for real-time events.
  • Experienced in running Apache Pig scripts to convert XML data to JSON data.
  • Used Scala extensively for processing and extracting the images.
  • Good knowledge of dimensionality-reduction techniques in MLlib in Scala and Java.
  • Experience using the Matplotlib library to display images, and experience extracting images as vectors.
  • Experience with Java Abstract Window Toolkit (AWT) which is used for basic image processing functions.
  • Strong understanding of mapping, search queries, filters and validating queries in ElasticSearch application.
  • Practical experience in defining queries on JSON data using Query DSL provided by ElasticSearch.
  • Experience in improving the search focus and quality in ElasticSearch by using aggregations and Python scripts.
  • Analyzed the data by performing Hive queries and running Pig scripts.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Experience in optimizing an HBase cluster using different Hadoop and HBase parameters.
  • Good knowledge of the HBase data model and its operations, along with various troubleshooting and maintenance techniques.
  • Good understanding of data storage, replication, data scanning, and data filtering in HBase.
  • Experience in reading from and writing data to Amazon S3 in Spark Applications.
  • Experience in selecting and configuring the right Amazon EC2 instances and access key AWS services using client tools and AWS SDKs.
  • Knowledge of using AWS Identity and Access Management (IAM) to secure access to EC2 instances and configuring Auto Scaling groups using CloudWatch.
  • Good understanding of the internals of Kafka design, message compression and replication.
  • Experience in maintaining and operating Kafka and monitor it consistently and effectively using cluster management tools.
  • Experience in integrating Kafka with other tools for logging and packaging.
  • Experience in transferring data between HDFS and RDBMS using Sqoop.
  • Knowledge on adding and describing a third-party connector in Sqoop
  • Knowledge on incremental import, free-form query import, export and Hadoop ecosystem integration using Sqoop.
  • Ran machine learning Spark jobs on Hadoop using Oozie and created quick Oozie jobs using Hue.
  • Scheduled Sqoop jobs through Oozie to import data from databases into HDFS.
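The Sqoop incremental import mentioned above might be invoked as follows; the connection string, table, columns, and paths are placeholders, not details from the actual engagement:

```shell
# Hypothetical incremental-append import from MySQL to HDFS
sqoop import \
  --connect jdbc:mysql://db.example.com/retail \
  --username etl_user --password-file /user/etl/.pw \
  --table orders \
  --target-dir /data/raw/orders \
  --incremental append \
  --check-column order_id \
  --last-value 1000000
```

With --incremental append, Sqoop pulls only rows whose check column exceeds the recorded last value, which is what makes it suitable for an Oozie-scheduled recurring job.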

Environment: Amazon Web Services, Java 7, Hadoop 2.4.0, Spark, MLlib, Python, Scala, HBase, ElasticSearch, Apache Pig 0.12.0, Apache Hive 0.13.0, MapReduce, HDFS, Sqoop, Oozie, Kafka, Zookeeper, Maven, Eclipse, Nagios, Ganglia, Git, UNIX shell scripting, Oracle 11g/10g, Linux, Agile development.

Hadoop Developer

Confidential, NJ


  • Developed several advanced MapReduce programs to process received data files.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Firm knowledge on various summarization patterns to calculate aggregate statistical values over dataset.
  • Experience in implementing joins in the analysis of dataset to discover interesting relationships.
  • Completely involved in the requirement analysis phase.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Worked on partitioning Hive tables and running scripts in parallel to reduce their run time.
  • Strong expertise in Hive internal and external tables; created Hive tables to store processed results in tabular format.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Developed Pig Scripts and Pig UDFs to load data files into Hadoop.
  • Analyzed the data by performing Hive queries and running Pig scripts.
  • Developed Pig Latin scripts for the analysis of semi-structured and unstructured data.
  • Strong knowledge on the process of creating complex data pipelines using transformations, aggregations, cleansing and filtering
  • Experience in writing cron jobs to run at regular intervals.
  • Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
  • Experience in using Flume to efficiently collect, aggregate and move large amounts of log data.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Experience in managing and reviewing Hadoop log files.
  • Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
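The MapReduce log-analysis jobs above can be sketched as a Hadoop Streaming pair in Python (one of the languages listed in this resume); the combined-access-log format and field positions are assumptions for illustration:

```python
#!/usr/bin/env python3
"""Hadoop Streaming sketch: count requests per HTTP status code.

mapper() and reducer() are pure functions over iterables of lines, so the
same code can be wired into `hadoop jar hadoop-streaming.jar -mapper ...
-reducer ...` or exercised locally.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'status\t1' per log line; assumes the status code is the 9th
    whitespace-separated field of a combined-format access log (an
    assumption, not the original job's schema)."""
    for line in lines:
        fields = line.split()
        if len(fields) > 8 and fields[8].isdigit():
            yield f"{fields[8]}\t1"


def reducer(lines):
    """Sum counts per status code; input must arrive sorted by key,
    exactly as the Hadoop shuffle phase guarantees."""
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for status, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{status}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Streaming entry point: `script.py map` or `script.py reduce`
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Keeping the map and reduce logic as pure generator functions makes the job unit-testable without a cluster, which matches the testing practices listed elsewhere in this resume.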

Environment: Hadoop 1.1.1, Java, Apache Pig 0.10.0, Apache Hive 0.10.0, MapReduce, HDFS, Flume 1.4.0, GIT, UNIX Shell scripting, PostgreSQL, Linux.

Java Developer



  • Involved in Analysis, Design, Implementation and Bug Fixing Activities.
  • Designed the initial web/WAP pages for a better UI as per the requirements.
  • Involved in Functional & Technical Specification documents review and the code review.
  • Underwent training on the domain knowledge.
  • Involved in design of basic Class Diagrams, Sequence Diagrams and Event Diagrams as a part of Documentation.
  • Participated in discussions and meetings with business analysts to understand the functionality involved in test case reviews.
  • Developed SQL queries and stored procedures using PL/SQL to retrieve data from and insert data into multiple database schemas.
  • Prepared the Support Guide containing the complete functionality.
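A minimal PL/SQL sketch of the cross-schema stored-procedure work described above; the schema, table, and procedure names are invented for illustration:

```sql
-- Hypothetical procedure copying one customer row between two schemas
CREATE OR REPLACE PROCEDURE copy_customer (p_customer_id IN NUMBER) AS
  v_row sales_schema.customers%ROWTYPE;
BEGIN
  SELECT * INTO v_row
    FROM sales_schema.customers
   WHERE customer_id = p_customer_id;

  -- Record-based insert into the second schema's table
  INSERT INTO reporting_schema.customers VALUES v_row;
  COMMIT;
END copy_customer;
/
```

Using %ROWTYPE keeps the procedure resilient to column additions, provided both schemas' tables share the same structure.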

Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.

Java Developer



  • Requirements study, software development specification, development, and unit testing using JUnit.
  • Developed SQL queries and stored procedures using PL/SQL to retrieve data from and insert data into multiple database schemas.
  • Involved in unit testing.
  • Analyzed business requirements and ensured timely deliveries.
  • Resolved bugs raised by the client.
  • Involved in deployment activities on WebSphere Application Server.
  • Developed test cases for unit testing and integration testing.
  • Generated reports and prepared system maintenance documentation.
  • Wrote reusable custom JavaScript functions for field validation.

Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, SQL Developer, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.
