Sr. Hadoop Developer Resume

Fairfax, VA


  • 9 years of professional IT work experience in Analysis, Design, Administration, Development, Deployment and Maintenance of critical software and big data applications.
  • More than 3 years of experience on Big Data platforms as both developer and administrator.
  • Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie and Cassandra.
  • Hands-on experience in using the MapReduce programming model for batch processing of data stored in HDFS.
  • Exposure to administrative tasks such as installing Hadoop and its ecosystem components such as Hive and Pig.
  • Installed and configured multiple Hadoop clusters of different sizes and with ecosystem components like Pig, Hive, Sqoop, Flume, HBase, Oozie and Zookeeper.
  • Worked on the major Hadoop distributions, Cloudera and Hortonworks.
  • Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
  • Strong experience across the complete project life cycle (design, development, testing and implementation) of client-server and web applications.
  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
  • Experienced in using Apache Spark, with code written in Scala, to implement advanced procedures like text analytics and processing using its in-memory computing capabilities.
  • Experience in installing, configuring, managing, supporting and monitoring Hadoop clusters using various distributions such as Apache and Cloudera.
  • Experience with middleware architecture using Sun Java technologies like J2EE and Servlets, and application servers like WebSphere and WebLogic.
  • Used different Spark modules like Spark Core, Spark RDDs, Spark DataFrames and Spark SQL.
  • Converted various Hive queries into the required Spark transformations and actions.
  • Experience working with the open-source Apache Hadoop distribution and technologies like HDFS, MapReduce, Python, Pig, Hive, Hue, HBase, Sqoop, Oozie, ZooKeeper, Spark, Spark Streaming, Storm, Kafka, Cassandra, Impala, Snappy, Greenplum, MongoDB and Mesos.
  • In-depth knowledge of Scala and experience building Spark applications using Scala.
  • Good experience working with Tableau and Spotfire; enabled JDBC/ODBC data connectivity from both to Hive tables.
  • Designed neat and insightful dashboards in Tableau.
  • Designed and worked on an array of reports, including Crosstab, Chart, Drill-Down, Drill-Through, Customer-Segment and Geodemographic segmentation.
  • Deep understanding of Tableau features such as site and server administration, calculated fields, table calculations, parameters, filters (normal and quick), highlighting, level of detail, granularity, aggregation, reference lines and many more.
  • Adequate knowledge of Scrum, Agile and Waterfall methodologies.
  • Designed and developed multiple J2EE Model 2 MVC-based web applications.
  • Worked with various tools and IDEs like Eclipse, IBM Rational, the Apache Ant build tool, MS Office, PL/SQL Developer and SQL*Plus.
  • Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.


Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions, HBase, Spark

Programming Languages: Java (5, 6, 7), Python, Scala

Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle 9i/10g/11g

Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell

ETL Tools: Cassandra, HBase, Elasticsearch, Alteryx.

Operating Systems: Linux, Windows XP/7/8

Software Life Cycles: SDLC, Waterfall and Agile models

Office Tools: MS Office, MS Project and Risk Analysis tools, Visio

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SoapUI, Ant, Maven, Automation and MRUnit

Cloud Platforms: Amazon EC2

Visualization Tools: Tableau.


Confidential, Fairfax, VA

Sr. Hadoop Developer


  • Worked on Hadoop cluster scaling from 4 nodes in the development environment to 8 nodes in the pre-production stage and up to 24 nodes in production.
  • Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig and Hive programs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used HiveQL queries to search for particular strings in Hive tables stored in HDFS.
  • Possess good Linux and Hadoop system administration, networking and shell scripting skills, and familiarity with open-source configuration management and deployment tools such as Chef.
  • Worked with Puppet for application deployment.
  • Configured Kafka to read and write messages from external programs.
  • Configured Kafka to handle real time data.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Developed MapReduce and Spark jobs to discover trends in data usage by users.
  • Implemented Spark jobs using Python and Spark SQL for faster processing of data.
  • Developed functional programs in Scala for connecting the streaming data application, gathering web data in JSON and XML and passing it to Flume.
  • Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
  • Exported the analyzed patterns back into Teradata using Sqoop. Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Used the Spark Cassandra Connector to load data to and from Cassandra.
  • Streamed data in real time using Spark with Kafka.
  • Good knowledge of building Apache Spark applications using Scala.
  • Developed several business services as Java RESTful web services using the Spring MVC framework.
  • Managed and scheduled jobs using Oozie to remove duplicate log data files in HDFS.
  • Used Apache Oozie for scheduling and managing Hadoop jobs. Knowledge of HCatalog for Hadoop-based storage management.
  • Expert in creating and designing data ingest pipelines using technologies such as Spring Integration and Apache Storm-Kafka.
  • Used Flume extensively in gathering and moving log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Implemented test scripts to support test driven development and continuous integration.
  • Transferred data from HDFS to a MySQL database and vice versa using Sqoop.
  • Responsible for managing data coming from different sources.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to find which one best suits the current requirements.
  • Used File System check (FSCK) to check the health of files in HDFS.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Applied Java and J2EE application development skills with Object Oriented Analysis, and was extensively involved throughout the Software Development Life Cycle (SDLC).
  • Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS).
  • Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
  • Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase).
  • Configured Kerberos for the clusters.

Environment: Hadoop, MapReduce, HDFS, Ambari, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Scala, Java, AWS, GitHub.

Confidential - Atlanta, GA

Hadoop Data Analyst


  • Worked on a cloud platform built as a scalable distributed data solution using Hadoop on a 40-node cluster in the AWS cloud to run analysis on 25+ terabytes of customer usage data.
  • Worked on analyzing the Hadoop stack and different big data analytics tools, including Pig, Hive, the HBase database and Sqoop.
  • Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
  • Worked on performance analysis and improvements for Hive and Pig scripts at the MapReduce job tuning level.
  • Installed and configured the Hadoop cluster. Worked with the Cloudera support team to fine-tune the cluster. Developed a custom file system plugin for Hadoop so it can access files on Hitachi Data Platform.
  • Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic. Performed data ingestion from multiple internal clients using Apache Kafka. Developed k-streams using Java for real-time data processing.
  • Involved in Optimization of Hive Queries.
  • Developed a framework to handle loading and transforming large sets of unstructured data from UNIX systems into Hive tables.
  • Involved in Data Ingestion to HDFS from various data sources.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
  • Automated Sqoop, Hive and Pig jobs using Oozie scheduling.
  • Extensive knowledge of NoSQL databases like HBase.
  • Worked extensively on importing metadata into Hive, and migrated existing tables and applications to work on Hive and the AWS cloud.
  • Responsible for continuous monitoring and management of the Elastic MapReduce (EMR) cluster through the AWS console.
  • Good knowledge of writing and using user-defined functions in Hive, Pig and MapReduce.
  • Helped the business team by installing and configuring Hadoop ecosystem components along with the Hadoop admin.
  • Developed multiple Kafka Producers and Consumers from scratch as per the business requirements.
  • Worked on loading log data into HDFS through Flume.
  • Created and maintained technical documentation for executing Hive queries and Pig Scripts.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Used Oozie to schedule various jobs on the Hadoop cluster.
  • Used Hive to analyze the partitioned and bucketed data.
  • Worked on establishing connectivity between Tableau and Hive.

Environment: Hortonworks 2.4, Hadoop, HDFS, MapReduce, MongoDB, Cloudera, Java, VMware, Hive, Eclipse, Pig, HBase, AWS, Tableau, Sqoop, Flume, Linux, UNIX

Confidential - Beaverton, OR

Hadoop Developer


  • Worked with business analysts and product owners to analyze and understand the requirements and provide estimates.
  • Implemented J2EE design patterns such as Singleton, DAO, DTO and MVC.
  • Developed a web application to store all system information in a central location using Spring MVC, JSP, Servlets and HTML.
  • Used the Spring AOP module to handle transaction management services for objects in the Spring-based application.
  • Implemented Spring DI and Spring Transactions in the business layer.
  • Developed data access components using JDBC, DAOs, and Beans for data manipulation.
  • Designed and developed database objects like tables, views, stored procedures and user functions using PL/SQL and SQL Developer, and used them in web components.
  • Used iBATIS for dynamically building SQL queries based on parameters.
  • Developed JavaScript and jQuery functions for all client-side validations.
  • Developed JUnit test cases for unit testing and used Maven as the build and configuration tool.
  • Used shell scripting to create jobs that run on a daily basis.
  • Debugged the application using Firebug and traversed through the nodes of the tree using DOM functions.
  • Monitored the error logs using log4j and fixed the problems.
  • Used the Eclipse IDE and deployed the application on a WebLogic server.

Environment: Java, J2EE, JavaScript, XML, JDBC, Spring Framework, Hibernate, RESTful web services, WebLogic Server, Log4j, JUnit, Ant, SoapUI, Oracle 11g.

Confidential - Houston, TX

Java/Hadoop Developer


  • Designed and developed Java classes using object-oriented methodology.
  • Worked on a system using Java, JSP and Servlets.
  • Developed Java classes and methods for handling data from the database.
  • Experience in sequence data pre-processing, extraction, model fitting and validation using ML pipelines.
  • Used Talend Open Studio to load files into Hadoop Hive tables and performed ETL aggregations in Hadoop Hive.
  • Used Sqoop to import data from SQL Server to the Hadoop ecosystem.
  • Integrated Cassandra with Talend and automated jobs.
  • Scheduled jobs and monitored the console outputs through Jenkins.
  • Worked in an Agile environment that uses Jira to maintain story points.
  • Worked on the implementation of a toolkit that abstracted Solr and Elasticsearch.
  • Performed maintenance and troubleshooting of the Cassandra cluster.
  • Installed and configured Hive and wrote Hive UDFs in Java and Python.
  • Attended and conducted user meetings for requirement analysis and project reporting.
  • Performed testing and bug fixing, and provided production support.

Environment: Hadoop, HDFS, MapReduce, Java, Hive, Eclipse, Talend, HBase, Sqoop, Flume, Cassandra, Solr.
