Sr. Hadoop Developer Resume

Herndon, VA


  • 8+ years of professional IT experience spanning a wide range of skills in Big Data and Java/J2EE technologies.
  • 4+ years of experience working with Big Data technologies on systems that process massive amounts of data in a highly distributed mode on the Cloudera and Hortonworks Hadoop distributions.
  • Hands-on experience with the Hadoop ecosystem, including Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
  • Strong knowledge of Spark architecture and components; efficient in working with Spark Core, Spark SQL and Spark Streaming.
  • Implemented Spark Streaming jobs using RDDs (Resilient Distributed Datasets), working in PySpark and spark-shell as appropriate.
  • Experience configuring Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala.
  • Experience importing and exporting data using streaming ingestion tools such as Flume and Kafka.
  • Wrote complex HiveQL queries to extract required data from Hive tables and developed Hive UDFs as needed.
  • Good experience with partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
  • Used Spark DataFrame operations to perform required data validations.
  • Experience integrating Hive queries into the Spark environment using Spark SQL.
  • Good understanding of NoSQL databases such as MongoDB, HBase and Cassandra.
  • Worked on HBase to load and retrieve data for real-time processing using a REST API.
  • Excellent understanding of job workflow scheduling and coordination/locking services such as Oozie and ZooKeeper.
  • Experienced in designing time-driven and data-driven automated workflows using Oozie.
  • Knowledge of ETL methods for data extraction, transformation and loading in enterprise-wide ETL solutions and data warehouse tools for reporting and data analysis.
  • Developed Python-based ETL workflows, scheduled with Oozie, to process ingested data in HDFS and HBase.
  • Experience configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Capable of using AWS services such as EMR, S3 and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
  • Developed Impala scripts for extracting, transforming and loading data into the data warehouse.
  • Good knowledge of using Apache NiFi to automate data movement between Hadoop systems.
  • Experienced in using Pig scripts to perform transformations, event joins, filters and some pre-aggregations before storing data in HDFS.
  • Experience using Sqoop to import and export data between HDFS and relational database systems.
  • Good knowledge of UNIX shell scripting for automating deployments and other routine tasks.
  • Experience with relational databases such as Oracle, MySQL and SQL Server.
  • Experienced in using IDEs such as Eclipse, NetBeans, IntelliJ and Spring Tool Suite.
  • Used JIRA for tracking issues and code-related bugs and GitHub for code reviews; worked with version control tools such as Git and SVN.
  • Experienced in developing and implementing web applications using Java, J2EE, JSP, Servlets, JSF, HTML, DHTML, EJB, JavaScript, AJAX, JSON, jQuery, CSS, XML, JDBC and JNDI.
  • Experience developing Java GUIs using JFC, Swing, JavaBeans and AWT.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
  • Experienced in working across the SDLC with Agile and Waterfall methodologies.
  • Excellent communication, interpersonal and problem-solving skills; a team player with the ability to adapt quickly to new environments and technologies.


Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Languages: Java, Python, Scala

Java Technologies: Servlets, JavaBeans, JSP, JDBC, and Spring MVC

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON

NoSQL Databases: Cassandra, MongoDB and HBase

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL and DB2

Operating Systems: UNIX, Linux, Mac OS and Windows variants

Data analytical tools: R and MATLAB

ETL Tools: Talend, Informatica, Pentaho


Confidential, Herndon, VA

Sr. Hadoop Developer


  • Working knowledge of Spark RDDs, the DataFrame API, Dataset API, Data Source API, Spark SQL and Spark Streaming.
  • Explored Spark to improve the performance of, and optimize, existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Performed Spark join optimizations, troubleshooting and monitoring, and wrote efficient code in Scala.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Analyzed the Hadoop stack and various big data analytics tools, including Pig, Hive, HBase and Sqoop.
  • Developed a Spark Streaming application that consumes topics from Kafka, a distributed messaging system, and periodically pushes batches of data to Spark for real-time processing.
  • Experienced in implementing the Hortonworks distribution.
  • Created Hive tables and used them for data analysis to meet requirements.
  • Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
  • Used Spark DataFrame operations to perform required data validations and analytics on Hive data.
  • Experienced in working with Amazon Elastic MapReduce (EMR).
  • Developed MapReduce programs for refined queries on big data.
  • In-depth understanding of classic MapReduce and YARN architecture.
  • Worked with the business team to create Hive queries for ad hoc access.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Implemented Hive GenericUDFs to encode business logic.
  • Analyzed data using Hive queries, Pig scripts, Spark SQL and Spark Streaming.
  • Installed and configured Pig for ETL jobs.
  • Responsible for developing multiple Kafka producers and consumers from scratch according to the software requirement specifications.
  • Created a generic Sqoop import script for loading data from RDBMS into Hive tables.
  • Designed column families in Cassandra; ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per business requirements.
  • Extracted data from Cassandra through Sqoop and placed it in HDFS for further processing.
  • Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Created detailed AWS security groups that acted as virtual firewalls, controlling the traffic allowed to reach one or more EC2 instances.
  • Helped build a data lake by extracting customer data from various sources into HDFS, including Excel files, databases and server log data.
  • Performed data integration with Talend Data Integration to move more data effectively, efficiently and with high performance in support of business-critical projects.
  • Designed, developed, unit tested and supported ETL mappings and scripts for data marts using Talend.
  • Used Hue to run Hive queries and created day-based partitions in Hive to improve performance.
  • Built a data flow pipeline using Flume, Java (MapReduce) and Pig.
  • Participated in Agile practices, including daily Scrum meetings and sprint planning.
  • Used GitHub for version control and to share code among team members.

Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration, Solr, Impala.

Confidential, Green, OH

Hadoop Developer


  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for large data volumes.
  • Developed Spark scripts using Scala shell commands as required.
  • Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Used Spark RDDs for faster data sharing.
  • Experienced in querying data using Spark SQL on top of the Spark engine for faster processing of data sets.
  • Worked on ad hoc queries, indexing, replication, load balancing and aggregation in MongoDB.
  • Extracted and restructured data into MongoDB using the mongoimport and mongoexport command-line utilities.
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Hive and MongoDB.
  • Wrote XML workflow definitions to build Oozie functionality.
  • Used the Oozie workflow scheduler to manage and schedule jobs on the Hadoop cluster for generating daily and weekly reports.
  • Used Flume to collect, aggregate and store web log data from various sources such as web servers, mobile and network devices, and pushed it to HDFS.
  • Implemented custom serializer, interceptor, source and sink in Flume to ingest data from multiple sources.
  • Wrote Impala queries for better and faster data processing.
  • Responsible for analyzing and cleansing raw data by running Hive/Impala queries and Pig scripts.
  • Moved log files generated from various sources to HDFS through Flume for further processing.
  • Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
  • Partitioned Hive tables and ran scripts in parallel to reduce their run time.
  • Analyzed data using Hive queries and Pig scripts to understand user behavior.
  • Programmed Pig scripts with complex joins, such as replicated and skewed joins, to achieve better performance.
  • Developed a data pipeline using Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Designed and created ETL jobs in Talend to load large volumes of data into MongoDB, the Hadoop ecosystem and relational databases.
  • Created Talend jobs to connect to Quality Stage using FTP connection and process data received from Quality Stage.
  • Migrated data from MySQL to Hadoop using Sqoop for processing.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experienced in developing shell and Python scripts for system management.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Worked with the Scrum team to deliver agreed user stories on time in every sprint.

Environment: CDH 3.x and 4.x, Java, Hadoop, Python, MapReduce, Hive, Pig, Impala, Flume, MongoDB, Sqoop, Talend, Spark, MySQL, AWS.

Confidential - San Francisco, CA

Hadoop Developer


  • Migrated MapReduce programs to Spark transformations using Scala.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, including adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Wrote complex MapReduce jobs in Java to extract, transform and aggregate terabytes of data.
  • Collected and aggregated large amounts of stream data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Analyzed data using Hadoop components Hive and Pig.
  • Scripted complex HiveQL queries on Hive tables to analyze large datasets and wrote complex Hive UDFs to work with sequence files.
  • Scheduled workflows using Oozie to automate multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Responsible for creating Hive tables, loading data and writing Hive queries to analyze data.
  • Generated reports using QlikView.
  • Wrote several Hive queries to extract valuable information hidden in large datasets.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Imported data from a Teradata database into HDFS and exported the analyzed pattern data back to Teradata using Sqoop.
  • Worked with Talend Open Studio to perform ETL jobs.

Environment: Hadoop (Hortonworks), HDFS, Hive, Pig, Sqoop, MapReduce, HBase, Shell Scripting, QlikView, Teradata 14, Oozie, Java 7, Maven 3.x.

Confidential - Bethesda, MD

Java/Hadoop Developer


  • Installed and configured JDK, Hadoop, Pig, Sqoop, Hive and HBase in a Linux environment; assisted with performance tuning and monitoring.
  • Imported data from MySQL into HDFS on a regular basis using Sqoop.
  • Created reports for the BI team, using Sqoop to move data into HDFS and Hive.
  • Created MapReduce programs to parse data for claim report generation and ran the JARs on Hadoop; coordinated with the Java team on creating MapReduce programs.
  • Created Pig scripts for most modules to provide a comparative estimate of code development effort.
  • Collaborated with BI teams to ensure data quality and availability with live visualization.
  • Created Hive queries to process large sets of structured, semi-structured and unstructured data and store them in managed and external tables.
  • Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API.
  • Performed test runs of the module components to assess productivity.
  • Wrote Java programs to retrieve data from HDFS and expose it through REST services.
  • Shared responsibility for, and assisted with, the administration of Hadoop, Hive, Sqoop, HBase and Pig within the team.
  • Shared the knowledge of Hadoop concepts with team members.
  • Used JUnit for unit testing and Continuum for integration testing.

Environment: Cloudera, Hadoop, Pig, Sqoop, Hive, HBase, Java, Eclipse, MySQL, MapReduce.


Java Developer


  • Responsible for analyzing and documenting the requirements, and for designing and developing the application based on J2EE standards. Strictly followed test-driven development.
  • Used Microsoft Visio to design use cases, class diagrams, sequence diagrams and data models.
  • Extensively developed the front-end user interface using HTML, JavaScript, jQuery, AJAX and CSS.
  • Designed a rich internet application using jQuery-based accordion styles.
  • Used JavaScript for the client-side web page validation.
  • Used Spring MVC and Dependency Injection for handling presentation and business logic. Integrated Spring DAO for data access using Hibernate.
  • Developed Struts web forms and actions for validation of user request data and application functionality.
  • Developed programs that access the database through the JDBC thin driver to execute queries, prepared statements and stored procedures, and to manipulate data.
  • Created tile definitions, Struts configuration files, validation files and resource bundles for all modules using Struts framework.
  • Involved in coding and integrating several business-critical modules using Java, JSF and Hibernate.
  • Developed SOAP-based web services for communication with upstream applications.
  • Implemented design patterns such as DAO and Singleton, along with Spring's MVC architectural pattern.
  • Implemented Service Oriented Architecture (SOA) on Enterprise Service Bus (ESB).
  • Developed Message-Driven Beans for asynchronous processing of alerts using JMS.
  • Used the Rational Rose tool for application development.
  • Used ClearCase for source code control and JUnit for unit testing.
  • Performed integration testing of the modules.
  • Used PuTTY to log in to UNIX servers to run batch jobs and check server logs.
  • Deployed the application to GlassFish Server.
  • Involved in peer code reviews.

Environment: Java 6/7, J2EE, Struts 2, GlassFish, JSP, JDBC, EJB, Ant, XML, IBM WebSphere, JUnit, IBM DB2, Rational Rose 7, CVS, UNIX, SOAP, SQL, PL/SQL.


Java Developer


  • Documented functional and technical requirements and wrote technical design documents.
  • Developed analysis-level documentation such as use cases, the business domain model, and activity, sequence and class diagrams.
  • Developed presentation layer components comprising JSP, AJAX, Servlets and JavaBeans using the Struts framework.
  • Implemented MVC (Model View Controller) architecture.
  • Developed XML configuration and data description using Hibernate.
  • Developed web services using CXF to interact with mainframe applications.
  • Responsible for the deployment of the application in the development environment using BEA WebLogic 9.0 application server.
  • Participated in the configuration of BEA WebLogic application server.
  • Designed and developed front end user interface using HTML and Java Server Pages (JSP) for customer profile setup.
  • Developed Ant scripts to compile the Java files and build the JARs and WARs.
  • Responsible for analysis, coding, unit testing and production support.
  • Used JUnit for testing Modules.

Environment: Java 1.6, J2EE, JDBC, Struts Framework, Hibernate, Servlets, MVC, JSP, Web Services, CXF, SOAP, BEA WebLogic 9, Oracle 9i, JavaScript, XML, HTML, Ant, JUnit, SVN, MyEclipse.