
Hadoop Developer Resume


Charleston, SC


  • 9 years of experience in all phases of the Software Development Life Cycle (SDLC) on Java/J2EE, Big Data, and cloud-based applications, spanning multiple technologies and business domains.
  • 4+ years of design and development experience in Big Data Hadoop technologies, including building, loading, and analyzing data on AWS (EMR, Spark, HDFS and YARN clusters, Redshift), plus knowledge of Azure cluster setup.
  • Experience working with NoSQL databases (Cassandra, HBase, MongoDB) and with Spark Streaming, Spark SQL, Flume, Scala, MapReduce, Hive, Impala, Pig, Sqoop, Apache Drill, and Oozie.
  • Experience in data analytics, writing real-time data processing applications with Spark Streaming and Kafka, Spark SQL, and Flume in Scala and Python.
  • Extensive experience working with different file formats and compression techniques, setting up batch jobs and ETL/ELT pipelines, and automating with Chef on AWS.
  • Solid experience creating ETL data pipelines using MapReduce, Hive/Impala, Pig, Sqoop, and Hive/Pig UDFs written in Java and Python.
  • Extensive experience building Java/J2EE applications with AngularJS, HTML, JavaScript, XML, JSON, CSS, and SQL/HQL queries, and in data analysis.
  • Worked extensively with web services and Service-Oriented Architecture (SOA), including Web Services Description Language (WSDL), Simple Object Access Protocol (SOAP), and UDDI.
  • Oracle expertise includes an in-depth understanding of database architecture and ERDs, Oracle programming, evaluation of application products, and database schema design and development.
  • Expertise in writing SQL and PL/SQL to integrate complex OLTP and OLAP database models and data marts; worked extensively on Oracle, SQL Server, and DB2.
  • Experience with ETL tools such as Informatica PowerCenter and Talend, and with Oracle PL/SQL.
  • Experience developing applications using Waterfall and Agile (XP and Scrum), Test-First, and Test-Driven Development methodologies, with a good understanding of Service-Oriented Architecture.
  • Experience developing Shell, Perl, and Python scripts (Linux and Unix) to automate jobs for data extraction, data integration, and report generation.
  • Experience with version control systems such as CVS, SVN, and Git, and with build and continuous integration tools such as Ant, Maven, and Jenkins.
  • Hands-on experience with Business Intelligence data visualization, generating BI analytics reports with a specialization in Tableau and QlikView.
  • Worked with version control systems such as Subversion, Perforce, and Git to provide a common platform for all developers.
  • Articulate in written and verbal communication, with strong interpersonal, analytical, and organizational skills.
  • Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
  • Able to communicate and present models creatively to business customers and executives, using a variety of formats and visualization methods.


Hadoop Ecosystem/Big Data: HDFS, MapReduce, Mahout, HBase, Pig, Hive, Sqoop, Flume, Power Pivot, Puppet, Oozie, ZooKeeper, Apache Spark, Splunk, YARN, Falcon, Avro, Impala

Frameworks in Hadoop: Spark, Kafka, Storm, Cloudera CDH, Hortonworks HDP, Hadoop 1.0, Hadoop 2.0

Databases: Oracle (8i, 9i, 10g, 11i), PL/SQL, MySQL, DB2 (8.x/9.x), Microsoft SQL Server 2000, PostgreSQL, Teradata, MS Access; NoSQL: Cassandra, MongoDB, HBase

Java & J2EE Technologies: Core Java, Hibernate, Spring Framework 3.2, Spring MVC, Spring Web, JSP, Servlets, JavaBeans, JDBC, EJB 3.0, Java Sockets, JavaScript, jQuery, JSF, PrimeFaces, SOAP, XSLT, DHTML, Messaging Services (JMS, MQ Series, MDB), J2EE MVC, Struts 2.1, JUnit, MRUnit

Amazon Web Services (AWS): Amazon Elastic MapReduce (EMR), Amazon EC2, Amazon S3, AWS CodeCommit, AWS CodeDeploy, AWS CodePipeline, Amazon CloudFront, AWS Import/Export

Languages: C, C++, Java, J2EE, Scala, Python, PL/SQL, Pig Latin, HiveQL, Unix shell scripting, Perl, sed, awk, JavaScript, VBScript, XML, HTML, XHTML, HTML5, CSS, AJAX, jQuery, AngularJS, JNDI, WSDL, ODBC

Architectures: REST

Source Code Control: GitHub, CVS, SVN, ClearCase

IDE & Build Tools: Eclipse, NetBeans, Spring Tool Suite, Hue (Cloudera-specific), Toad, JDeveloper, Assent PMD, DbVisualizer, Maven, Ant, Gradle, Hudson, Sonar

Web/Application Servers: Apache Tomcat, WebLogic, JBoss, IBM WebSphere

Analysis/Reporting: Ganglia, Nagios, custom shell scripts, QlikView, Tableau, BOXI, ETL (Informatica)


Confidential, Charleston, SC

Hadoop Developer


  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Involved in migrating MapReduce jobs to Spark jobs, and used Spark SQL and the DataFrame API to load structured and semi-structured data into Spark clusters.
  • Hands-on expertise running Spark and Spark SQL on Amazon Elastic MapReduce (EMR).
  • Implemented Spark batch jobs on AWS instances using data stored in Amazon Simple Storage Service (Amazon S3).
  • Used Spark Streaming in Scala to build a learner data model from sensor data using MLlib.
  • Developed tasks and set up the required environment on AWS for running Hadoop in the cloud across various instances.
  • Involved in the requirements and design phases of a streaming Lambda Architecture for real-time processing with Spark and Kafka.
  • Implemented Parquet columnar storage, which serializes and stores data by column so that queries and reads across large data sets are highly optimized.
  • Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming. Worked on the Spark engine, creating batch jobs with incremental loads through Kafka, Flume, HDFS/S3, Kinesis, sockets, etc.
  • Implemented DStreams over resilient distributed datasets (RDDs) with various window operations, while simultaneously updating log files for the streams.
  • Extensive experience with Spark Streaming (version 2.0.0) through the core Spark API, running Scala, Java, and Python scripts to transform raw data from several data sources into baseline data.
  • Implemented various MapReduce jobs in custom environments and updated the corresponding Hive tables by generating Hive queries.
  • Good hands-on experience writing HQL statements according to user requirements.
  • Implemented Cassandra connectivity with Resilient Distributed Datasets (local and cloud).
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation, and how they translate into Elastic MapReduce jobs.
  • Developed UDFs in Scala, Java, and Python as needed for use in Pig and Hive queries.
  • Experience using SequenceFile, RCFile, and Avro file formats when developing UDFs.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Implemented authentication using Kerberos and authorization using Apache Sentry.

Environment: Cloudera Hadoop CDH5, Cloudera Manager, MapReduce, Flume, Pig, Spark, Hive, Impala, AWS, Oozie, Kafka, Cassandra, MongoDB

Confidential, Houston, TX

Hadoop Developer


  • Worked closely with business analysts and senior architects to design and configure a master/slave architecture in testing, development, and production environments using the Puppet configuration tool and the Hortonworks distribution.
  • Ingested large data sets from DB2 and other RDBMSs into the Hadoop edge node (and vice versa) using MapReduce, Sqoop, and FTP shell scripts.
  • Created Hive tables with periodic backups and wrote complex Hive/Impala queries to run on Impala.
  • Implemented partitioning and bucketing in Hive, and optimized tables using appropriate file formats and compression techniques.
  • Created Impala views on top of Hive tables for faster data access and analysis.
  • Developed MapReduce programs in Java to filter out unstructured data, and wrote custom input and output format classes for the HDFS storage layer.
  • Extensively involved in loading, filtering, transforming, and combining data in Pig using custom UDFs, generic UDFs, and Pig loader and storage classes from Piggybank.
  • Resolved performance issues in Pig and Hive by understanding MapReduce physical plan execution and using debugging commands to run code in an optimized way.
  • Performed unit testing by comparing Informatica mappings against Pig scripts that produce the same transformed data, per business user requirements.
  • Developed scripts using Maven in Jenkins to promote source code from one environment to another, and was responsible for managing code in SVN version control and its access control strategies.
  • Performed regular commissioning and decommissioning of nodes to balance the Hadoop cluster, and handled NameNode archiving.
  • Created interactive sheets and reports per business requirements, and wrote SQL scripts to load data into the QlikView application.

Environment: Hortonworks, HDFS, Core Java, MapReduce, Hive, Informatica PowerCenter, HQL, Pig, Flume, Perl scripting, Python, UNIX, Oozie, Shell Scripting, Maven, Teradata, QlikView.

Confidential, Norman, OK

Hadoop Developer


  • Set up, configured, and maintained the cluster, and installed components of the Hadoop ecosystem.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Loaded data from HDFS into the respective Hive tables so business analysts could conduct further analysis to identify data trends.
  • Developed ad hoc Hive queries and filtered data to improve process execution, using joins, GROUP BY, and HAVING clauses.
  • Improved HiveQL execution time by partitioning data, and reduced job run times by applying compression techniques such as Snappy to MapReduce jobs.
  • Created Hive partitions to store data for different trends separately.
  • Connected the Hive tables to data analysis tools such as Tableau for graphical representation of the trends.
  • Assisted the project manager in troubleshooting Hadoop data integration between platforms, such as Sqoop-Sqoop, Hive-Sqoop, and Sqoop-Hive.

Environment: Hortonworks, Java 7, HBase, HDFS, MapReduce, Hadoop 2.0, Hive, Pig, Eclipse, Linux, Sqoop, MySQL, Agile, Kafka, Cognos

Confidential, Durham, NC

Hadoop Developer


  • Implemented Hadoop, scaling from 6 nodes in the POC environment to 10 nodes in development and ultimately a 40-node cluster in the pilot (production) environment.
  • Involved in the complete implementation lifecycle and spent significant time writing customized MapReduce, Pig, and Hive programs.
  • Solid experience with Big Data processing using Hadoop technologies: HDFS, MapReduce, Crunch, Hive, and Pig.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs. Extensively used HiveQL queries to search for particular strings in Hive tables stored in HDFS.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Expert in developing custom user-defined functions (UDFs) in Java to extend Hive and Pig Latin functionality.
  • Developed a Pig program to load and filter streaming data brought into HDFS using Flume.
  • Used Pig to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
  • Experience tuning the performance of Pig queries.
  • Involved in developing Pig scripts for change data capture (CDC) and delta record processing between newly arrived data and existing data in HDFS.
  • Scheduled and managed Oozie jobs to remove duplicate log data files in HDFS.
  • Used Flume extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Created mappings using Talend Open Studio for evaluation and a POC.
  • Designed a risk audit process for a healthcare client, created a risk assessment database in Hive for performing risk assessments and audits, and used Tableau for visualization.

Environment: Hadoop 2.x, HDFS, MapReduce, Flume, Hive 0.10, Pig 0.11, Sqoop, HBase, YARN, Shell Scripting, Maven, GitHub, Ganglia, Apache Solr, AWS, Talend Open Studio for Big Data, Java, and Cloudera.

Confidential, Philadelphia, PA

Java/Hadoop Developer


  • Designed and created GUI screens using JSP, Servlets, and HTML based on the Struts MVC framework.
  • Used JDBC to access the database.
  • Used JavaScript for client-side validation.
  • Performed validations using the Struts Validation Framework.
  • Provided commit and rollback methods for transaction processing.
  • Designed and developed action form beans and action classes, and implemented MVC using the Struts framework.
  • Wrote Oracle SQL stored procedures, functions, and triggers.
  • Developed both session and entity beans representing different types of business logic abstractions.
  • Maintained the server log document.
  • Performed unit/integration testing for the test cases.
  • Designed and implemented the user interface for a web-based customer application.
  • Understood business needs, analyzed functional specifications, and mapped them to the design and development of MapReduce programs and algorithms.
  • Wrote Pig and Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data. Also have hands-on experience with Pig and Hive user-defined functions (UDFs).
  • Executed Hadoop ecosystem jobs and applications through Apache Hue.
  • Optimized Hadoop MapReduce code and Hive/Pig scripts for better scalability, reliability, and performance.

Environment: Java, JSP, HTML, CSS, JavaScript, jQuery, Struts 2.0, MySQL, Oracle, Hibernate, JDBC, Eclipse, SQL Stored Procedures, Tomcat, Hive, Pig, Sqoop, Flume, and Cloudera.

Confidential, Bothell, WA

Associate Java Developer


  • Developed web pages using the Struts framework, JSP, XML, JavaScript, HTML/DHTML, and CSS; configured the Struts application and used tag libraries.
  • Embedded a custom-built Java application in Sales Cloud using JSON Web Token (JWT) as the security mechanism.
  • Developed the application using Spring, Hibernate, Spring Batch, and web services (SOAP and RESTful).
  • Used the Spring Framework in the business tier and Spring's BeanFactory for initializing services.
  • Used AJAX and JavaScript to create an interactive user interface.
  • Implemented client-side validations using JavaScript, as well as server-side validations.
  • Developed a single-page application using AngularJS and Backbone.js.
  • Developed the application using the Front Controller, Business Delegate, DAO, and Session Facade patterns.
  • Implemented Hibernate to persist data into the database and wrote HQL-based queries to implement CRUD operations on the data.
  • Used Hibernate annotations and created Hibernate POJOs.
  • Developed web services to communicate with other modules using XML-based SOAP and WSDL.
  • Designed and implemented a next-generation SOA/SOAP-based system on a distributed platform.
  • Designed and developed most of the application's GUI screens using the GWT framework.
  • Used JAXP for XML parsing and JAXB for marshalling and unmarshalling.
  • Used SoapUI to test the web services using WSDL.
  • Analyzed the database schema for the new design when migrating from Oracle to DB2.
  • Used the Dojo Toolkit for UI development and for sending asynchronous AJAX requests to the server.

Environment: Java/J2EE, JSP, Servlets, EJB, XML, XSLT, Apache Struts Framework, Rational Rose, Web Services, DB2, Beyond Compare, CVS, JUnit, Log4j, Windows XP, Red Hat Linux.


Associate Java Developer


  • Responsible for requirements analysis and design of Smart Systems Pro (SSP).
  • Involved in Object-Oriented Design (OOD) and Object-Oriented Analysis (OOA).
  • Analyzed and designed object models using Java/J2EE design patterns in various tiers of the application.
  • Worked with RESTful web services and WSDL.
  • Worked with the Maven build tool to build the project.
  • Involved in coding JavaScript for UI validation and worked on the Struts Validation Framework.
  • Analyzed client requirements and designed the specification document based on them.
  • Implemented directives and scope values using AngularJS for an existing web page.
  • Familiar with state-of-the-art standards and design processes for creating optimal UIs using Web 2.0 technologies such as Ajax, JavaScript, CSS, and XSLT.
  • Involved in preparing the Program Specification and Unit Test Case documents.
  • Designed the prototype according to the business requirements.
  • Developed the web tier using JSP and Struts MVC to show account details and summaries.
  • Used the Struts Tiles Framework in the presentation tier.
  • Designed and developed the UI using Struts view components, JSP, HTML, CSS, and JavaScript.
  • Used AJAX for asynchronous communication with the server.
  • Used Hibernate for object/relational mapping to provide transparent persistence to the SQL Server database.

Environment: Java, J2EE Servlet, JSP, JUnit, AJAX, XML, JSON, CSS, JavaScript, Spring, Struts, Hibernate, Eclipse, Apache Tomcat, and Oracle.
