
Hadoop Engineer Resume


Boston, MA


  • Over 8 years of overall IT experience, including 3+ years in Big Data, with proven expertise in full application software life cycle development (analysis, design, development, testing and implementation) and an emphasis on object-oriented, J2EE and client-server technologies.
  • Experienced in developing web applications in various domains like Insurance, Retail and Banking.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, NameNode, DataNode, YARN, MapReduce, Sentry, Spark, Falcon, HBase, Hive, Pig and Ranger.
  • Developed scripts to automate end-to-end data management and synchronization across all clusters.
  • Strong hands-on experience in the Hadoop framework and its ecosystem, including HDFS architecture, MapReduce programming, Hive, Pig, Sqoop, HBase and Oozie.
  • Worked on disaster management with Hadoop cluster.
  • Involved in building a multi-tenant cluster.
  • Experience in Mainframe data and batch migration to Hadoop.
  • Hands-on experience in installing and configuring Cloudera and Hortonworks distributions.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Extensively used Apache Flume to collect logs and error messages across the cluster.
  • Experienced in using Pig scripts for transformations, event joins, filters and pre-aggregations before storing the data in HDFS.
  • Developed analytical components using Scala, Spark and Spark Streaming.
  • Good working experience with Avro, nested Avro, Sequence files, Parquet and ORC file formats.
  • Architected, designed and maintained high-performing ELT/ETL processes.
  • Accomplished in creating test plans, defining test cases, reviewing and maintaining test scripts, working with team members to fix errors, and executing System Integration Testing (SIT), User Acceptance Testing (UAT), Stage (PFIX), unit, regression and customer tests.
  • Extensive experience with core Java, MapReduce programming, and advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
  • Specialized in J2SE, J2EE, Servlets, JSP, JSTL, Custom Tags.
  • Extensive experience in Application Software Design, Object Oriented Design, Development, Documentation, Debugging, Testing and Implementation.
  • Expert-level skills in designing and implementing web server solutions and deploying Java application servers such as Tomcat, JBoss, WebSphere and WebLogic on the Windows platform.
  • Excellent work experience with XML/database mapping and with writing SQL queries, stored procedures and triggers for major relational databases (Oracle 11g, 12c and SQL Server).
  • Extensively worked on Java/J2EE systems with different databases such as Oracle, MySQL and DB2.
  • Experienced in writing PL/SQL, SQL and stored procedures.
  • Experience using Hibernate for mapping Java classes with database and using Hibernate Query Language (HQL).
  • Strong experience in web services, including SOAP, WSDL, XSD, Axis2, JAX-WS and RESTful web services.
  • Experience in using HTML, DHTML, XHTML, JavaScript, AJAX, XML and CSS.
  • Experience in developing client-side web applications using HTML, JSP, jQuery, JSTL, AJAX and custom tags, implementing client-side validations with JavaScript and server-side validations with the Struts Validation Framework.
  • Team player with a demonstrated ability to work in a fast-paced, challenging environment, with excellent debugging and problem-solving skills.
  • Strong interpersonal, analytical, communication and writing skills; highly organized in meeting deadlines, with the ability to grasp and adapt to emerging technologies.


Hadoop Technologies: HBase, Hive, Sqoop, Flume, HDFS, Oozie, ZooKeeper, Spark, Falcon, Pig, Kafka, Sentry

J2EE Technologies: Servlets, JSP, EJB, JDBC, Web Services (WSDL, SOAP), Spring and

Web Services/Application Servers: Apache Tomcat Server, IBM WebSphere Server, JBoss

Web Tools and Languages: HTML, XML, CSS, DHTML, JavaScript

Databases: IBM DB2, Oracle 8i/9i/10g, MS SQL Server 2005/2008, MySQL

Languages: Java / J2EE, HTML, SQL

OS: Windows 2003/2008/XP/Vista, Unix, Linux (Various Versions)

Tools: MS Office 2003/2007/2010, Eclipse 3.3/3.4, NetBeans

Version Control: IBM RTC

Others: ASP.NET, VB.NET and C#

IDEs: Eclipse, NetBeans, JDeveloper, MyEclipse


Confidential, Boston, MA

Hadoop Engineer


  • Involved in all phases of the SDLC including analysis, design, development, testing, and deployment of Hadoop cluster.
  • Performed both major and minor upgrades to the existing Cloudera Hadoop cluster.
  • Implemented high availability for the NameNode and ResourceManager on the Hadoop cluster.
  • Administered Cassandra cluster using Datastax OpsCenter and monitored CPU usage, memory usage and health of nodes in the cluster.
  • Integrated Hadoop with Active Directory and enabled Kerberos for Authentication.
  • Created Hive external tables and designed data models in Hive.
  • Involved in the process of designing Cassandra Architecture including data modeling.
  • Implemented YARN Resource pools to share resources of cluster for YARN jobs submitted by users.
  • Performed storage capacity management, performance tuning and benchmarking of clusters.
  • Tuned the Hive service for better performance on ad hoc queries.
  • Tuned Spark Streaming performance, e.g. by setting the right batch interval, the correct level of parallelism, an appropriate serialization format, and memory settings.
  • Ingested data using Flume with a Kafka source and an HDFS sink.
  • For one use case, used Spark Streaming with Kafka and HDFS/HBase to build a continuous ETL pipeline for real-time analytics on the data.
  • Imported and exported large data sets between traditional databases and HDFS using Sqoop.
  • Proactively coordinated major issues with vendor (Cloudera) for faster resolution.
  • Proactively monitored systems and services and managed backup and disaster recovery systems and procedures.
  • Worked on disaster management with Hadoop cluster.
  • Worked with application team via scrum to provide operational support, install Hadoop updates, patches and version upgrades as required.
  • Designed and presented a POC on introducing Impala in project architecture.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Responsible for running Hive queries through Spark SQL, integrated with the Spark environment.
  • Used Falcon for data replication.
  • Built a multi-tenant cluster for different lines of business.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Implemented Hive Generic UDFs to implement business logic.
  • Managed cluster coordination services through Apache ZooKeeper.
  • Monitored cluster stability and used tools to gather statistics and improve performance.
  • Kept current with the latest technologies to help automate tasks and implement tools and processes to manage the environment.

Environment: HDFS, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, Cloudera Manager, Java, Ambari, Oracle, MySQL, Cassandra, Sentry, Falcon, Spark.

Confidential, Boston, MA

Hadoop Consultant


  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive, Spark and MapReduce.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
  • Worked on implementing Spark using Scala and Spark SQL for faster data analysis and processing.
  • Used Spark Streaming with Kafka, HDFS and HBase to build a continuous ETL pipeline.
  • Used Scala to write code for the Spark Streaming use case.
  • Handled importing and exporting data into HDFS and Hive using Sqoop and Kafka.
  • Involved in creating Hive tables, loading the data and writing Hive queries.
  • Worked on designing and developing ETL workflows using Java for processing data in HDFS/HBase using Oozie.
  • Worked on importing the unstructured data into the HDFS using Flume.
  • Wrote complex Hive queries and UDFs.
  • Involved in developing shell scripts to ease execution of all other scripts (Pig, Hive and MapReduce) and to move data files into and out of HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, data integration and migration.
  • Worked with NoSQL databases like HBase and Cassandra, creating tables to load large sets of semi-structured data.
  • Handled 20 TB of data on a 10-node cluster in the production environment.

Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, Spark, Linux, Hadoop MapReduce, HBase, Shell Scripting, MongoDB, and Cassandra.

Confidential, Newark, NJ

Java/ Hadoop Consultant


  • Evaluated the suitability of Hadoop and its ecosystem by implementing and validating various proof of concept (POC) applications, with a view to adopting them as part of the Big Data Hadoop initiative.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Installed and configured Pig for ETL jobs. Wrote Pig scripts with regular expressions for data cleaning.
  • Created Hive external tables to store the Pig script output and used them for data analysis to meet the business requirements.
  • Developed MapReduce pipeline jobs to process the data and create the necessary files.
  • Involved in loading the created files into HBase for faster access to all the products in all the stores without a performance hit.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Used Sqoop to import data from MySQL and Oracle into HDFS on a regular basis.
  • Created HBase tables to store various data formats of PII data coming from different portfolios.
  • Developed Hive queries and UDFs to analyze and transform the data in HDFS.
  • Moved data from Oracle to HDFS and vice versa using Sqoop.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Created Hive tables as internal or external tables per requirements, defined with appropriate static and dynamic partitions for efficiency.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Worked with different file formats and compression techniques to determine standards.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Involved in loading data from the Linux file system to HDFS.

Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, Linux, Hadoop MapReduce, HBase, Shell Scripting, Eclipse, Oozie, Navigator.

Confidential, Jersey City, NJ

Java/ J2EE Developer


  • Work involved design, implementation and coding in Perl, XML, Java, Java Servlets, J2EE, EJB and JSP.
  • Architected the workflow of the whole project using various design patterns such as MVC. J2EE patterns were implemented in each tier.
  • The system was designed according to J2EE specifications. Servlets were used as a Front Controller gateway into the system. Helper classes were used to limit the business logic in the servlet. EJBs were used to talk to the database, and JSP along with HTML and XML were used to control the client view.
  • Used the Rational Rose development tool to design various use case, collaboration and sequence diagrams in Unified Modeling Language (UML).
  • Recommended the System Configurations in terms of Hardware and Software for the Team Site Server.
  • Designed and developed a replication strategy for Confidential. Confidential business procedures do not allow implementation of Microsoft Clustering for replication.
  • Each instance of Team Site was in a different domain, leading to domain-related issues such as SID. Designed and developed the code to be domain independent.
  • Gathered requirements and then developed complex workflows involving templates and Open Deploy.
  • Part of the team involved in the design and coding of the Data capture templates, presentation & component templates.
  • Developed Perl modules for workflows, inline commands and callouts for DCT.
  • Developed and configured templates to capture and generate multilingual content. With this approach, US Chinese branch content is encoded in BIG5.
  • Organized meetings, gave presentations on various design components, gathered requirements and took part in knowledge transfer training.
  • Designed the integration of Team Site with WebSphere.

Environment: Java/J2EE, JSP, EJB, WebSphere, HTML, AJAX, JavaScript, JDBC, XML, JMS, XSLT, UML, JUnit, log4j, MyEclipse, Object Oriented Perl, Team Site, WebLogic, SQL & Oracle


Java Developer


  • Analyzed requirements and prepared the Requirement Analysis Document.
  • Deployed the application to the JBoss Application Server.
  • Gathered requirements from the various parties involved in the project.
  • Estimated timelines for development tasks.
  • Used J2EE and EJB to handle the business flow and functionality.
  • Interacted with the client to get confirmation on the functionalities.
  • Involved in the complete SDLC of the development with full system dependency.
  • Actively coordinated with the deployment manager for the application production launch.
  • Provided support and updates for the period under warranty.
  • Monitored test cases to verify actual results against expected results.
  • Performed functional, user interface and regression tests.
  • Carried out regression testing to track problems.

Environment: Java, J2EE, EJB, UNIX, XML, Work Flow, JMS, JIRA, Oracle, JBoss.
