Sr. Big Data/Hadoop Engineer Resume

Sr. Big Data/Hadoop Engineer, Woodbridge, NJ


  • Over 6 years of strong experience in software development using Big Data, Hadoop, Apache Spark, Java/J2EE, Scala, and Python technologies.
  • Solid foundation in mathematics, probability, and statistics, with broad practical knowledge of statistical and data mining techniques cultivated through industry work and academic programs.
  • Involved in all Software Development Life Cycle (SDLC) phases, including analysis, design, implementation, testing, and maintenance.
  • Strong technical, administration, and mentoring knowledge in Linux and Big Data/Hadoop technologies.
  • Hands-on experience with major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, and Avro.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems/mainframes.
  • Installing, configuring, and managing Hadoop clusters and data science tools.
  • Managing the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
  • Setting up high availability for Hadoop cluster components and edge nodes.
  • Experience in developing Shell scripts and Python Scripts for system management.
  • Experience in profiling large data sets using Informatica BDM 10.
  • Well versed in software development methodologies such as Rapid Application Development (RAD), Agile, and Scrum.
  • Experience with Object Oriented Analysis and Design (OOAD) methodologies.
  • Experience in installations of software, writing test cases, debugging, and testing of batch and online systems.
  • Experience in production, quality assurance (QA), system integration testing (SIT), and user acceptance testing (UAT).
  • Expertise in J2EE technologies such as JSP, Servlets, EJB 2.0, JDBC, JNDI, and AJAX.
  • Extensively worked on implementing SOA (Service Oriented Architecture) using XML web services (SOAP, WSDL, UDDI, and XML parsers).
  • Worked with XML parsers like JAXP (SAX and DOM) and JAXB.
  • Expertise in applying Java Message Service (JMS) for reliable information exchange across Java applications.
  • Proficient with Core Java and AWT, and with markup and web technologies such as HTML 5.0, XHTML, DHTML, CSS, XML 1.1, XSL, XSLT, XPath, XQuery, Angular.js, and Node.js.
  • Worked with version control systems such as Subversion, Perforce, and Git to provide a common platform for all developers.
  • Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
  • Creatively communicate and present models to business customers and executives, utilizing a variety of formats and visualization methodologies.


Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB.

Big data distribution: Cloudera, Amazon EMR

Programming languages: Core Java, Scala, Python, SQL, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu)

Databases: Oracle, SQL Server

Designing Tools: Eclipse

Java Technologies: JSP, Servlets, JUnit, Spring, Hibernate

Web Technologies: XML, HTML, JavaScript, JVM, jQuery, JSON

Linux Experience: System Administration Tools, Puppet, Apache

Web Services: RESTful and SOAP

Frameworks: Jakarta Struts 1.x, Spring 2.x

Development methodologies: Agile, Waterfall

Logging Tools: Log4j

Application / Web Servers: CherryPy, Apache Tomcat, WebSphere

Messaging Services: ActiveMQ, Kafka, JMS

Version Tools: Git, SVN and CVS

Analytics: Tableau, SPSS, SAS EM and SAS JMP


Confidential, Woodbridge NJ

Sr. Big Data/Hadoop Engineer


  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Good understanding of and hands-on experience with the Hadoop stack: internals, Hive, Pig, and MapReduce.
  • The system was initially developed in Java. Restructured the Java filtering program to package the business rule engine in a JAR that can be called from both Java and Hadoop.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Involved in defining job flows
  • Involved in managing and reviewing Hadoop log files
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system to HDFS.
  • Installed and configured Hive and developed Hive UDFs to extend the core functionality of Hive.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Monitored system health and logs and responded to any warning or failure conditions.

Confidential, Fort Washington, PA

Sr. Big Data/Hadoop Engineer


  • Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
  • Worked on compression mechanisms to optimize MapReduce Jobs.
  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Created scripts to automate the process of Data Ingestion.
  • Performed joins, group-by, and other operations in MapReduce using Java and Pig.
  • Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Set up Pig, Hive, and HBase on multiple nodes and developed applications using Pig, Hive, HBase, and MapReduce.
  • Worked on the conversion of existing MapReduce batch applications for better performance.
  • Created HBase tables to store variable data formats coming from different portfolios.
  • Performed real-time analytics on HBase using the Java API and REST API.
  • Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
  • Analyzed customer behavior through clickstream analysis, using Flume to ingest the data.
  • Worked with Avro data files using the Avro serialization system.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hive, HDFS, MapReduce, Flume, Pig, Spark Core, Spark SQL, Oozie, Oracle, YARN, Netezza, GitHub, JUnit, Linux, HBase, Cloudera, Sqoop, Java, Scala, Maven, Splunk, Eclipse.

Environment: Apache Hadoop, HDFS, MapReduce, Pig, Hive tables, Hive UDFs, Linux, MySQL, HBase, UNIX, Java, ETL, Eclipse.


Hadoop Developer


  • Developed Java MapReduce jobs for aggregation and interest matrix calculation for users.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in managing and reviewing application log files.
  • Ingested application logs into HDFS and processed them using MapReduce jobs.
  • Created and maintained the Hive warehouse for Hive analysis.
  • Generated test cases for the new MapReduce jobs.
  • Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS).
  • Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
  • Developed dynamically partitioned Hive tables to store data by date and workflow ID partition.
  • Used Apache Sqoop to dump incremental user data into HDFS on a daily basis.
  • Ran clustering and user recommendation agents on the weblogs and profiles of the users to generate the interest matrix.
  • Worked on installing and configuring EC2 instances on Amazon Web Services (AWS) for establishing clusters in the cloud.
  • Installed and configured Hive and wrote Hive UDFs in Java and Python.
  • Prepared the data for consumption by formatting it for upload to the UDB system.
  • Led and programmed the recommendation logic for various clustering and classification algorithms using Java.
  • Involved in migrating Hadoop jobs to higher environments such as SIT, UAT, and Prod.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Scala, Cassandra, Pig, Sqoop, Oozie, ZooKeeper, Teradata, PL/SQL, MySQL, Windows, Hortonworks, HBase


Java Developer


  • Communicated with clients for requirements gathering and explained the requirements to team members.
  • Analyzed the requirements and designed screen prototypes.
  • Involved in project documentation.
  • Involved in creating the basic database architecture for the application.
  • Involved in adding the solution to VSS.
  • Designed and developed screens.
  • Coded JS functions for client validations.
  • Created user Controls for reusability.
  • Created tables, views, packages, sequences, and functions for all modules of the project.
  • Developed Crystal Reports.
  • Integrated the functionality of all modules.
  • Involved in deploying the application.
  • Performed unit testing and integration testing.
  • Designed the test plan and test cases and checked validation.
  • Tested whether the application met the business requirements.
  • Implemented the system at the client location.
  • Trained application users, interacted with the client, and gathered any change requests from the client.
  • Responsible for immediate error resolution.

Environment: Core Java, JavaScript, J2EE, Servlets, JSP, Design Patterns, JDBC, HTML, CSS, AJAX, Hibernate, WebLogic, Oracle 8i, Ant, Linux, SVN, Windows XP
