
Sr. Hadoop Developer Resume

Plano, TX


  • 9+ years of overall IT experience, working as a Hadoop Developer with Apache Hadoop components such as HDFS, MapReduce, HiveQL, HBase, Pig, Hive, Sqoop, Oozie, Spark, and Scala, and as a Java Developer applying Java and object-oriented methodologies to a wide range of development, from enterprise applications to web-based applications.
  • Experienced in getting streaming data into HDFS using Flume, memory channels, and custom interceptors.
  • Extensively worked on writing, fine-tuning, and profiling MapReduce jobs for optimized performance.
  • Extensive experience in implementing data analytics algorithms using MapReduce design patterns.
  • Experience in implementing complex MapReduce algorithms to perform map-side joins using the distributed cache.
  • Experience in writing test cases and test classes using MRUnit, JUnit, and Mockito.
  • Extended Hive and Pig core functionality by writing Custom UDFs.
  • Experienced in handling ETL transformations using Pig Latin scripts, expressions, join operations, and custom UDFs for evaluating, filtering, and storing data.
  • Expert in analyzing real-time queries using different NoSQL databases, including Cassandra and HBase.
  • Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Experience in converting business processes into RDD transformations using Apache Spark and Scala.
  • Experience in writing producers/consumers and creating messaging-centric applications using Apache Kafka.
  • Knowledge of integrating Apache Storm with Apache Kafka for stream processing.
  • Experience in integrating Spark with Solr and indexing with Apache Solr.
  • Experience in supporting data analysis projects using Elastic MapReduce on Amazon Web Services (AWS).
  • Knowledge of the Splunk UI for performing log analysis in production support.
  • Experience in upgrading Hadoop clusters to major versions.
  • Experience in using ZooKeeper for coordinating distributed applications.
  • Experience in deploying and managing Hadoop clusters using Cloudera Manager.
  • Experience in developing MapReduce programs and custom UDFs for data processing using Python.
  • Experience in developing Scala scripts to run on a Spark cluster.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • In-depth understanding of data structures and algorithms.
  • Experience in managing and troubleshooting Hadoop-related issues.
  • Expertise in setting up standards and processes for Hadoop-based application design and implementation.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
  • Expertise in various Java/J2EE technologies such as JSP 2.0, Servlets 2.x, Struts 1.2/2.0, Hibernate 2.0/3.0 ORM, Spring 2.0/3.0, and JDBC.
  • Proficient in Core Java, J2EE, JDBC, Servlets, JSP, Exception Handling, Multithreading, EJB, XML, HTML5, CSS3, JavaScript, and AngularJS.
  • Good experience in using data modeling techniques to find results based on SQL and PL/SQL queries.
  • Good working knowledge of the Spring Framework.
  • Strong experience in writing SQL queries.
  • Experience working with different databases, such as Oracle, SQL Server, and MySQL, and writing stored procedures, functions, joins, and triggers for different data models.
  • Expertise in implementing Service-Oriented Architectures (SOA) with XML-based web services (SOAP/REST).


Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions, HBase, Spark

Programming Languages: Java (5, 6, 7), Python, Scala

Databases/RDBMS: MySQL, SQL, PL/SQL, MS SQL Server 2005, Oracle 9i/10g/11g

Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell


Operating Systems: Linux, Windows XP/7/8

Software Life Cycles: SDLC, Waterfall and Agile models

Office Tools: MS Office, MS Project and Risk Analysis tools, Visio

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SoapUI, Ant, Maven, Automation, and MRUnit

Cloud Platforms: Amazon EC2


Confidential, Plano, TX

Sr. Hadoop Developer


  • Prepared Pig scripts and Spark SQL/Spark Streaming jobs to handle all the transformations specified in the S2TMs and to handle SCD1 and SCD2 scenarios.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark SQL, Python, and Scala.
  • Wrote Python applications on Apache Spark to convert and parse txt and xls files.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Loaded data into HBase using bulk load and the HBase API.
  • Used Scala to write several Spark jobs for real-time applications.
  • Developed Spark code using Python for faster processing of data on Hive.
  • Developed MapReduce jobs in Python for data cleaning and data processing.
  • Analyzed the SQL scripts and designed the solution for implementation in Scala.
  • Worked extensively with Spark and MLlib to develop a regression model for logistic information.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
  • Used Spark and Spark SQL to read the Parquet data and create tables in Hive using the Scala API.
  • Automated hourly and daily transaction reports using Talend Open Studio.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Used the Spark application master to monitor Spark jobs and capture their logs.
  • Implemented Spark jobs in Scala using DataFrames and the Spark SQL API for faster data processing.
  • Involved in making code changes to a module in the workstation simulation for processing across the cluster using spark-submit.

Environment: Apache Hadoop, HDFS, Hive, Java, Sqoop, Spark, Scala, Cloudera CDH4, Oracle, Kibana, SFTP.

Confidential, Irving, TX

Spark Scala Developer


  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Worked with Spring for RESTful services and dependency injection.
  • Developed and retrieved NoSQL data in MongoDB using DAOs.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Performed data analytics and loaded data to Amazon S3, the data lake, and the Spark cluster.
  • Wrote and built Azkaban workflow jobs to automate the process.
  • Developed Spark SQL tables and queries to perform ad hoc data analytics for the analyst team.
  • Deployed components using the Maven build system and Docker images.
  • Involved in deploying multi-module Azkaban applications using Maven.
  • Played an important role in migrating jobs from Spark 0.9 to 1.4 and then to 1.6.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution for implementation in Scala.
  • Developed an analytical component using Scala, Spark, and Spark Streaming.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Involved in migration from Livelink to SharePoint using Scala through RESTful web services.
  • Extensively involved in developing RESTful APIs using the JSON library of the Play framework.
  • Used the Scala collections framework to store and process complex consumer information.
  • Used Scala functional programming concepts to develop business logic.
  • Designed and implemented Apache Spark applications (Cloudera).

Environment: Hadoop, Spark, Scala, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, ZooKeeper, Teradata, PL/SQL, MySQL, Windows, HBase.

Confidential, Costa Mesa, CA

Hadoop Developer


  • Involved in the installation and configuration of JDK, Hadoop, Pig, Sqoop, Hive, and HBase in a Linux environment. Assisted with performance tuning and monitoring.
  • Created MapReduce programs to parse data for claim report generation and ran the JARs in Hadoop. Coordinated with the Java team in creating MapReduce programs.
  • Created Pig scripts for most modules to provide comparative effort estimates for code development.
  • Created reports for the BI team, using Sqoop to load data into HDFS and Hive.
  • Collaborated with BI teams to ensure data quality and availability with live visualization.
  • Created Hive queries to process large sets of structured, semi-structured, and unstructured data and store it in managed and external tables.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Developed the application by using the Struts framework.
  • Created connection through JDBC and used JDBC statements to call stored procedures.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
  • Moved RDBMS data and flat files generated from various channels into HDFS for further processing.
  • Developed job workflows in Oozie to automate the tasks of loading the data into HDFS.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
  • Wrote script files for processing data and loading it into HDFS.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java (jdk1.7), Flat files, Oracle 11g/10g, PL/SQL, SQL*PLUS, Windows NT, Sqoop.

Confidential, Herndon, VA

Hadoop Developer


  • Developed solutions to process data into HDFS.
  • Analyzed the data using MapReduce, Pig, and Hive and produced summary results from Hadoop for downstream systems.
  • Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Created Hive tables and was involved in loading data and writing Hive UDFs.
  • Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
  • Created HBase tables to load large sets of structured data.
  • Managed and reviewed Hadoop log files.
  • Involved in providing inputs for estimate preparation for the new proposal.
  • Worked extensively with Hive DDL and Hive Query Language (HQL).
  • Developed UDF, UDAF, and UDTF functions and used them in Hive queries.
  • Implemented Sqoop for large dataset transfers between Hadoop and RDBMS.
  • Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
  • Used Sqoop widely to import data from various systems/sources (such as MySQL) into HDFS.
  • Created components such as Hive UDFs for functionality missing in Hive for analytics.
  • Developed scripts and batch jobs to schedule a bundle (group of coordinators) which consists of various.
  • Used different file formats such as text files, SequenceFiles, and Avro.
  • Provided cluster coordination services through ZooKeeper.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Assisted in cluster maintenance, cluster monitoring, adding and removing cluster nodes, and troubleshooting.
  • Installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.

Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, HBase, Shell Scripting, Oozie, Oracle 11g.

Confidential, Norristown, PA

Java Developer


  • Excellent Java/J2EE application development skills with strong experience in object-oriented analysis; extensively involved throughout the Software Development Life Cycle (SDLC).
  • Implemented various J2EE standards and an MVC framework using Struts, JSP, AJAX, and servlets for UI design.
  • Used SOAP/REST for data exchange between the backend and the user interface.
  • Used Java and MySQL day to day to debug and fix issues with client processes.
  • Developed, tested, and implemented a financial-services application to bring multiple clients into a standard database format.
  • Assisted in designing, building, and maintaining a database to analyze the life cycle of checking and debit transactions.
  • Created web service components using SOAP, XML, and WSDL to receive XML messages and apply business logic.
  • Involved in configuring WebSphere variables, queues, DSs, and servers and deploying EAR files to servers.
  • Involved in developing the business Logic using Plain Old Java Objects (POJOs) and Session EJBs.
  • Developed LDAP authentication through JNDI.
  • Developed and debugged the application using Eclipse IDE.
  • Involved in Hibernate mappings, configuration properties set up, creating sessions, transactions and second level cache set up.
  • Involved in backing up the database, creating dump files, and creating DB schemas from dump files. Wrote and executed developer test cases and prepared the corresponding scope and traceability matrix.
  • Used JUnit and JAD for debugging and for developing test cases for all modules.
  • Hands-on experience with Sun ONE Application Server, WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.

Environment: Java multithreading, JDBC, Hibernate, Struts, Collections, Maven, Subversion, JUnit, SQL, JSP, SOAP, Servlets, Spring, Oracle, XML, PuTTY, and Eclipse.


Java Developer


  • Involved in the analysis and design phases of the Software Development Life Cycle (SDLC).
  • Used JMS to pass messages as payload to track statuses, milestones, and states in the workflows.
  • Involved in reading and generating PDF documents using iText, and in merging PDFs dynamically.
  • Involved in the coding, testing, and implementation phases of the software development life cycle.
  • Worked in the health-care domain.
  • Used Java Message Service (JMS) for loosely coupled, reliable, and asynchronous exchange of patient treatment information among J2EE components and legacy systems.
  • Developed MDBs using JMS to exchange messages between different applications using MQ Series.
  • Involved in working with J2EE Design patterns (Singleton, Factory, DAO, and Business Delegate) and Model View Controller Architecture with JSF and Spring DI.
  • Involved in Content Management using XML.
  • Developed a standalone module that transforms XML 837 data into database records using a SAX parser.
  • Installed, configured, and administered WebSphere ESB v6.x.
  • Worked on performance tuning of WebSphere ESB in different environments on different platforms.
  • Configured and implemented web services specifications in collaboration with the offshore team.
  • Created dashboard charts (business charts) using FusionCharts.
  • Involved in creating reports for most of the business criteria.
  • Involved in configuring WebLogic servers, DSs, and JMS queues, and in deployment.
  • Involved in creating queues, MDBs, and workers to support the messaging used to track workflows.
  • Created Hibernate mapping files, sessions, transactions, and Query and Criteria objects to fetch data from the DB.
  • Enhanced the design of an application by utilizing SOA.
  • Generated unit test cases with the help of internal tools.
  • Used JNDI for connection pooling.
  • Developed Ant scripts to build and deploy projects onto the application server.
  • Involved in implementing CruiseControl as the continuous build tool using Ant.
  • Used StarTeam for version control.

Environment: Java/J2EE, HTML, JS, AJAX, Servlets, JSP, XML, XSLT, XPath, XQuery, WSDL, SOAP, REST, JAX-RS, Jersey, JAX-WS, WebLogic Server 10.3.3, JMS, iText, Eclipse, JUnit, StarTeam, JNDI, Spring Framework (DI, AOP, Batch), Hibernate.


Jr. Java Developer


  • Involved in the requirement analysis, design, and development of the new NCP project.
  • Involved in the design and estimation of the various templates, components which were developed using Day CMS (Communique).
  • Developed the CMS and server-side interaction using web services, exposed to the CMS using JSON and jQuery.
  • Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which is used successfully in a number of production systems.
  • Worked on the JavaMail API. Involved in developing a utility class to consume messages from the message queue and send emails to customers.
  • Normalized Oracle database, conforming to design concepts and best practices.
  • Used JUnit framework for unit testing and Log4j to capture runtime exception logs.
  • Performed dependency injection using the Spring Framework and integrated it with the Hibernate and Struts frameworks.
  • Hands-on experience creating shell and Perl scripts for project maintenance and software migration. Developed custom tags to simplify JSP applications.
  • Applied design patterns and OO design concepts to improve the existing Java/JEE-based code base.
  • Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized blocks of code.
  • Used the Struts Validator framework for client-side and server-side validation.
  • Designed the UI using JSP, Velocity templates, JavaScript, CSS, jQuery, and JSON.
  • Enhanced the FAS system using Struts MVC and iBatis.
  • Involved in developing web services using Apache XFire and integrated them with action mappings.
  • Developed Velocity templates for the various user-interactive forms that trigger email to an alias. Such forms largely reduced the amount of manual work involved and were highly appreciated.
  • Used internationalization, localization, Tiles, and tag libraries to accommodate different locations.
  • Used JAXP for parsing and JAXB for binding.
  • Coordinated application testing with the testing team.
  • Involved in writing services to implement core logic for business processes.
  • Involved in writing database queries, stored procedures, functions, etc.
  • Deployed EJB components on WebLogic and used the JDBC API for interaction with the Oracle DB.
  • Involved in transformations using XSLT to prepare HTML pages from XML files.
  • Enhanced Ant scripts to build and deploy applications.
  • Involved in unit testing and code review for the various enhancements.
  • Followed coding guidelines while developing workflows.
  • Effectively managed the quality deliverables to meet deadlines.
  • Involved in end to end implementation of the application.

Environment: Java 1.4, J2EE (EJB, JSP/Servlets, JDBC, XML), Day CMS, XML, MyEclipse, Tomcat, Resin, Struts, iBatis, WebLogic App Server, DTD, XSD, XSLT, Ant, SVN
