
Hadoop Developer/Spark Resume

Beaverton, OR


  • 7+ years of overall IT experience in application development with Java and Big Data Hadoop.
  • Expertise in Hadoop, HDFS, MapReduce and the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, Oozie and ZooKeeper, with knowledge of the MapReduce/HDFS framework.
  • Good working experience with Apache Hadoop MapReduce programming, Pig scripting and HDFS.
  • Knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Good understanding of Hadoop MR1 and MR2 (YARN) architecture.
  • Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ApplicationMaster, ResourceManager and NodeManager, as well as the MapReduce programming paradigm.
  • Wrote Pig scripts to reduce job execution time.
  • Experienced in loading large data sets from the local file system and HDFS into Hive and writing complex queries to load data into internal tables.
  • Good hands-on experience in Apache Spark with Scala.
  • Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing.
  • Developed Spark SQL programs to process different data sets with better performance.
  • Hands-on experience with Cloudera and Hortonworks Hadoop environments.
  • Good understanding of Hadoop administration with Hortonworks.
  • Good knowledge of the real-time data feed platform Kafka, integration software such as Talend, and NoSQL databases such as MongoDB, HBase and Cassandra.
  • Experience working with Apache Tez for interactive workloads.
  • Configured Tez as the execution engine for Hive and Pig to achieve better response times than with MapReduce jobs.
  • Experienced in loading data into partitioned and bucketed Hive tables.
  • Experience installing, configuring, supporting and managing Hadoop clusters using the Apache and Cloudera distributions.
  • Worked with ETL/ELT tools such as Talend.
  • Good knowledge of Talend for data integration with Hadoop.
  • Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
  • Expertise in the Scala programming language and Spark Core.
  • Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem and relational databases.
  • Experience using Maven 2.0 to compile, package and deploy applications to application servers.
  • Skilled in data management, data extraction, manipulation and validation, and analysis of large volumes of data.
  • Extensive expertise in creating and automating workflows with the Oozie workflow engine.
  • Scheduled jobs with the Oozie Coordinator to run on specific days (excluding weekends).
  • Very good understanding of SQL, ETL and data warehousing technologies.
  • Extensive experience working with Oracle, SQL Server and MySQL databases. Hands-on experience in application development using Java and RDBMS.
  • Experience in UNIX Shell scripting.
  • Expert in T-SQL: creating and using stored procedures, views and user-defined functions, and implementing business intelligence solutions using SQL Server.
  • Hands-on experience developing applications with Java, J2EE, JSP, EJB, SOAP, JDBC2, XML, HTML, XSD, XSLT, PL/SQL and Oracle 10g.
  • Strong knowledge of version control systems such as SVN and Git.


Hadoop: HDFS, MapReduce, YARN, Spark Core, Spark Streaming, Spark SQL, Hive, Tez, Pig, Sqoop, Flume, Kafka, Oozie and ZooKeeper.

Languages: Java, Scala, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts.

Database: Oracle 10g, MySQL.

NoSQL Databases: HBase, Cassandra, MongoDB.

Web Technologies: HTML, XML, CSS, XSLT, XHTML.

Web Servers: Apache Tomcat, JBoss.

J2EE Technologies: JDBC.

Cloud: Amazon Web Services (S3, EC2).

Frameworks: Spring, MVC, Struts.

Tools & IDEs: Eclipse, NetBeans, Maven, Toad, DbVisualizer.

Operating Systems: Windows, Linux (CentOS, Ubuntu).


Hadoop/Spark Developer

Confidential - Beaverton, OR


  • Installed and configured Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, Oozie, ZooKeeper, HBase, Flume and Sqoop.
  • Worked on a large-scale Hadoop YARN cluster for distributed data storage, processing and analysis.
  • Worked entirely in an Agile methodology and developed Spark scripts using the Scala shell.
  • Implemented multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Worked in a team managing a 30-node cluster and expanded it by adding nodes; the additional DataNodes were configured through the Hadoop commissioning process.
  • Imported data from MySQL and Oracle into HDFS using Sqoop.
  • Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Impala to query HDFS data.
  • Developed and implemented two end-to-end service endpoints in Java using the Play Framework, Akka and Hazelcast.
  • Used AWS services such as EC2 and S3 for small data sets.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Used Apache Kafka to ingest data from Kafka producers, which push the data to the brokers.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Wrote robust, reusable HiveQL scripts and Hive UDFs in Java.
  • Experience with Test-Driven Development (TDD) and acceptance testing using Behave.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Designed and built unit tests and executed operational queries on HBase.
  • Built Apache Avro schemas for publishing messages to topics and enabled the relevant serialization formats for message publishing and consumption.
  • Implemented a script to transfer data from Oracle to HBase using Sqoop.
  • Worked on migrating Python MapReduce programs to Spark transformations.
  • Experience working with the NoSQL database HBase for real-time data analytics using Apache Spark.
  • Implemented an authentication and authorization service using the Kerberos authentication protocol.
  • Installed the Oozie workflow engine to run multiple MapReduce, HiveQL and Pig jobs.
  • Implemented a script to transfer data from web servers to Hadoop using Flume.
  • Used ZooKeeper to manage coordination across the cluster.
  • Used Apache Kafka and Apache Storm to gather log data and feed it into HDFS.
  • Developed a Scala program for data extraction using Spark Streaming.
  • Set up and managed Kafka for stream processing.
  • Used Pig as an ETL tool to perform transformations, event joins and pre-aggregations before storing data in HDFS.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Set up Kafka producers, consumers and ZooKeeper for Kafka replication.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Experienced in implementing Spark RDD transformations and actions for business analysis.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Developed a NoSQL database in MongoDB using CRUD operations, indexing, replication and sharding.
  • Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries and Pig scripts.

Environment: Hadoop, MapReduce, YARN, Agile methodologies, HDFS, Hive, Cloudera, Core Java, Scala, SQL, Flume, Spark, Pig, Sqoop, Oozie, Impala, Python, AWS, HBase, Kafka, Avro, Oracle, UNIX.

Big Data Analyst

Confidential - Dallas, TX


  • Responsible for managing data from different sources, loading structured and unstructured data, and HDFS maintenance.
  • Wrote UNIX shell scripts in combination with Talend data maps to process source files and load them into the database.
  • Worked in an Agile methodology for development.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Implemented Hadoop YARN jobs to write data in Avro format.
  • Processed inputs from multiple data sources in the same reducer using GenericWritable and MultipleInputs.
  • Developed and executed Hive queries to denormalize the data.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Implemented optimized joins across different data sets in MapReduce to find the top claims by state.
  • Worked on big data processing of clinical and non-clinical data using MapReduce.
  • Validated the ingested data using Hadoop YARN, building a custom model to filter out invalid data and cleanse the data.
  • Familiar with NoSQL databases such as MongoDB and Cassandra.
  • Used Flume for importing log files from various sources into HDFS.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
  • Created a customized BI tool for the management team that performs query analytics using HiveQL.
  • Implemented partitioning and bucketing in Hive and designed both managed and external Hive tables for optimized performance.
  • Wrote a Hive UDF to sort struct fields and return a complex data type.
  • Documented all extract, transform and load (ETL) processes; designed, developed, validated and deployed the Talend ETL processes for the data warehouse teams using Pig and Hive on Hadoop.
  • Involved in installing and configuring Kerberos for authentication of users and Hadoop daemons.
  • Worked on Pig Latin scripts and UDFs for ingestion, querying, processing and analysis of data.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig and Sqoop.
  • Implemented JMS for asynchronous auditing purposes.
  • Developed data ingestion jobs in Talend to acquire, stage and aggregate data in technologies such as HAWQ, Hive, Spark and HDFS.
  • Modeled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats, such as JSON and XML.

Environment: Hadoop, Agile methodologies, Talend, HDFS, HBase, MongoDB, YARN, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, Avro, Oracle, MySQL.

Big Data Analyst/Java Developer

Confidential - Kalamazoo, MI


  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Implemented the project using Agile Scrum methodology; participated in daily stand-up meetings, sprint showcases and sprint retrospectives.
  • Deployed the Big Data Hadoop application using Talend on the AWS cloud.
  • Extensively involved in loading data from the UNIX file system into HDFS.
  • Involved in evaluating business requirements and preparing detailed specifications, following project guidelines, for the programs to be developed.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Provided quick responses to ad hoc internal and external client requests for data and created ad hoc reports.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Used Amazon Redshift to store and retrieve data from data warehouses.
  • Experience using Hive and Pig as ETL tools for event joins, filters, transformations and pre-aggregations.
  • Developed Pig scripts to transform raw data into meaningful data as specified by business users.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig.
  • Performed unit testing for the development team within the sandbox environment.
  • Created Hive tables and was involved in writing Hive UDFs and loading data.
  • Imported data into HDFS and Hive from other data systems by using Sqoop.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Generated aggregations, groupings and visualizations using Tableau.
  • Developed Hive queries to process the data.
  • Presented data and dataflow using Talend for reusability.
  • Developed and maintained several batch jobs that run automatically based on business requirements.

Environment: Apache Hadoop, Cloudera Manager, CDH2, CDH3, CentOS, Apache Hama, Talend, Eclipse Indigo, Java, MapReduce, Hive, Sqoop, Pig, Oozie, SQL, Struts, JUnit.

Java Developer



  • Involved in design and development phases of Software Development Life Cycle (SDLC).
  • Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
  • Followed Agile methodology and Scrum meetings to track, optimize and tailor features to customer needs.
  • Developed the user interface using JSP, JSP tag libraries and JavaScript to simplify the complexities of the application.
  • Implemented Model View Controller (MVC) architecture using the Jakarta Struts framework at the presentation tier.
  • Developed a Dojo-based front end, including forms and controls, and programmed event handling.
  • Implemented a service-oriented architecture (SOA) with web services using JAX-RS (REST) and JAX-WS (SOAP).
  • Developed various Enterprise JavaBeans components to fulfill the business functionality.
  • Created Action classes that route submissions to the appropriate EJB components and render the retrieved information.
  • Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
  • Used core Java and object-oriented concepts.
  • Used Spring Framework for Dependency injection and integrated it with the Struts Framework.
  • Used JDBC to connect to the back-end Oracle and SQL Server 2005 databases.
  • Proficient in writing SQL queries and stored procedures for multiple databases, including Oracle and SQL Server 2005.
  • Wrote stored procedures in PL/SQL. Performed query optimization to achieve faster indexing and make the system more scalable.
  • Deployed the application on Windows using IBM WebSphere Application Server.
  • Used Java Message Service (JMS) for reliable, asynchronous exchange of important information such as payment status reports.
  • Used web services (WSDL and REST) to get credit card information from a third party and used SAX and DOM XML parsers for data retrieval.
  • Implemented SOA with web services using JAX-WS.
  • Used Ant scripts to build the application and deployed it on WebSphere Application Server.

Environment: Core Java, Agile methodologies, J2EE, Oracle, SQL Server, JSP, Struts, Spring, JDK, JavaScript, HTML, CSS, AJAX, JUnit, Log4j, Web Services, Windows.

Jr. Java/J2EE Developer



  • Involved in specification analysis and identifying the requirements.
  • Participated in design discussions on the methodology for implementing requirements.
  • Involved in preparing the Code Review Document and Technical Design Document.
  • Designed the presentation layer by developing JSP pages for the modules.
  • Developed controllers and JavaBeans encapsulating the business logic.
  • Developed classes to interface with the underlying web services layer.
  • Used patterns including MVC, DAO, DTO, Front Controller, Service Locator and Business Delegate.
  • Worked on the service layer, which provided the business logic implementation.
  • Involved in building PL/SQL queries and stored procedures for database operations.
  • Used Jasper Reports to provide print preview of Financial Reports and Monthly Statements.
  • Carried out integration testing and acceptance testing.
  • Used JMeter to carry out performance tests on external web service calls, database connections and other dynamic resources.
  • Participated in the team meetings and discussed enhancements, issues and proposed feasible solutions.

Environment: Java 1.4, J2EE 1.4, Servlets, JSP, JDBC, XML, Ant, Apache Tomcat 5.0, Oracle 8i, JUnit, PL/SQL, UML, NetBeans.
