Sr. Hadoop Developer Resume

San Diego, CA


  • 7+ years of IT experience in analysis, design, and development in Big Data, Scala, Spark, Hadoop, and HDFS environments, with additional experience in Java and J2EE.
  • Experienced in developing and implementing MapReduce programs using Hadoop to work with Big Data as per requirements.
  • Excellent experience with Scala, Apache Spark, Spark Streaming, pattern matching, MapReduce, frameworks like Lift and Play, and RDDs (Resilient Distributed Datasets).
  • Extensive ETL testing experience using Informatica 8.1/7.1/6.2 (PowerCenter/PowerMart) (Designer, Workflow Manager, Workflow Monitor, and Server Manager).
  • Developed Talend ETL test scripts based on technical specifications/Data design documents and Source to Target mappings.
  • Develops core Java code; strong in design, software processes, requirement gathering, analysis, and development of software applications in the roles of Programmer Analyst and Big Data Developer.
  • Good knowledge of advanced Java topics such as generics, collections, and multithreading.
  • Increased the accuracy, stability, and automation of the reverse DNS creation process.
  • Ensured that users have continuous access to e-mail, intranet, the public Internet, and other crucial business applications.
  • Knowledge of networking technologies like TCP/IP, DNS, and web servers.
  • Excellent experience with Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Hive, Sqoop, Maven, HBase, Pig, Kafka, ZooKeeper, Scala, Flume, Storm, and Oozie.
  • End-to-end architecture and implementation of client-server systems using Scala, Akka, Java, JavaScript, and related Linux.
  • Good knowledge of Hadoop architecture and its components, such as the HDFS framework, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2 (YARN).
  • Experienced in developing MapReduce jobs in Java for data cleansing, transformations, pre-processing, and analysis. Implemented multiple mappers to handle data from multiple sources.
  • Experienced with Spark and Scala, Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib.
  • Experienced in installing, configuring, and administering Hadoop clusters on major Hadoop distributions (Hortonworks, Cloudera).
  • Experienced with Hadoop daemon functionality, resource utilization, and dynamic tuning to keep clusters available and efficient.
  • Expertise in writing custom UDFs to extend Hive and Pig core functionality.
  • Experienced in setting up data gathering tools such as Flume and Sqoop.
  • Experienced in working with Flume to load the log data from multiple sources directly into HDFS.
  • Excellent knowledge of NoSQL databases like Cassandra, MongoDB, and HBase.
  • Experienced in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes like Regex, JSON, and Avro.
  • Experienced in scripting using UNIX shell script. Experienced in analyzing, designing, and developing ETL strategies and processes, writing ETL specifications, and Informatica development.
  • Worked extensively on Talend ETL mappings and on analysis and documentation of OLAP report requirements. A good understanding of OLAP concepts, especially when working with large data sets.
  • Experienced in Dimensional Data Modeling using star and snowflake schema.
  • Good knowledge of data mining and machine learning techniques. Proficient in Oracle … SQL and PL/SQL.
  • Experienced in integrating various data sources like Oracle, DB2, Sybase, SQL Server, and MS Access, and non-relational sources like flat files, into a staging area.
  • Experienced in large cross-platform applications using Java and J2EE, with experience in Java core concepts like OOP, multithreading, collections, and I/O.
  • Experienced with applications using Java, RDBMS, and Linux shell scripting.
  • Good interpersonal, communication, and problem-solving skills; a motivated team player.
  • Able to make a valuable contribution to the company.


Hadoop Ecosystem: Hadoop, MapReduce, Sqoop, Hive, Oozie, Pig, HDFS, ZooKeeper, Flume, HBase, Impala, Spark, Storm, Hadoop distributions (Cloudera, Hortonworks, and Pivotal).

NoSQL Databases: HBase, Cassandra, MongoDB

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, NetBeans, Eclipse

Languages: Java, SAS, Scala and Apache Spark, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell

Databases: Oracle …, MySQL, DB2, MS SQL Server

Application Server: Apache Tomcat, JBoss, IBM WebSphere, WebLogic

Web Services: WSDL, SOAP, REST

Methodologies: Agile, Scrum


Confidential - San Diego, CA

Sr. Hadoop Developer


  • Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail application ETL to Hadoop.
  • Wrote Spark code in Scala to connect to HBase and read/write data to HBase tables.
  • Extracted data from different databases and copied it into HDFS using Sqoop; applied compression techniques to optimize data storage.
  • Involved in designing and deploying multiple applications utilizing almost all of the Amazon Web Services (AWS) stack (including EC2, EBS, internal ELB, Route 53, S3, and IAM), focusing on high availability, fault tolerance, and auto-scaling.
  • Migrated projects to the AWS cloud and designed the architecture for moving the code to the cloud.
  • Developed in Java 8 using Eclipse. Used core Java concepts like collections and multithreading for complex data computations and analysis.
  • Developed business and transaction services using Servlets and core Java concepts like multithreading, ConcurrentHashMap, and I/O streams.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
  • Designed and implemented a DR strategy for several environments in the cloud.
  • Performed functional, non-functional, and performance testing of key systems prior to cutover to AWS.
  • Ran Apache Hadoop, CDH, and Elastic MapReduce (EMR) on EC2.
  • Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EMR.
  • Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
  • Moved files between HDFS and AWS S3 and worked extensively with S3 buckets. Used a Big Data tool to load large volumes of source files from S3 into Redshift.
  • Used different SerDes to convert JSON data into pipe-separated data.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Created Spark Streaming code that takes the source files as input. Used Oozie workflows to automate all the jobs.
  • Developed Spark programs in Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
  • Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations to it. Coordinated the development of Jenkins jobs.
  • Built analytics for structured and unstructured data and managed large data ingestion using Avro, Flume, Kafka, and Sqoop.
  • Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop, Apache Spark, and Apache Storm.
  • Ingested streaming data into Hadoop using Spark, Storm Framework and Scala.
  • Developed a banker's rounding UDF for Hive/Pig to implement Teradata rounding in Hive/Pig.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.

Environment: Hadoop, MapReduce, AWS, S3, Redshift, HDFS, Hive, Pig, Spark, Storm, Flume, Kafka, Sqoop, Java 8, Oozie, Impala, SQL, Scala, Java (JDK 1.6), Hadoop (Cloudera) and Eclipse.

Confidential - Des Moines, Iowa

Sr. Big Data Developer


  • Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail application data to Hadoop.
  • Installed and configured Hive, Pig and Sqoop on the HDP 2.0 cluster.
  • Performed real-time analytics on HBase using the Java API and fetched data to/from HBase by writing MapReduce jobs. Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing on HDP 2.0.
  • Wrote SQL queries to process the data using Spark SQL.
  • Extracted data from different databases and copied it into HDFS using Sqoop.
  • Worked extensively on importing metadata into Hive using Python, and migrated existing tables and applications to work on Hive and make the data available.
  • Worked on a project to retrieve log messages by leveraging Spark Streaming.
  • Designed Oozie jobs for automatic processing of similar data. Collected the data using Spark Streaming.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Worked with Spark Streaming to ingest data into the Spark engine. Extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Coordinated and planned with application teams on MongoDB capacity planning for new applications.
  • Created aggregation queries for reporting and analysis. Collaborated with development teams to define and apply best practices for using MongoDB.
  • Built a data lake ecosystem using Hadoop technologies such as Hive, HBase, MapReduce, Pig, HDFS, Scala, and Spark to ingest and process Kinesis data streams.
  • Developed a Spark application to filter JSON source data and store it in HDFS with partitions, and used Spark to extract the schema of JSON files.
  • Imported data from different sources, such as Talend ETL and the local file system, into Spark RDDs.
  • Responsible for importing data from MySQL to HDFS and providing query capabilities using Hive.
  • Used Sqoop to import data from RDBMS to the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
  • Developed Sqoop scripts to handle the interaction between Pig and the MySQL database.
  • Wrote shell scripts for scheduling and automating tasks.
  • Managed and reviewed Hadoop log files to identify issues when jobs failed.

Environment: Hadoop, MapReduce, HDFS, Jenkins, MongoDB, Hive, Pig, Spark, Storm, Kafka, Flume, Sqoop, Oozie, SQL, Scala, Java (JDK 1.6), Hadoop (Hortonworks HDP 2.0) and Eclipse.

Confidential - Houston, TX

Hadoop Developer


  • Gathered business requirements from business partners and subject matter experts and prepared the Business Requirement document.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Set up MongoDB environments for different use cases. Created and deployed MongoDB instances and clusters from a central repository.
  • Fixed MongoDB slave replication lag issues. Experienced in MongoDB profiling and logging.
  • Wrote entities in Scala and Java, along with named queries, to interact with the database.
  • Analyzed the data by performing Hive queries and Pig scripts to study customer behavior.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Installed and configured Cognos 8.4/10 on single- and multi-server environments.
  • Worked on Spark Streaming jobs that collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase).
  • Worked on different file formats like Sequence files, XML files, and Map files using MapReduce programs.
  • Used UDFs to implement business logic in Hadoop.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Used the NoSQL database Cassandra for information retrieval.
  • Implemented regression analysis using MapReduce.
  • Developed scripts and batch jobs to schedule various Hadoop programs using Oozie.
  • Imported/exported data from RDBMS to HDFS using data ingestion tools like Sqoop.
  • Tested MapReduce code using JUnit.
  • Used Cloudera Manager to monitor the health of jobs running on the cluster.

Environment: Java (JDK 1.6), Hadoop, MapReduce, MongoDB, Pig, Hive, Scala, Spark, Cassandra, Sqoop, Oozie, HDFS, Hadoop (Cloudera), MySQL, Eclipse, Oracle.


Java Developer


  • Prepared high-level and low-level design documents, applying applicable design patterns, with UML diagrams to depict components and class-level details.
  • Interacted with system analysts and business users for design and requirement clarification.
  • Developed Web Services using SOAP, SOA, WSDL, and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing, and design) to communicate with the Active Directory application using a RESTful API.
  • Developed JSPs according to requirements.
  • Excellent knowledge of the NoSQL databases MongoDB and Cassandra.
  • Developed integration services using Web Services, SOAP, and WSDL.
  • Designed, developed, and maintained the data layer using the Hibernate ORM framework.
  • Involved in analysis, design, development, and production of the application, and developed UML diagrams.
  • Presented top-level design documentation to various groups during the transition.
  • Used the Spring Framework's JMS support for writing to a JMS queue and HibernateDaoSupport for interfacing with the database, and integrated Spring with JSF.
  • Wrote AngularJS controllers, views, and services.
  • Used Ant for builds and deployed the application on the JBoss application server.
  • Developed HTML reports for various modules as per the requirements.
  • Translated known information into concrete concepts and technical solutions.
  • Assisted in writing the SQL scripts to create and maintain the database, roles, users, tables in SQL Server.

Environment: Java, JDBC, Spring, JSP, JBoss, Servlets, Maven, Jenkins, Flex, HTML, AngularJS, MongoDB, Hibernate, JavaScript, Eclipse, Struts, SQL Server 2000.


Jr. Java Developer


  • Analyzed the object-oriented design and presented it with UML sequence and class diagrams.
  • Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
  • Designed the user interface and implemented validation checks using JavaScript.
  • Managed connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures.
  • Developed components using Java multithreading.
  • Developed various EJBs (session and entity beans) for handling business logic and data manipulations from database.
  • Involved in the design of JSPs and Servlets for navigation among the modules.
  • Designed cascading style sheets, XSLT, and XML for the Order Entry and Product Search modules, and implemented client-side validation with JavaScript.
  • Hosted the application on WebSphere.

Environment: J2EE, Java/JDK, PL/SQL, JDBC, JSP, Servlets, JavaScript, EJB, JavaBeans, UML, XML, XSLT, Oracle 9i, HTML/DHTML.