
Hadoop/Spark Developer Resume

Sterling, VA


  • Over 5 years of professional IT experience, including experience in the Big Data ecosystem and Java/J2EE technologies.
  • Excellent experience with Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • In-depth understanding of data structures and algorithms.
  • Experience in managing and reviewing Hadoop log files.
  • Strong backend experience using Python, Scala, HiveQL, Spark SQL, etc.
  • Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Involved in setting up standards and processes for Hadoop-based application design and implementation.
  • Developed simple to complex MapReduce streaming jobs in Python that are implemented using Hive and Pig.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience in object-oriented analysis and design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
  • Experience in managing Hadoop clusters using Cloudera Manager Tool.
  • Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Experience in administration, installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning of Red Hat Linux.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Good command of scripting, including Shell, Perl, and Python.
  • Wrote scripts to deploy monitors and checks and to automate critical system administration functions.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
  • Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.


Programming Languages: Scala, Python, Java

Hadoop/Big Data: HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, ZooKeeper

NoSQL Technologies: Cassandra, MongoDB, HBase

Big Data Distributions: Hortonworks, Cloudera, Amazon EMR

Java/J2EE Technologies: Servlets, JSP, JDBC, EJB, JAXB, JMS, JAX-RPC, JAX-WS, JAX-RS, Apache CXF.

Frameworks: Struts, Spring, Hibernate.

Web Technologies: HTML, CSS, JavaScript, jQuery, Ajax, Backbone.js, React, Node.js, Ext JS, Bootstrap.

Development Tools: Eclipse, NetBeans, IBM RAD, IntelliJ, Spring Tool Suite.

Databases: MySQL, MS SQL Server, IBM DB2, Oracle.

Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux, Mac OS.

Build Tools: Ant, Gradle, Maven, Bower.

Web/Application Servers: WebSphere, Apache Tomcat, WebLogic, JBoss.


Confidential, Sterling, VA

Hadoop/Spark Developer


  • Developed simple to complex MapReduce streaming jobs in Java for processing and validating data
  • Developed a data pipeline using MapReduce, Flume, and Sqoop to ingest customer behavioral data into HDFS for analysis
  • Migrated MapReduce jobs to Spark jobs to discover trends in data usage by users
  • Implemented Spark jobs using Scala and Spark SQL for faster data processing
  • Implemented algorithms for real-time analysis in Spark
  • Imported data from AWS S3 into Spark DataFrames and performed transformations and actions on them
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large volumes of data
  • Streamed data in real time using Kafka with Spark
  • Used the Spark Cassandra Connector to load data to and from Cassandra
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS
  • Exported the analyzed data to relational databases using Sqoop for further visualization and report generation for the BI team
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis
  • Analyzed the data by running Hive queries (HiveQL)
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting, and contributed to Hive performance tuning
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data
  • Created HBase tables and column families to store the user event data
  • Wrote automated HBase test cases for data quality checks using HBase command-line tools
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs
  • Used the Tez framework to build high-performance jobs in Pig and Hive
  • Configured Kafka to read and write messages from external programs
  • Configured Kafka to handle real-time data
  • Developed end-to-end data processing pipelines, from receiving data through the Kafka distributed messaging system to persisting it in HBase
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes

Environment: Hadoop, Spark, MapReduce, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, Kafka, Flume, Cloudera Manager, AWS S3, MySQL, Cassandra, multi-node cluster with Linux (Ubuntu), Windows, Unix.

Confidential, Westerville, OH

Hadoop and Spark Developer/Admin


  • Understood business needs, analyzed functional specifications, and mapped them to the design of end-to-end data transformation pipelines.
  • Created Hive tables and loaded data from Teradata using Sqoop.
  • Imported and exported data between relational databases and HDFS using Sqoop.
  • Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Worked extensively with HiveQL, join operations, and custom UDFs; experienced in optimizing Hive queries.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading, and storing data.
  • Developed MapReduce jobs for cleaning, validating, and transforming the data.
  • Performed debugging and performance tuning of Pig and Hive scripts by analyzing their joins, grouping, and aggregation.
  • Wrote Pig scripts to transform raw data from several data sources.
  • Used different columnar file formats (RCFile, Parquet, and ORC).
  • Used Cloudera Manager to monitor workload and job performance and for capacity planning.
  • Took part in building applications using Maven and integrated them with continuous integration servers such as Jenkins to build jobs.
  • Migrated data from legacy RDBMS databases to HDFS using Sqoop.
  • Hands-on experience with the whole ETL (Extract, Transform, Load) process.
  • Developed ETL jobs to normalize the data and publish it in Impala.
  • Worked with BI teams to generate reports and design ETL workflows on Tableau.
  • Worked on NoSQL databases (HBase, MongoDB) for hybrid implementations.
  • Used Impala to analyze data ingested into HBase and compute various metrics for reporting on the dashboard.
  • Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Worked with the testing teams to fix bugs and ensure smooth and error-free code.
  • Involved in Agile methodologies, including daily Scrum meetings and Sprint planning.

Environment: Hadoop, MapReduce, HDFS, Hive, Python, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, ZooKeeper, MongoDB, PL/SQL, MySQL, DB2, Teradata.

Confidential, Durham, NC

Java/Hadoop Developer


  • Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
  • Involved in implementing high availability and automatic failover infrastructure to eliminate the NameNode single point of failure using ZooKeeper services.
  • Developed Pig scripts to transform raw data into meaningful data as specified by business users.
  • Demonstrated proficiency in Shell and Python scripting for file validation and processing, job scheduling, distribution, and automation.
  • Worked on the Hadoop cluster and used the Hive data querying tool to store and retrieve data.
  • Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
  • Developed Spark applications in Java, Scala, or Python.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large volumes of data.
  • Streamed data in real time using Kafka with Spark.
  • Exported analyzed data to HDFS using Sqoop for generating reports.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Worked with the Oozie workflow engine to run multiple MapReduce jobs.
  • Supported MapReduce programs running on the cluster.
  • Worked with the applications team to install Hadoop updates and upgrades as required.

Environment: Hadoop, MapReduce, HDFS, Pig, Sqoop, Spark, Kafka, Hive, Java, Oracle, Eclipse, and Shell/Python scripting.


Hadoop Developer/Admin


  • Worked on analyzing and writing Hadoop MapReduce jobs using the API, Pig, and Hive.
  • Gathered the business requirements from the business partners and subject matter experts.
  • Involved in installing Hadoop ecosystem components under the Cloudera distribution.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Wrote MapReduce jobs using the Java API for data analysis and dimension/fact generation.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Wrote MapReduce jobs using Pig Latin.
  • Prepared a Spark build from source and ran the Pig scripts on Spark rather than as MapReduce jobs for better performance.
  • Imported data from MySQL to HDFS on a regular basis using Sqoop.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked with them using HiveQL.
  • Utilized Agile Scrum methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Used Storm as an automatic mechanism for retrying attempts to download and manipulate the data when there is a hiccup.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.

Environment: Java, MapReduce, Spark, HDFS, Hive, Pig, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, PL/SQL, SQL connector, Subversion.


Java/Hadoop Developer


  • Reviewed the requirements and analyzed the impact.
  • Participated in the requirement analysis and design of the application using UML/Rational Rose and Agile methodology.
  • Involved in developing the application using Core Java, J2EE, and JSPs.
  • Helped develop this web-based application in a J2EE framework that uses Hibernate for persistence, Spring for dependency injection, and JUnit for testing.
  • Used JSP to develop the front-end screens of the application.
  • Designed and developed several SQL scripts, stored procedures, packages, and triggers for the database.
  • Used indexing techniques in the database procedures to obtain search results.
  • Involved in development of a web service client to get client details from third-party agencies.
  • Developed nightly batch jobs that interfaced with external third-party state agencies.
  • Developed test scripts for performance and accessibility testing of the application.
  • Involved in different types of testing, such as unit, system, and integration testing, during the testing phase.
  • Provided production support to maintain the application.

Environment: Java, J2EE, Struts Framework, JSP, Spring Framework, Hibernate, Oracle, Eclipse, Subversion, PL/SQL, WebSphere, UML, Windows.
