
Hadoop/Spark Developer Resume

Reston, Virginia


  • Over 8 years of experience in enterprise application and product development.
  • Experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem, such as HDFS, MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark, Storm, Scala, Kafka, Oozie, Zookeeper, MongoDB, Cassandra, and Solr.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera Manager and Apache Ambari (HDP).
  • In-depth knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2.
  • Extensive experience in writing MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and compressed file formats.
  • Extended Hive and Pig core functionality by writing custom functions such as UDFs, UDAFs, and UDTFs.
  • Experienced in working with Flume to load the log data from multiple sources directly into HDFS.
  • Experienced in importing and exporting data using Sqoop from HDFS to relational database systems (RDBMS) and vice versa.
  • Extensive experience in designing online analytical processing (OLAP) and online transaction processing (OLTP) databases.
  • Experienced in working with the Hortonworks and Cloudera distributions.
  • Strong programming experience with Spark and Scala, along with related components such as Impala; able to code in Python as well.
  • Experience in deploying applications on heterogeneous application servers, including Tomcat, WebLogic, IBM WebSphere, and Oracle Application Server.
  • Good knowledge of data warehousing concepts, ETL, and Teradata.
  • Expertise in writing Apache Spark Streaming applications on Big Data distributions in active cluster environments.
  • Developed a job scheduler that can run generic jobs using Storm and Kafka.
  • Developed an index manager for Elasticsearch that controls the size of indices even when millions of records are ingested per day.
  • Developed several reports using Kibana on top of Elasticsearch.
  • Experience in designing and developing POCs deployed on a YARN cluster to compare the performance of Spark with Hive and SQL/Oracle.
  • Strong exposure to NoSQL databases such as HBase, MongoDB, Cassandra, and Redis.
  • Good knowledge of Kafka and messaging systems.
  • Experience with ETL tools such as Talend and Pentaho.
  • Integrated Kafka with Storm to achieve "at least once" delivery semantics.
  • Knowledge of LinkedIn’s Azkaban.
  • Good knowledge of Amazon AWS services such as EMR and EC2.
  • Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Actively involved in various product developments and developed a key core framework using several design patterns.
  • Extensive use of multithreading in core Java for a highly scalable product.
  • Expertise in JAX-WS, JSP, WebSphere, Servlets, Struts, WebLogic, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, Linux, Unix, XML, and HTML.
  • Excellent working experience with SQL, PL/SQL, and Oracle.
  • Knowledge of Apache NiFi and Log4j.
  • Extensive experience in building and deploying multi-module projects using Ant, Maven, Git, SVN, etc.
  • Extensive experience with SOAP and RESTful web services.
  • Good experience working with Agile, Scrum, and Waterfall methodologies.
  • Works successfully in fast-paced environments, both independently and in collaborative teams.


Big Data Ecosystem: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, Flume, Avro, Kibana, Spark, Splunk, Hadoop Streaming, Storm, YARN, Crunch

Hadoop Distributions: Cloudera, Hortonworks

Streaming Technologies: Spark, Kafka, Storm

Distributed frameworks: Elasticsearch (search), Kafka (messaging), Hazelcast (cache), Storm (processing).

Reporting: Kibana, Tableau with MongoDB

Java/J2EE Technologies: Servlets, JSP (EL, JSTL, Custom Tags), JSF, Apache Struts, JUnit, Hibernate, EJB 2.0/3.0, JDBC, RMI, JMS, JNDI.

Tools: Ant, Maven, JUnit, GitHub, Confluence

Programming Languages: Java, C, C++, Linux shell scripting, Scala.

Web Technologies: HTML, jQuery, J2EE, CSS, JavaScript, AJAX, Servlet, JSP, DOM, XML

Web Servers: WebLogic, WebSphere, Apache Tomcat, SOAP, RESTful

Databases: MySQL, MS SQL Server, Oracle 11g, DB2, SQL, PL/SQL.

NoSQL Databases: HBase, MongoDB, Cassandra

Software Engineering: UML, Object Oriented Methodologies, Scrum, Agile methodologies

ETL: Tableau, Talend, Informatica, Pentaho Kettle

Operating Systems: Windows 95/98/2000/XP, Mac OS, UNIX, Linux.

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

IDE Tools: Eclipse, Rational Rose, IntelliJ IDEA, NetBeans


Confidential, Reston, Virginia

Hadoop/Spark Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked extensively on Elasticsearch querying and indexing to retrieve documents at high speed.
  • Ingested data into Elasticsearch for lightning-fast search.
  • Loaded JSON from upstream systems using Spark Streaming and indexed it into Elasticsearch.
  • Wrote various key Elasticsearch queries for effective data retrieval.
  • Used Spark Streaming APIs to perform necessary transformations.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed Kafka-Spark integration.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Worked extensively on the Spark Core and Spark SQL modules.
  • Used the Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Used reporting tools such as Kibana to connect with Hive for generating daily reports.
  • Involved in developing a Storm topology to ingest data from XML payloads and load it into various distributed stores.
  • Worked extensively with MongoDB, including CRUD operations and sharding.
  • Developed REST services that processed requests triggered from the UI.
  • Built a data pipeline using Pig and Java MapReduce to store data on HDFS in the initial version of the product.
  • Stored output files for export on HDFS, where they were later picked up by downstream systems.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data, and developed quick POCs on Spark in the initial stages of the product.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used RESTful web services with JSON to develop server applications.
  • Worked with HDP 2.4.

Environment: Apache Spark, Apache Storm, MongoDB, Elasticsearch, Apache Kafka, Zookeeper, Kibana, HDP 2.4

Confidential, Columbia, MD

Big Data Developer


  • Worked with team members on upgrading, configuring, and maintaining Hadoop components such as Pig, Hive, and HBase.
  • Wrote Elasticsearch templates for the index patterns.
  • Extracted data from Hive using Spark.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs, and Scala/Python.
  • Extracted files from a NoSQL database (MongoDB) and processed them with Spark using the MongoDB Spark connector.
  • Involved in creating Hive tables, loading data, and analysing it using Hive queries.
  • Wrote Hive queries on the analysed data for aggregation and reporting.
  • Involved in creating Hive tables and loading them with data.
  • Imported and exported data from different databases into HDFS and Hive using Sqoop.
  • Used Sqoop to load existing metadata from Oracle into HDFS.
  • Automated all the jobs that extract data from sources such as MySQL and push the result sets to the Hadoop Distributed File System.
  • Used the BI tool Tableau to generate dashboard reports and data visualizations.
  • Developed graphs using Tableau.
  • Implemented MapReduce jobs using the Java API and Pig Latin.
  • Participated in the setup and deployment of the Hadoop cluster.
  • Hands-on design and development of an application using Hive UDFs.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Wrote client web applications using SOAP web services.
  • Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
  • Worked in an AWS environment for development and deployment of custom Hadoop applications.
  • Installed Hadoop, MapReduce, and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.

Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark Streaming, Spark SQL, Apache Kafka, Flume, Zookeeper, Oozie, YARN, Linux, Sqoop, Java, Scala, Tableau, SOAP, REST, CDH4, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting, and Cassandra

Confidential, New York, NY

Hadoop Developer


  • Worked with the business team to gather the requirements and participated in the Agile planning meetings to finalize the scope of each development.
  • Involved in installing and configuring Hadoop MapReduce and HDFS.
  • Developed multiple MapReduce jobs in Java for data extraction and transformation.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Involved in managing and reviewing Hadoop log files.
  • Responsible for managing data coming from different sources.
  • Developed MapReduce programs that run on the cluster.
  • Involved in creating Pig tables, loading them with data, and writing Pig Latin queries that run internally as MapReduce jobs.
  • Involved in scheduling Oozie workflows to run multiple Hive and Pig jobs.
  • Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.

Environment: Hadoop, MapReduce, HDFS, Sqoop, Maven, Pig, Apache Flume, Oozie, Hive, GitHub


Java Developer


  • Utilized Agile Methodologies to manage full life-cycle development of the project.
  • Lead core Java developer, actively involved in end-to-end delivery of the product.
  • Took responsibility for estimation, design, and development.
  • Key involvement in the design and development of the product's framework.
  • The framework uses several design patterns, such as the factory pattern to spawn new tasks as required.
  • Introduced several key components to the product like caching and aggregation.
  • Won accolades for resolving critical performance-related issues of the product.
  • Involved in coding JUnit test cases and used Ant for building the application.
  • Developed technical design documents for the product.
  • Resolved performance issues by analysing thread dumps, GC logs, and CPU stats.
  • Analysed performance issues using AWR reports.

Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, Hibernate, JDBC, XML, UML, JUnit, Log4j


Web Developer


  • Redesigned the existing site and created new interfaces.
  • Involved in extensive HTML coding and developed web forms.
  • Developed data insertion forms and validated them using JavaScript and CSS.
  • Designed dynamic client-side JavaScript code to build web forms and simulate processes for the web application, including page navigation and form validation.
  • Worked closely with the programmers and graphic designers on project requirements and analysis.
  • Developed REST/JSON services.
  • Produced GUI prototypes for business logic presentations.
  • Participated in bug thrashing sessions to discuss and resolve bugs with developers.
  • Created stored procedures and triggers for database access and events.

Environment: HTML, CSS, jQuery, JavaScript, XML, MS SQL, Sublime Text, Adobe Photoshop, Dreamweaver.
