
Sr. Hadoop Developer Resume

Uniondale, NY


  • Professional software developer with around 9 years of technical expertise in all phases of the software development life cycle (SDLC) across various industry sectors, specializing in Big Data analytics frameworks and Java/J2EE technologies.
  • 5+ years of industry experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Spark integration with Cassandra, Solr and ZooKeeper.
  • Hands-on expertise in designing row keys and schemas for NoSQL databases like MongoDB, HBase and Cassandra.
  • Worked extensively with Spark using Scala on a cluster for computational analytics; installed it on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Excellent Programming skills at a higher level of abstraction using Scala, Java and Python.
  • Experience in using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
  • Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  • Working knowledge of Amazon’s Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Experience running Apache Hadoop, CDH and MapR distributions, and Elastic MapReduce (EMR), on EC2.
  • Expertise in developing Pig Latin scripts and Hive Query Language.
  • Developed custom UDFs and UDAFs in Java to extend Hive and Pig core functionality.
  • Created Hive tables to store structured data into HDFS and processed it using HiveQL.
  • Experience in validating and cleansing data using Pig statements, and hands-on experience in developing Pig macros.
  • Working knowledge of installing and maintaining Cassandra by configuring the cassandra.yaml file per business requirements, and of performing reads/writes using Java JDBC connectivity.
  • Wrote multiple MapReduce jobs using the Java API, Pig and Hive for data extraction, transformation and aggregation from multiple file formats including Parquet, Avro, XML, JSON, CSV and ORC, and with compression codecs like gzip, Snappy and LZO.
  • Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
  • Good knowledge of build tools like Maven and Ant, and of logging with Log4j.
  • Hands-on experience with various Hadoop distributions: Cloudera (CDH 4/CDH 5), Hortonworks, MapR, IBM BigInsights, Apache and Amazon EMR.
  • Experienced in writing ad hoc queries using Cloudera Impala, including Impala analytical functions.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm and YARN architecture.
  • Proficient in developing, deploying and managing Solr from development to production.
  • Used project management tools like JIRA for tracking issues and GitHub for code reviews, and worked with version control tools like CVS, Git and SVN.
  • Hands-on knowledge of Core Java concepts like exceptions, collections, data structures, I/O, multithreading, and serialization and deserialization of streaming applications.
  • Experience in software design, development and implementation of client/server web-based applications using JSTL, jQuery, JavaScript, JavaBeans, JDBC, Struts, PL/SQL, SQL, HTML, CSS, PHP, XML and AJAX, with high-level familiarity with the React JavaScript library.
  • Experience in maintaining an Apache Tomcat, MySQL, LDAP and web service environment.
  • Designed ETL workflows on Tableau and deployed data from various sources to HDFS.
  • Performed clustering, regression and classification using the machine learning libraries Mahout and Spark MLlib.
  • Good experience with use-case development and with software methodologies like Agile and Waterfall.
  • Knowledge of running Hadoop distributions on AWS (Amazon EC2).
  • Developed high-throughput streaming apps reading from Kafka queues and writing enriched data back to outbound Kafka queues.
  • Wrote and tuned complex PL/SQL queries, stored procedures, triggers and indexes on databases like MySQL and Oracle.
  • Currently deepening knowledge of NoSQL databases like MongoDB.
  • Proven ability to manage all stages of project development; strong problem-solving and analytical skills, and the ability to make balanced, independent decisions.


Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Solr, Impala, Oozie, ZooKeeper, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, Scala, SQL, JavaScript and C/C++

NoSQL Databases: Cassandra, MongoDB, HBase and Amazon DynamoDB.

Java Technologies: JSE, Servlets, JavaBeans, JSP, JDBC, JNDI, AJAX, EJB and Struts

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery and JSON

Development / Build Tools: Eclipse, Jenkins, Git, Ant, Maven, IntelliJ, JUnit and Log4j.

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Oracle 10g/11g, MS SQL Server, MySQL and DB2

Operating systems: UNIX, Red Hat LINUX, Mac OS and Windows Variants

Testing: JUnit

ETL Tools: Talend


Confidential, Uniondale, NY

Sr. Hadoop Developer


  • Created the automated build and deployment process for the application, re-engineered the setup for a better user experience, and laid the groundwork for a continuous integration system.
  • Predicted consumer behavior, such as which products a particular user has bought, and made predictions/recommendations based on patterns recognized using Hadoop, Hive and Pig queries.
  • Installed and configured Hadoop, MapReduce, and HDFS.
  • Developed multiple MapReduce jobs using Java API for data cleaning and pre-processing.
  • Imported and exported data between an Oracle 11g database and HDFS/Hive using Sqoop.
  • Responsible for managing data coming from different sources.
  • Monitored the MapReduce programs running on the cluster.
  • Responsible for loading data from UNIX file systems into HDFS.
  • Installed and configured Hive.
  • Worked with application teams to install Hadoop updates, patches, version upgrades as required.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.0 cluster.
  • Involved in implementing High Availability and automatic failover infrastructure, using ZooKeeper services, to overcome the NameNode single point of failure.
  • Completed a proof of concept using Apache NiFi workflows in place of Oozie to automate loading tasks.
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Installed Apache Tez, a programming framework built on YARN, to increase performance.
  • Experience deploying Apache Tez on top of YARN.
  • Experience in migrating business reports to Spark, Hive, Pig and MapReduce.
  • Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.0.
  • Worked with big data developers, designers and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig and Flume.
  • Involved in installing and configuring patches and version upgrades.
  • Involved in Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
  • Supported MapReduce programs running on the cluster.
  • Involved in HDFS maintenance and administration through the Hadoop Java API.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Installed and configured Pig.
  • Experienced in Big Data technologies such as Hadoop, Cassandra, Presto, Spark, Flume, Storm, AWS and SQL.
  • Wrote Pig scripts to process unstructured data and create structured data for use with Hive.
  • Developed Sqoop scripts to manage the interaction between Pig and the MySQL database.
  • Developed scripts to automate end-to-end data management and sync Confidential with all the clusters.

Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, HDFS, Oozie, Cassandra, Tez, Hue, HCatalog, Java, Eclipse, VSS, Red Hat Linux

Confidential, Plano, TX

Sr. Hadoop/Spark Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for managing and scheduling Jobs on a Hadoop cluster.
  • Loaded data from the UNIX file system to HDFS and vice versa.
  • Improved the performance of existing Hadoop algorithms using SparkContext, Spark SQL and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Worked with Apache Spark for large-scale data processing, integrated with the functional programming language Scala.
  • Developed a POC using Scala, Spark SQL and MLlib along with Kafka and other tools as required, then deployed it on the YARN cluster.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved it in Parquet format in HDFS.
  • Implemented real-time data ingestion using Kafka.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
  • Configured Spark Streaming to receive real time data and store the stream data to HDFS.
  • Developed Spark scripts using Scala shell commands as required.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Documented the requirements, including the existing code to be implemented using Spark, Hive, HDFS and Solr.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Configured Spark Streaming to consume Kafka streams and store the data in HDFS.
  • Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
  • Streamed data in real time using Spark with Kafka.
  • Responsible for creating Hive tables and working on them using HiveQL.
  • Implemented various Hive UDFs per business requirements.
  • Involved in Data Visualization using Tableau for Reporting from Hive Tables.
  • Developed Python Mapper and Reducer scripts and implemented them using Hadoop Streaming.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Responsible for writing Hive queries for data analysis to meet the business requirements.
  • Customized Apache Solr to handle fallback searching and provide custom functions.
  • Responsible for setup and benchmarking of Hadoop/HBase clusters.

Environment: Hadoop, HDFS, HBase, Sqoop, Hive, MapReduce, Spark Streaming/SQL, Scala, Kafka, Solr, sbt, Java, Python, Ubuntu/CentOS, MySQL, Linux, GitHub, Maven, Jenkins.

Confidential - NJ

Hadoop Developer


  • Analyzed large datasets to provide strategic direction to the company.
  • Collected logs from the physical machines and integrated them into HDFS using Flume.
  • Involved in analyzing the system and business requirements.
  • Developed SQL statements to improve back-end communications.
  • Loaded unstructured data into Hadoop File System (HDFS).
  • Created ETL jobs to load Twitter JSON data and server data into MongoDB, and moved the MongoDB data into the data warehouse.
  • Created reports and dashboards using structured and unstructured data.
  • Involved in importing data from MySQL to HDFS using Sqoop.
  • Involved in writing Hive queries to load and process data in Hadoop File System.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Worked with Impala for data retrieval.
  • Exported data from Impala to the Tableau reporting tool and created dashboards on a live connection.
  • Performed sentiment analysis on product reviews from the client's website.
  • Exported the resulting sentiment analysis data to Tableau for creating dashboards.
  • Experienced in Agile processes and delivered quality solutions in regular sprints.
  • Developed custom MapReduce programs to extract the required data from the logs.
  • Tuned performance and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Responsible for loading and transforming large sets of structured, semi-structured and unstructured data.

Environment: Cloudera, JDK 1.5, CDH4.3, Hadoop, MapReduce, HDFS, Hive, MongoDB, Sqoop, MySQL, SSQL, Impala, Tableau.


Java/ Hadoop Developer


  • Involved in requirements analysis and in designing an object-oriented domain model.
  • Involved in detailed documentation and wrote functional specifications for the module.
  • Involved in development of Application with Java and J2EE technologies.
  • Developed and maintained an elaborate service-based architecture using open source technologies like Hibernate ORM and the Spring Framework.
  • Developed server-side services using Java multithreading, Struts MVC, EJB, Spring and web services (SOAP, WSDL and Axis).
  • Responsible for developing the DAO layer using Spring MVC and Hibernate configuration XML files, and for managing CRUD operations (insert, update and delete).
  • Designed, developed and implemented JSPs in the presentation layer for Submission, Application and reference implementation.
  • Developed JavaScript for client-side data entry and front-end validation.
  • Deployed web, presentation and business components on Apache Tomcat Application Server.
  • Developed PL/SQL procedures for different use case scenarios.
  • Involved in post-production support and testing; used JUnit for unit testing of the module.

Environment: Java/J2EE, JSP, XML, Spring Framework, Hibernate, Eclipse (IDE), JavaScript, Ant, SQL, PL/SQL, Oracle, Windows, UNIX, SOAP, JasperReports.


Java/J2EE Developer


  • Involved in writing programs for XA transaction management on multiple databases of the application.
  • Developed Java programs, JSP pages and servlets using Cantata Struts framework.
  • Involved in creating database tables and writing complex T-SQL queries and stored procedures in SQL Server.
  • Worked with the AJAX framework to get asynchronous responses to user requests and used JavaScript for validation.
  • Used EJBs in the application and developed Session beans to implement business logic at the middle tier level.
  • Actively involved in writing SQL using SQL Query Builder.
  • Involved in coordinating onshore/offshore development and mentoring new team members.
  • Used Ant extensively to build and configure J2EE applications, and used Log4j for application logging.
  • Used JAXB to read and manipulate the XML properties.
  • Used JNI to call libraries and other functions implemented in C.
  • Used Prototype, MooTools and script.aculo.us for a fluid user interface.
  • Involved in fixing defects and unit testing with test cases using JUnit.

Environment: Java, EJB, Servlets, XSLT, CVS, J2EE, AJAX, Struts, Hibernate, ANT, Tomcat, JMS, UML, Log4J, Oracle 10g, Eclipse, Solaris, JUnit and Windows 7/XP, Maven.
