Hadoop/Spark Developer Resume Y



  • Over 6 years of experience in software development, deployment and maintenance of applications at various stages.
  • 3 years of experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, Scala and Avro.
  • Worked extensively with Hadoop tools including Pig, Hive, Oozie, Sqoop, Spark, DataFrames, HBase and MapReduce programming.
  • Applied partitioning and bucketing in Hive and designed both managed and external tables to optimize performance.
  • Developed Spark applications in Scala to ease Hadoop transitions. Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Experience in applying the latest development approaches, including building Spark applications in Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Thorough knowledge of data extraction, transformation and load in Hive, Pig and HBase.
  • Hands-on experience in coding MapReduce/YARN programs in Java and Scala for analyzing big data.
  • Worked with Apache Spark, which provides a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
  • Hands-on experience in writing Pig Latin scripts, working with the Grunt shell and scheduling jobs with Oozie.
  • Hadoop Distributions: Worked with Apache Hadoop along with the enterprise versions of Cloudera and Hortonworks. Good knowledge of the MapR distribution.
  • Data Ingestion into Hadoop (HDFS): Ingested data into Hadoop from data sources such as Oracle and MySQL using Sqoop. Created Sqoop jobs with incremental load to populate Hive external tables.
  • Involved in importing real-time data into Hadoop using Kafka and worked on Flume. Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experience in designing and implementing secure Hadoop clusters using Kerberos.
  • Processed real-time data using the Spark Streaming API with Scala.
  • Good exposure to MongoDB and its functionality, and to Cassandra implementation.
  • Good experience working in Agile development environments, including the Scrum methodology.
  • Good knowledge of the Spark framework for both batch and real-time data processing.
  • Expertise in Storm for bringing reliable real-time data processing capabilities to enterprise Hadoop.
  • Hands-on experience in scripting for automation and monitoring using Shell, PHP, Python and Perl.
  • Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Experience in migrating data with Sqoop from HDFS to relational database systems and vice versa according to client requirements.
  • Experienced in deploying Hadoop clusters using Puppet.
  • Excellent knowledge of migrating existing Pig Latin scripts to Spark code in Java.
  • Experience in transferring data between the Hadoop ecosystem and structured data storage in RDBMSs such as MySQL, Oracle, Teradata and DB2 using Sqoop.
  • Strong knowledge of upgrading MapR, CDH and HDP clusters.
  • Experience in designing and developing POCs in Scala, deploying them on a YARN cluster and comparing the performance of Spark with Hive and SQL/Teradata.
  • Java Experience: Created applications in core Java and built applications that need database access and constant connectivity, such as client-server models, using JDBC, JSP, Spring and Hibernate. Implemented web services for network-related applications in Java.
  • Methodologies: Hands-on experience working with different software methodologies such as Waterfall and Agile.
  • NoSQL Databases: Worked with NoSQL databases such as HBase, MongoDB and Cassandra.
  • AWS: Planned, deployed and maintained Amazon AWS cloud infrastructure consisting of multiple nodes, and was involved in deploying applications on AWS.
  • Experience in developing web pages using Java, JSP, Servlets, JavaScript, jQuery, AngularJS, Node.js, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat and Linux.


Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Hadoop Distributions: Cloudera (CDH3, CDH4 and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Methodology: Agile, Waterfall

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, Ext JS, JSON and Node.js.

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i, 10g, 11i, MS SQL Server, MySQL and DB2

Operating systems: UNIX, Linux, Mac OS and Windows variants

Data analytical tools: R and MATLAB

ETL Tools: Talend, Informatica, Pentaho


Confidential, NY

Hadoop/Spark Developer


  • Used the Cloudera distribution for the Hadoop ecosystem.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data in HDFS for analysis.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Used Spark SQL to handle structured data in Hive.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Experience with creating scripts for data modeling and data import and export. Extensive experience in deploying, managing and developing MongoDB clusters.
  • Created partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Defined the Accumulo tables and loaded data into them for near-real-time data reports.
  • Created the Hive external tables using the Accumulo connector.
  • Wrote Hive UDFs to sort struct fields and return complex data types.
  • Used different data formats (text and ORC) while loading data into HDFS.
  • Worked in an AWS environment for development and deployment of custom Hadoop applications.
  • Strong experience in working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
  • Able to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
  • Imported data from different sources such as AWS S3 and the local file system (LFS) into Spark RDDs.
  • Experience working with Apache Solr for indexing and querying.
  • Created custom Solr query segments to optimize search matching.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Integrated Kerberos with the Hadoop cluster to strengthen it and secure it against unauthorized access.
  • Loaded data into HBase using the HBase shell and the HBase client API.
  • Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, job scheduling and error handling.
  • Developed and designed ETL Jobs using Talend Integration Suite in Talend 5.2.2.

Environment: Hadoop, Cloudera, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Apache Spark, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, Talend, Hue, HCatalog, Flume, Solr, Git, Maven.
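
The state-based bucketing and bucket-based joins listed for this role can be illustrated with a minimal sketch. It is not code from the project: the bucket count, the sample rows and the use of CRC32 as a stand-in for Hive's own hash function are all assumptions made for illustration. The idea shown is that rows with the same key always land in the same bucket, so a join only has to compare matching buckets.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumption: illustrative bucket count, not taken from the resume


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Assign a key to a bucket with a deterministic hash.

    CRC32 is a stand-in here; Hive uses its own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucketize(rows, key_field, num_buckets=NUM_BUCKETS):
    """Group rows into buckets by the hash of their key field."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key_field], num_buckets)].append(row)
    return buckets


def bucket_join(left, right, key_field, num_buckets=NUM_BUCKETS):
    """Inner-join two row sets bucket by bucket instead of all against all."""
    left_buckets = bucketize(left, key_field, num_buckets)
    right_buckets = bucketize(right, key_field, num_buckets)
    joined = []
    for b in range(num_buckets):
        # Index the right-hand rows of this bucket by key.
        index = defaultdict(list)
        for r in right_buckets.get(b, []):
            index[r[key_field]].append(r)
        # Only rows in the same bucket can share a key.
        for l in left_buckets.get(b, []):
            for r in index.get(l[key_field], []):
                joined.append({**l, **r})
    return joined


# Hypothetical sample data
customers = [
    {"state": "NY", "customer": "a"},
    {"state": "PA", "customer": "b"},
    {"state": "CA", "customer": "c"},
]
regions = [
    {"state": "NY", "region": "east"},
    {"state": "PA", "region": "east"},
]
result = bucket_join(customers, regions, "state")
```

In Hive itself the same effect comes from declaring both tables as bucketed on the join column with compatible bucket counts; the sketch only mimics the matching logic.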

Confidential, New Hartford, NY

Hadoop/ETL Developer


  • Extracted data from flat files and RDBMS databases into the staging area and loaded it into the data warehouse.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Responsible for coding batch pipelines, RESTful services, MapReduce programs and Hive queries, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
  • Implemented MapReduce programs to classify data into different categories based on record type.
  • Implemented complex MapReduce programs in Java to perform map-side joins using the distributed cache.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Performed masking on customer-sensitive data using Flume interceptors.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
  • Worked on MapReduce programs, including adding external JARs to them.
  • Involved in loading data from the UNIX file system into HDFS.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Worked on JVM performance tuning to improve MapReduce job performance.

Environment: Hadoop, MapReduce, HDFS, Hive, DynamoDB, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Teradata, Tomcat 6., Tableau.
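
The map-side join with the distributed cache mentioned for this role can be sketched roughly as follows. The original work was done in Java on Hadoop; this is a plain-Python illustration of the technique only, and the file contents, field layout and function names are hypothetical. The small table is loaded into memory once (the part the distributed cache makes possible on every mapper), and each record of the large table is joined during the map step, so no shuffle or reduce phase is needed.

```python
import csv
import io


def load_lookup(cached_file):
    """Build an in-memory dict from the small table.

    In Hadoop this file would be shipped to each mapper via the
    distributed cache and read once during mapper setup.
    """
    reader = csv.reader(cached_file)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def map_side_join(records, lookup):
    """Join each large-table record against the lookup in the map step.

    Records without a matching key are dropped (inner join).
    """
    for line in records:
        fields = line.rstrip("\n").split(",")
        key = fields[0]
        if key in lookup:
            yield fields + [lookup[key]]


# Hypothetical data: department id -> name (small), id,employee (large)
small_table = io.StringIO("101,Sales\n102,Support\n")
large_table = ["101,alice", "103,bob", "102,carol"]

lookup = load_lookup(small_table)
joined = list(map_side_join(large_table, lookup))
```

The approach only pays off when one side of the join is small enough to fit in each mapper's memory.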

Confidential, Philadelphia, PA

JAVA/ETL Developer


  • Developed Maven scripts to build and deploy the application.
  • Developed Spring MVC controllers for all the modules.
  • Ran SAS scripts on UNIX and exported the output datasets into SAS.
  • Implemented jQuery validator components.
  • Extracted data from Oracle as one of the source databases.
  • Used the DataStage ETL tool to copy data from Teradata to Netezza.
  • Created ETL data mapping spreadsheets describing column-level transformation details for loading data from Teradata landing zone tables into the tables in the Party and Policy subject areas of the EDW, based on the SAS Insurance model.
  • Used JSON and XML documents extensively with the MarkLogic NoSQL database. Made REST API calls using Node.js and the Java API.
  • Continually created and updated SAS data sets using the SET and UPDATE statements.
  • Built data transformations with SSIS, including importing data from files.
  • Loaded flat-file data into the staging area using Informatica.
  • Created shell scripts for generic use.

Environment: Java, Spring, MPP, Windows XP/NT, Informatica PowerCenter 9.1/8.6, UNIX, Teradata, Oracle Designer, Autosys, Shell, Quality Center 10.


Java Developer


  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Developed web components using JSP, Servlets and JDBC.
  • Implemented database using SQL Server.
  • Implemented the Spring IoC framework.
  • Developed Spring REST services for all the modules.
  • Developed custom SAML and SOAP integration for healthcare.
  • Validated the fields of the user registration and login screens by writing JavaScript validations.
  • Used DAO and JDBC for database access.
  • Built responsive Web pages using Kendo UI mobile.
  • Designed dynamic, multi-browser-compatible pages using HTML, CSS, jQuery, JavaScript, RequireJS and Kendo UI.

Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6, JSP, JDBC, JavaScript, MySQL, Eclipse IDE, REST.
