
Sr. Hadoop Developer Resume

Minneapolis, MN


  • 8 years of IT experience, including 3 years working with Apache Hadoop ecosystem technologies such as HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Spark, and Scala for Big Data analytics, and 5 years of experience in database architecture, Core Java, JSP, Servlets, JavaScript, XML, jQuery, Python, and Scala scripting.
  • Worked extensively on Database programming, Database Architecture, Hadoop.
  • 3 years of hands-on experience working with HDFS, the MapReduce framework, and Hadoop ecosystem tools such as Hive, HBase, Sqoop, and Oozie.
  • Good understanding of Hadoop Architecture and underlying Hadoop framework including Storage Management.
  • Hands-on experience in installing, configuring, and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, and Flume.
  • Managed data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Worked on the backend using Scala and Spark to implement aggregation logic.
  • Experience working with Spark DataFrames and optimizing jobs to meet SLAs.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Hands-on experience in Linux shell scripting. Worked with the Cloudera Big Data distribution (CDH 5.8.3).
  • Hands-on experience using MicroStrategy and Tableau to generate reports on Hadoop data.
  • Handled production deployment of Hadoop release items at every month-end.
  • Responsible for Hadoop production support: ran the Hadoop AutoSys jobs, validated the data, and communicated with the business.
  • Installed AutoSys JIL files and configured AutoSys jobs to schedule Hadoop tasks.
  • Involved in creating POCs to ingest and process streaming data using Spark and HDFS.
  • Expert in SQL Server RDBMS; worked extensively with PL/SQL.
  • Expert in writing complex SQL queries and in database analysis for performance.
  • Very good understanding and working knowledge of object-oriented programming (OOP), multithreading in Core Java, J2EE, web services (REST, SOAP), JDBC, JavaScript, and jQuery.
  • Worked with AWS cloud services such as EC2, S3, EBS, and EMR.
  • Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets. Experienced in maintaining the Hadoop cluster on AWS EMR.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Expert in building, deploying, and maintaining applications.
  • Experienced in preparing and executing Unit Test Plan and Unit Test Cases after software development.
  • Experience in Scrum, Agile and Waterfall models.
  • Worked in 24x7 environments to provide production support.
  • Coordinated with offshore and cross-functional teams to ensure that applications are properly tested, configured, and deployed.


Programming Languages: C, C++, Java, Scala.

Hadoop / Big Data Stack: Hadoop, HDFS, YARN, MapReduce, Pig, Hive, Spark, Spark SQL, Kafka, Oozie, ZooKeeper, HBase, Sqoop, Flume, Storm.

Hadoop Distributions: Cloudera, Hortonworks, MapR.

Databases: Oracle, MySQL, DB2, Teradata, SQL Server, Sybase.

NoSQL Databases: HBase, Cassandra.

Web Technologies: Java, Servlets, EJB, JavaScript, CSS, Bootstrap.

Frameworks: MVC, Struts, Spring, and Hibernate.

IDEs: Eclipse, NetBeans, IntelliJ.

Build & Integration Tools: Maven, SBT.

Operating Systems: Windows, Linux, Unix and CentOS.

Query Languages: HiveQL, Spark SQL, Pig Latin, SQL, PL/SQL.


Confidential, Minneapolis, MN

Sr. Hadoop Developer


  • Implemented the EP Data Lake, which provides a platform to manage data in a central location so that anyone in the firm can rapidly query, analyze, or refine the data in a standard way.
  • Involved in moving legacy data from the Sybase data warehouse to the Hadoop Data Lake and migrating the data processing to the lake.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop, HDFS, MapReduce programming, Hive, Pig, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark, and Cassandra on Hortonworks.
  • Responsible for creating the Data Store, Datasets, and Virtual Warehouse in the lake and then creating Spark and Hive refiners to implement the existing SQL stored procedures.
  • Created Java-based Spark refiners to replace existing SQL stored procedures.
  • Created Hive refiners for simple UNIONs and JOINs.
  • Executed Hive queries using Spark SQL, which integrates with the Spark environment.
  • Implemented a near-real-time data pipeline using a framework based on Kafka, Spark, and MemSQL.
  • Used REST services in Java and Spring to expose data in the lake.
  • Automated the triggering of Data Lake REST API calls using Unix shell scripting and Perl.
  • Created reconciliation jobs for validating data between source and lake.
  • Used Scala to test DataFrame transformations and debug data issues.
  • Redesigned and implemented the Scala REPL (read-eval-print loop) to integrate tightly with other IDE features in Eclipse.
  • Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
  • Used Avro format for staging data and ORC for final repository.
  • Worked on the data modeling service, an in-house tool (PURE MODEL). Used data from the data lake virtual warehouse and exposed the data model output to Java web services accessed by end users.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Used Sqoop import and export functionality to handle large data set transfers between the Sybase database and HDFS.
  • Tuned Hive queries and Pig scripts to improve performance.
  • Involved in creating Oozie workflow and coordinator jobs to kick off jobs based on time and data availability.
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Used Eclipse and Ant to build the application.
  • Performed unit testing and integration testing using the JUnit framework.
  • Configured build scripts for multi-module projects with Maven and Jenkins.
  • Managed data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Designed the technical architecture and developed various Big Data workflows using custom MapReduce, Pig, Hive, and Sqoop.
  • Involved in moving legacy data from the Sybase ASE data warehouse to the Hadoop Data Lake and migrating the data processing to the lake.
  • Built reusable Hive UDF libraries for business requirements, enabling business analysts to use these UDFs in Hive queries.
  • Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
  • Resolved defects found in testing the new application and existing applications.

Environment: Hadoop, HDFS, Pig, Hive, Spark, Scala, Oozie, Sqoop, HBase, Sybase, Java, Kafka, UNIX, Maven, JUnit, SVN, MapR, Hortonworks Data Platform (HDP).

Confidential, Sunnyvale, CA

Hadoop Developer


  • Analyzed data using Spark and Hive and delivered summary results to downstream systems.
  • Created and modified shell scripts for scheduling data cleansing scripts and the ETL loading process.
  • Developed Spark applications to perform all the data transformations on user behavioral data coming from multiple sources.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala (prototype).
  • Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.
  • Implemented Spark using Java for faster testing and processing of data.
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
  • Created Hive UDFs to supply functionality missing in Hive for analytics.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Created HBase tables and column families to store the user event data.
  • Worked with AWS cloud services such as EC2, S3, EBS, and EMR.
  • Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets. Experienced in maintaining the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Implemented Elasticsearch on the Hive data warehouse platform.
  • Imported application from AWS Lambda to store data in S3.
  • Worked with AWS Kinesis to analyze real-time streaming data, making it easier to generate reports for customer requirements.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.

Environment: Hadoop, MapReduce, HDFS, HBase, Spark, Hive, Pig, Python, Java, SQL, Sqoop, Flume, Oozie, Talend, Unix, JavaScript, Maven, MRUnit, SVN, Eclipse.

Confidential, Richmond, VA

Hadoop Developer


  • Collaborated with internal and client BAs to understand requirements and architect a data flow system.
  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
  • Optimized Hive scripts to use HDFS efficiently by applying various compression mechanisms.
  • Imported data from different relational data sources (RDBMS, Teradata) to HDFS using Sqoop.
  • Wrote transformer/mapping MapReduce pipelines using Apache Crunch and Java.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run as MapReduce jobs in the backend.
  • Designed and implemented Incremental Imports into Hive tables.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
  • Imported and exported data to and from HDFS using Sqoop and analyzed that data.
  • Wrote Pig scripts to process unstructured data and make it available for processing in Hive.
  • Developed MapReduce code with the Java API on data moved to HDFS using Sqoop.
  • Installed, configured, and managed Pig; wrote Pig Latin scripts.
  • Implemented different advanced join techniques in Hive and Pig using HQL and Pig Latin.
  • Prepared workflows using Oozie for running MapReduce jobs and Hive queries.

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, MySQL, Ubuntu, ZooKeeper, Maven, Jenkins, Java (JDK 1.6), Oracle 10g.

Confidential, San Jose, CA

Java Developer


  • Validated business requirements and initiated the process.
  • Interacted with business users to gather requirements and provided high-level design with sequence and state-chart diagrams.
  • Worked with Java, J2EE, Struts, web services, and Hibernate in a fast-paced development environment.
  • Developed UI of Web Service using Struts MVC Framework.
  • Developed use case diagrams, class diagrams, and database tables, and provided mapping between relational database tables and object-oriented Java objects using Hibernate.
  • Created and deployed web pages using HTML, JSP, JavaScript, CSS, and the AngularJS framework.
  • Developed web tier components of web stores using the Spring Web MVC framework, which leverages the Model View Controller (MVC) architecture, and used Spring Tool Suite.
  • Implemented SOA-based web services; designed and built a SOAP web service interface.
  • Used MVC-Struts framework in the front-end to develop the User Interface.
  • Involved in implementing business logic in the Struts framework, with Hibernate in the back end.
  • Developed various DAOs in the applications using Spring JDBC support to fetch, insert, update, and delete data in the database tables.
  • Created tables and relationships; wrote SQL scripts and stored procedures required for integration, queuing messages to SBQ, and reports as per the requirements.
  • Used the Hibernate framework in the persistence layer to map an object-oriented domain model to a relational database.
  • Developed web services and exposed them (as provider) to other teams for consumption.
  • Used Validator framework to perform JSP form validation.
  • Implemented JDBC to connect with the database and read/write the data.
  • Developed test cases and performed unit testing using JUnit.
  • Involved in Sprint meetings and followed agile software development methodologies.

Environment: Java/J2EE, MVC, JSP, EJB, Struts, Hibernate, WebLogic 9.0, SQL, HTML, AJAX, JavaScript, JDBC, XML, JMS, XSLT, UML, JUnit, log4j.


Java Developer


  • Developed many JSP pages and used JavaScript for client-side validation.
  • Used an MVC framework to develop a J2EE-based web application.
  • Involved in all phases of the SDLC, including requirements collection, design and analysis of the customer specifications, and development and customization of the application.
  • Developed the user interface screens for presentation using Ajax, JSP, and HTML.
  • Designed and developed Servlets, Session Beans, and Entity Beans to implement business logic and deployed them on the JBoss Application Server.
  • Used JDBC for data retrieval from the database for various inquiries.
  • Good experience in writing stored procedures and using JDBC for database interaction.
  • Implemented client-side and server-side data validations using JavaScript.
  • Developed stored procedures in PL/SQL for Oracle 10g.
  • Used Eclipse as the IDE to write and debug application code, and SQL Developer to test and run SQL statements.

Environment: Java, Eclipse Galileo, HTML 4.0, JavaScript, SQL, PL/SQL, CSS, JDBC, JBoss 4.0, Servlets 2.0, JSP 1.0, Oracle.
