
Sr. Hadoop Developer Resume

Minneapolis, MN


  • 8+ years of professional IT experience across all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
  • 4+ years of work experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in Big Data ecosystem technologies such as MapReduce, Hive, Spark, Cloudera Navigator, Mahout, HBase, Pig, Zookeeper, Sqoop, Flume, Oozie, and HDFS.
  • Good at working on low-level design documents and System Specifications.
  • Experience working with BI teams to translate Big Data requirements into Hadoop-centric solutions.
  • Extensive experience working with Teradata, Oracle, Netezza, SQL Server, DB2, and MySQL databases.
  • Excellent understanding and knowledge of NOSQL databases like MongoDB, HBase, and Cassandra.
  • Solid experience creating PL/SQL packages, procedures, functions, triggers, views, and exception handling for retrieving, manipulating, checking, and migrating complex data sets in Oracle.
  • Strong grasp of Informatica PowerCenter, Oracle, Vertica, Hive, SQL Server, shell scripting, and QlikView.
  • Very good understanding and working knowledge of Object-Oriented Programming (OOP), Python, and Scala.
  • In-depth knowledge of Hadoop architecture and its components such as HDFS, MapReduce, Hadoop Gen2 Federation, High Availability, and YARN, with a good understanding of workload management, scalability, and distributed platform architectures.
  • Experienced in developing and implementing MapReduce programs using Hadoop to work with Big Data requirement.
  • Experience with common Big Data technologies such as Cassandra, Hadoop , HBase, MongoDB, and Impala .
  • Hands on Experience in Big Data ingestion tools like Flume and Sqoop.
  • Experience with the Cloudera distribution and the Hortonworks Data Platform (HDP).
  • Good hands-on experience developing Hadoop applications on Spark using Scala as a functional and object-oriented programming language.
  • Experience in working with different kinds of data files such as XML, JSON, Parquet, Avro, and Databases.
  • Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
  • Experience migrating ETL logic to Pig Latin scripts, including transformations and join operations.
  • Extensive experience in developing PIG Latin Scripts and using Hive Query Language for data analytics.
  • Expertise in writing Hive queries, Pig and MapReduce scripts, and loading large data sets from the local file system and HDFS into Hive.
  • Debugged MapReduce jobs using Counters and MRUnit testing.
  • Experience developing against NoSQL databases using CRUD operations, sharding, indexing, and replication.
  • Experienced in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the requirement.
  • Performed reads and writes in Cassandra from a web application using Java JDBC connectivity.
  • Knowledge of processing and analysing real-time data streams/flows using Kafka and HBase.
  • Good understanding of MPP databases such as HP Vertica, Greenplum and Impala.
  • Good understanding of HDFS Designs, Daemons and HDFS high availability (HA).
  • Expertise in using job scheduling and monitoring tool like Oozie.
  • Good understanding of Spark Algorithms such as Classification, Clustering, and Regression.
  • Good understanding on Spark Streaming with Kafka for real-time processing.
  • Extensive experience working with Spark components such as RDD transformations, Spark MLlib, and Spark SQL.
  • Knowledge of data warehousing and ETL tools like Informatica, Talend.
  • Experience developing and scheduling ETL workflows in Hadoop using Oozie, along with deploying and managing Hadoop clusters using Cloudera and Hortonworks.
  • Experienced with monitoring tools such as Cloudera Manager, Ambari, and Ganglia to check cluster status.
  • Experience in layers of Hadoop Framework - Storage (HDFS), Analysis (Pig and Hive) & Engineering (Jobs and Workflows) for developing ETL processes to load data from multiple data sources to HDFS using Sqoop, Pig & Oozie for automating workflow.
  • Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Good experience working in cloud environments such as Amazon Web Services (EC2 and S3).
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
  • Expert in building Microsoft Power BI and Tableau reports and dashboards and publishing them to end users for executive-level business decisions.
  • Worked on various tools and IDEs like Eclipse, IBM Rational, Visio, the Apache Ant build tool, MS Office, PL/SQL Developer, and SQL*Plus.
  • Ability to meet deadlines and handle multiple tasks, decisive with strong leadership qualities, flexible in work schedules and possess good communication skills.
  • Team player, motivated and able to grasp things quickly with analytical and problem-solving skills.
  • Comprehensive technical, oral, and written communication skills.
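Several bullets above describe developing MapReduce programs; the map-shuffle-reduce flow they refer to can be sketched in plain Python as a minimal word count. This is an illustrative sketch of the pattern only, with made-up function names, not code from any project described here:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle phase: group values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Reduce phase: sum the counts for each word.
    return key, sum(values)

def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, vs) for k, vs in shuffle(mapped))

print(word_count(["big data big wins"]))  # {'big': 2, 'data': 1, 'wins': 1}
```

In real Hadoop the shuffle is performed by the framework between the map and reduce tasks; only the mapper and reducer are user code.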


Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Solr, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy.

Hadoop Distributions: Cloudera, MapR, Hortonworks, IBM BigInsights

Languages: Java, Scala, Python, Ruby, SQL, HTML, DHTML, JavaScript, XML and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL, and Oracle

RDBMS: Teradata, Oracle PL/SQL, MS SQL Server, MySQL and DB2

Operating systems: UNIX, LINUX, Mac OS, and Windows

ETL Tools: Informatica PowerCenter

Reporting tools: Tableau


Confidential, Minneapolis, MN

Sr. Hadoop Developer


  • Gathered User requirements and designed technical and functional specifications.
  • Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, Zookeeper, and Sqoop.
  • Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; experienced with Sqoop for importing and exporting data from Oracle and MySQL.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs) written in Python.
  • Imported and exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop.
  • Used Flume to handle streaming data and loaded the data into Hadoop cluster.
  • Developed and executed Hive queries for de-normalizing the data.
  • Developed the Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
  • Executed Hive queries using the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
  • Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
  • Worked on a POC comparing the processing time of Impala against Apache Hive for batch applications, to evaluate adopting the former in the project.
  • Worked on a cluster of 130 nodes.
  • Designed Apache Airflow entity resolution module for data ingestion into Microsoft SQL Server .
  • Developed a batch processing pipeline to process data using Python and Airflow; scheduled Spark jobs using Airflow.
  • Involved in writing, testing, and running MapReduce pipelines using Apache Crunch .
  • Managed and reviewed Hadoop log files, analyzed SQL scripts, and designed solutions for the process using Spark.
  • Created reports in TABLEAU for visualization of the data sets created and tested native Drill, Impala and Spark connectors.
  • Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
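One bullet above mentions extending Hive with Python functions. A common way to run Python inside a Hive query is the TRANSFORM clause, which streams rows to a script as tab-separated text on stdin and reads tab-separated rows back from stdout. A minimal sketch of such a script; the (user_id, amount) schema and script name are hypothetical examples, not taken from the projects above:

```python
import sys

def transform_row(line):
    # Hive streams each row as tab-separated fields ending in a newline.
    # Hypothetical schema: (user_id, amount in dollars) -> (user_id, amount in cents).
    user_id, amount = line.rstrip("\n").split("\t")
    cents = int(round(float(amount) * 100))
    return "%s\t%d" % (user_id, cents)

def main(stream=sys.stdin):
    # Hive would invoke this script roughly as:
    #   SELECT TRANSFORM(user_id, amount) USING 'python to_cents.py'
    #   AS (user_id, cents) FROM payments;
    for row in stream:
        print(transform_row(row))

# Demonstrate on an in-memory stream instead of real stdin:
main(iter(["u1\t12.34\n"]))
```

The script must emit exactly the column layout declared in the AS clause; Hive treats each tab as a column separator.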

Environment: Hadoop, YARN, HBase, Teradata, DB2, NoSQL, Kafka, Python, Zookeeper, Oozie, Tableau, Apache Crunch, Apache Storm, MySQL, SQL Server, jQuery, JavaScript, HTML, Ajax and CSS.

Confidential, San Diego, CA

Hadoop Developer


  • Worked on a live 24 node Hadoop cluster running on HDP 2.2.
  • Built import and export jobs to copy data between RDBMS and HDFS using Sqoop.
  • Worked with incremental-load Sqoop jobs to populate HAWQ external tables and move the data into internal tables.
  • Created external and internal tables using HAWQ.
  • Worked with Spark core, Spark Streaming, and spark SQL modules of Spark.
  • Hands-on experience in all Big Data application phases: data ingestion, data analytics, and data visualization.
  • Experience transferring data from RDBMS to HDFS and Hive tables using Sqoop.
  • Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
  • Very well versed in workflow scheduling and monitoring tools such as Oozie, Hue and Zookeeper.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Installed and configured MapReduce, HIVE and the HDFS, implemented CDH5 and HDP clusters.
  • Assisted with performance tuning, monitoring, and troubleshooting.
  • Experience with data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Experience manipulating streaming data into clusters through Kafka and Spark Streaming.
  • Optimized HiveQL/Pig scripts using execution engines such as Tez and Spark.
  • Tested Apache TEZ, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Worked and learned a great deal from AWS Cloud services like EC2, S3, EBS, RDS and VPC.
  • Experienced in reviewing Hadoop log files to detect failures.
  • Performed benchmarking of the NoSQL databases Cassandra and HBase.
  • Worked with Pig, the NoSQL database HBase, and Sqoop for analyzing the Hadoop cluster as well as big data.
  • Knowledge of workflow/schedulers like Oozie/crontab/Autosys.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Creating Hive tables and working on them for data analysis to meet the business requirements.
  • Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
  • Experience using SequenceFile, RCFile, Avro, and HAR file formats.
  • Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
  • Used Flume to dump the application server logs into HDFS.
  • Automated backups with Linux shell scripts to transfer data to an S3 bucket.
  • Experience in UNIX Shell scripting.
  • Hands on experience using HP ALM. Created test cases and uploaded into HP ALM.
  • Automated incremental loads to load data into production cluster.
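One bullet above mentions Hive bucketing; conceptually, Hive routes each bucketed row to a file by hashing the CLUSTERED BY column modulo the bucket count. A Python sketch of that idea (the hash here mimics Java's String.hashCode for illustration; Hive's actual hashing varies by column type, and the keys are made up):

```python
def bucket_for(key, num_buckets=4):
    # Hive writes each bucketed row to the file chosen by
    # hash(clustered-by column) % bucket count. This sketch uses a
    # Java-String.hashCode-style rolling hash, kept non-negative.
    h = 0
    for ch in str(key):
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return (h & 0x7FFFFFFF) % num_buckets

# Illustrative, made-up keys: each lands deterministically in one of 4 buckets.
rows = ["user1", "user2", "user3", "user42"]
buckets = {k: bucket_for(k) for k in rows}
print(buckets)
```

Because the assignment is deterministic, two tables bucketed the same way on the same key can be joined bucket-by-bucket, which is what makes bucket map joins possible.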

Environment: Hadoop, MapReduce, AWS, HDFS, Hive, HBASE, Sqoop, Pig, Flume, Oracle, Teradata, PL/SQL, Java, Shell Scripting, HP ALM.

Confidential, Santa Monica, CA

Hadoop Developer


  • In-depth knowledge of Hadoop architecture and its components such as HDFS, Application Master, Node Manager, Resource Manager, NameNode, DataNode, and MapReduce concepts.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Imported required tables from RDBMS to HDFS using Sqoop and used Storm and Kafka to get real-time streaming of data into HBase.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Good experience with the NoSQL database HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Wrote MapReduce code that takes log files as input, parses them, and structures them in tabular format to facilitate effective querying of the log data.
  • Developed java code to generate, compare & merge AVRO schema files.
  • Developed complex MapReduce streaming jobs using Java, implemented alongside Hive and Pig.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used Hive optimization techniques during joins and best practices in writing Hive scripts using HiveQL.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Hive queries to extract the processed data.
  • Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Implemented Spark using Scala, utilizing Spark Core, Spark Streaming, and the Spark SQL API for faster data processing than Java MapReduce.
  • Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables; handled structured data using Spark SQL.
  • Developed Pig Latin scripts to extract the data from the web server out files to load into HDFS.
  • Created HBase tables to store variable data formats of data coming from different legacy systems.
  • Used Hive for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Good understanding of Cassandra architecture, replication strategy, gossip, snitches etc.
  • Expert knowledge on MongoDB NoSQL data modelling, tuning, disaster recovery and backup.
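The bullets above mention MapReduce code that parses raw log files into tabular records for querying. The map-side parsing step can be sketched in Python like this; the log line format and field names are hypothetical, chosen only to illustrate the technique:

```python
import re

# Hypothetical log line format for illustration:
#   2016-03-01 10:15:32 INFO user=alice action=login
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) user=(?P<user>\w+) action=(?P<action>\w+)"
)

def parse_log_line(line):
    # Turn one raw log line into a tab-separated record suitable for
    # loading into Hive; return None for malformed lines so the job
    # can count and skip them.
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return "\t".join(m.group("date", "time", "level", "user", "action"))
```

In a real job, the mapper would emit the parsed record and bump a Counter for each line that returns None, making bad-record rates visible in the job history.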

Environment: Hadoop, HDFS, MapReduce, Hive, Python, PIG, Java, Oozie, HBASE, Sqoop, Flume, MySQL.

Confidential, Kansas City, MO

Java Developer


  • Involved in analysis, design, development, integration, and testing of application modules following the Agile/Scrum methodology. Participated in estimating the size of backlog items, daily Scrum, and translating backlog items into engineering designs and logical units of work (tasks).
  • Used Spring framework for implementing IOC/JDBC/ORM, AOP and Spring Security to implement business layer.
  • Developed and Consumed Web services securely using JAX-WS API and tested using SOAP UI.
  • Extensively used Action, DispatchAction, ActionForms, Struts tag libraries, and Struts configuration files.
  • Extensively used the Hibernate Query Language for data retrieval from the database and process the data in the business methods.
  • Developed pages using JSP, JSTL, Spring tags, jQuery, Java Script & Used jQuery to make AJAX calls.
  • Used Jenkins continuous integration tool to do the deployments.
  • Worked on JDBC for database connections.
  • Worked on multithreaded middleware using socket programming to introduce a whole set of new business rules, applying OOP design principles.
  • Involved in implementing Java multithreading concepts.
  • Developed several REST web services supporting both XML and JSON to perform task such as demand response management.
  • Used Servlets, Java, and Spring for server-side business logic.
  • Implemented the log functionality by using Log4j and internal logging API's.
  • Used Junit for server-side testing.
  • Used Maven build tools and SVN for version control.
  • Developed the frontend of the application using the Bootstrap, AngularJS, and Node.js frameworks.
  • Implemented SOA architecture using Enterprise Service Bus (ESB).
  • Designed front-end, data driven GUI using JSF, HTML4, JavaScript and CSS.
  • Used IBM MQ Series as the JMS provider.
  • Responsible for writing SQL Queries and Procedures using DB2.
  • Connection with Oracle, MySQL Database is implemented using Hibernate ORM. Configured hibernate, entities using annotations from scratch.

Environment: Core Java1.5, EJB, Hibernate 3.6, AWS, JSF, Struts, Spring 2.5, JPA, REST, JBoss, Selenium, Socket programming, DB2, Oracle 10g, XML, JUnit 4.0, XSLT, IDE, Angular Js, Node JS, HTML4, CSS, JavaScript, Apache Tomcat 5x, Log4j.




  • Used JSP pages through a Servlet controller for the client-side view.
  • Created jQuery, JavaScript plug-ins for UI.
  • Followed Java/J2EE best practices to minimize unnecessary object creation.
  • Implemented RESTful web services with the Struts framework.
  • Verified them with the JUnit testing framework.
  • Working experience using an Oracle 10g backend database.
  • Used JMS Queues to develop Internal Messaging System.
  • Developed the UML Use Cases, Activity, Sequence and Class diagrams using Rational Rose.
  • Developed Java, JDBC, and Java Beans using JBuilder IDE.
  • Developed JSP pages and Servlets for customer maintenance.
  • Apache Tomcat Server was used to deploy the application.
  • Involved in Building the modules in Linux environment with ant script.
  • Used Resource Manager to schedule jobs on the Unix server.
  • Performed Unit testing, Integration testing for all the modules of the system.
  • Developed JavaBean components utilizing AWT and Swing classes.

Environment: Java, JDK, Servlets, JSP, HTML, JBuilder, HTML, JavaScript, CSS, Tomcat, Apache HTTP Server, XML, JUNIT, EJB, RESTful, Oracle.
