
Sr Hadoop Developer Resume

Kansas City, MO


  • Around 9 years of professional IT experience in the analysis, design, development, deployment and maintenance of critical software and big data applications.
  • 5 years of hands-on experience with Hadoop, including extensive experience with Big Data technologies.
  • Hands-on experience developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, Hive, Pig, HBase, Flume, YARN, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie, Cassandra and ZooKeeper.
  • Hands-on experience with multiple distributions, including Cloudera, Hortonworks and MapR.
  • Experience in installing, configuring, supporting and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Experience automating Sqoop, Hive and Pig scripts using the Oozie workflow scheduler.
  • Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS.
  • Good experience optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners.
  • Good experience in Python and shell scripting.
  • Extensive experience developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Experience with file formats such as CSV, SequenceFile, Avro, RC, ORC, JSON and Parquet, and with compression techniques such as LZO, Gzip, Bzip2 and Snappy.
  • Experience with big data ingestion tools such as Sqoop, Flume and Apache Kafka.
  • Experience using Flume and Kafka to load log data from multiple sources into HDFS.
  • Hands-on experience with NoSQL databases such as HBase, MongoDB and Cassandra.
  • Experience retrieving data from databases such as MySQL, Teradata, Informix, DB2 and Oracle into HDFS using Sqoop, and ingesting it into HBase and Cassandra.
  • Good understanding of and experience with software development methodologies such as Agile and Waterfall.


Big Data Ecosystem: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Impala, Oozie, ZooKeeper, Apache Spark, Kafka, Scala, MongoDB, Cassandra.

Hadoop Distributions: Cloudera (CDH3, CDH4 and CDH5), Hortonworks and MapR.

NoSQL Databases: Cassandra, MongoDB, HBase, CouchDB.

Java: J2EE, JSP, CSS, jQuery, Servlets, HTML, JavaScript.

Mainframe: JCL, COBOL, CICS, DB2.

Databases: MySQL, Oracle, DB2 for Mainframes, Teradata, Informix.

Operating Systems: Windows, Unix, Linux.

Other Tools: PuTTY, WinSCP, FileZilla, Streamweaver, Compuset.

Languages: Java, SQL, HTML, JavaScript, JDBC, XML and C.

Frameworks: Struts, Spring, Hibernate.

App/Web servers: WebSphere, WebLogic, JBoss, Tomcat.



Confidential, Kansas City, MO


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Optimized Hive queries and ran Hive on top of the Spark engine.
  • Developed Kafka producers and consumers, Cassandra clients, and Spark components on HDFS and Hive.
  • Populated HDFS and HBase with large volumes of data using Apache Kafka.
  • Used Kafka to ingest data into the Spark engine.
  • Hands-on experience with Spark and Spark Streaming: creating RDDs and applying operations, both transformations and actions.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
  • Built data pipelines using Kafka and Akka to handle multi-terabyte data volumes.
  • Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
  • Worked on indexes, scalability and query language support using Cassandra.
  • Created Sqoop scripts for importing data from different data sources into Hive and Cassandra.
  • Used Hue for running Hive queries, and created day-wise partitions in Hive to improve performance.
  • Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Developed a data flow to pull data from a REST API using Apache NiFi with context configuration enabled.
  • Wrote Python and shell scripts for various deployment and automation processes.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
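As a purely illustrative sketch of the transformation/action pattern mentioned above (the data and names here are invented, and a real job would run on a Spark cluster rather than plain Python), the same map/filter/reduce flow can be shown with lazy generators standing in for RDD transformations:

```python
from functools import reduce

# Hypothetical event log; in production this would be an RDD read from Kafka/HDFS.
events = [
    "click home", "click cart", "view home", "click home",
]

# "Transformations": lazy generator pipelines, analogous to RDD.map/RDD.filter.
parsed = (line.split() for line in events)           # map: parse each record
click_pages = (page for action, page in parsed       # filter: keep click events
               if action == "click")

# "Action": materialize the result, analogous to RDD.countByValue.
counts = reduce(
    lambda acc, page: {**acc, page: acc.get(page, 0) + 1},
    click_pages,
    {},
)
print(counts)  # {'home': 2, 'cart': 1}
```

Nothing executes until the final reduce consumes the generators, which mirrors how Spark defers work until an action is called.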

Environment: Apache Spark, Kafka, MapReduce, Cassandra, YARN, Sqoop, Oozie, HDFS, Hive, Pig, Java, Cloudera Hadoop distribution 5.4/5.5, Linux, XML, Eclipse, MySQL.


Confidential, Chicago, Illinois

Responsibilities:

  • Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
  • Developed ETL jobs to load data coming from various sources, such as mainframes and flat files, into a data warehouse.
  • Configured Hive, Pig, Impala, Sqoop, Flume and Oozie in Cloudera (CDH5).
  • Used Sqoop to import data into Cassandra tables from different relational databases, and imported data from various sources into the Cassandra cluster using Java APIs.
  • Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios.
  • Involved in creating data models for customer data using Cassandra Query Language.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed wrapper and utility automation scripts in Python.
  • Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
  • Wrote scripts to automate application deployments and configuration monitoring on YARN.
  • Wrote MapReduce programs in Python with the Hadoop streaming API.
  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Migrated several ETL processes from Microsoft SQL Server to Hadoop, using Pig as a data pipeline for easy data manipulation.
  • Developed Spark jobs using Scala in the test environment for faster data processing, and used Spark SQL for querying.
  • Involved in importing data from Oracle tables into HDFS and HBase tables using Sqoop.
  • Developed scripts that load data into Spark RDDs and perform in-memory computation to generate the output.
  • Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
  • Experience with Elasticsearch technologies and in creating custom Solr query components.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
  • Worked with different data sources such as Oracle, Netezza, MySQL and flat files.
  • Extensively used Sqoop to pull data from RDBMS sources such as Teradata and Netezza.
  • Developed Talend jobs to move inbound files to an HDFS file location based on monthly, weekly, daily and hourly partitioning.
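Hadoop-streaming MapReduce jobs like the Python ones mentioned above follow a stdin/stdout mapper-reducer contract. A minimal, hypothetical word-count pair might look like this (function names and sample data are invented for illustration, not taken from any specific project):

```python
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word, as Hadoop streaming expects."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum counts per word; Hadoop delivers mapper output to the reducer sorted by key."""
    pairs = (line.strip().split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # In a real job each half runs as its own script, e.g.:
    #   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...
    # Here the stages are chained in-process; sorted() stands in for the shuffle phase.
    sample = ["big data big"]
    mapped = sorted(mapper(sample))
    print(list(reducer(mapped)))  # ['big\t2', 'data\t1']
```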

Environment: Cloudera, MapReduce, Spark SQL, Spark Streaming, Pig, Hive, Flume, Hue, Oozie, Java, Eclipse, ZooKeeper, Cassandra, HBase, Talend, GitHub.


Confidential, Milwaukee, Wisconsin


  • Worked on analyzing the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database and Sqoop.
  • Experienced in implementing the Hortonworks distribution (HDP 2.1, HDP 2.2 and HDP 2.3).
  • Developed MapReduce programs for refined queries on big data.
  • Experienced in working with Elastic MapReduce (EMR).
  • Created Hive tables and worked on them for data analysis to meet the requirements.
  • Developed a framework to handle loading and transforming large sets of unstructured data from UNIX systems into Hive tables.
  • Worked with the business team to create Hive queries for ad hoc access.
  • In-depth understanding of the classic MapReduce and YARN architectures.
  • Implemented Hive generic UDFs to implement business logic.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Installed and configured Pig for ETL jobs.
  • Developed Pig UDFs to pre-process data for analysis.
  • Analyzed data by performing Hive queries and running Pig scripts, Spark SQL and Spark Streaming.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Used Apache NiFi to copy data from the local file system to HDFS.
  • Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
  • Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
  • Involved in creating a generic Sqoop import script for loading data into Hive tables from RDBMS sources.
  • Involved in continuous monitoring of operations using Storm.
  • Developed an Oozie workflow to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Implemented indexing of logs from Oozie into Elasticsearch.
  • Designed, developed, unit tested and supported ETL mappings and scripts for data marts using Talend.
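The partitioned and bucketed Hive data mentioned above is laid out first by partition-column value and then by a hash of the bucket column modulo the bucket count. A hypothetical Python sketch of that placement logic (table layout, column names and the md5 hash are all invented for the demo; Hive uses its own hash function):

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical CLUSTERED BY (user_id) INTO 4 BUCKETS

def bucket_of(user_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Stable hash of the bucket column modulo the bucket count.
    md5 here just keeps the demo deterministic across runs."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_buckets

def place(rows):
    """Route each (event_date, user_id) row to partition=event_date, then to a bucket."""
    layout = defaultdict(list)
    for event_date, user_id in rows:
        layout[(event_date, bucket_of(user_id))].append(user_id)
    return dict(layout)

rows = [("2016-01-01", "u1"), ("2016-01-01", "u2"), ("2016-01-02", "u1")]
layout = place(rows)
```

Because the bucket is a pure function of the key, the same user always lands in the same bucket file within its partition, which is what makes bucketed joins and sampling efficient.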

Environment: Hortonworks, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Apache Storm, Oozie, SQL, Flume, Spark, HBase, Cassandra, Informatica, Java, GitHub.


Confidential, Chicago, IL

Responsibilities:

  • Analyzed data using the Hadoop components Hive and Pig.
  • Experienced in development on the Cloudera distribution.
  • Worked hands-on with the ETL process.
  • Developed Hadoop streaming jobs to ingest large amounts of data.
  • Loaded and transformed large data sets of structured, semi-structured and unstructured data using Hadoop/big data concepts.
  • Involved in POCs comparing the performance of Spark SQL with Hive.
  • Imported data from Teradata using Sqoop with the Teradata connector.
  • Created sub-queries for filtering and faster execution of data, and created multiple join tables to fetch the required data.
  • Worked in an AWS environment for development and deployment of custom Hadoop applications.
  • Installed and set up HBase and Impala.
  • Used Apache Impala to read, write and query Hadoop data in HDFS, HBase and Cassandra.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Supported MapReduce programs running on the cluster.
  • Developed ETL test scripts based on technical specifications/data design documents and source-to-target mappings.
  • Configured Talend ETL on single- and multi-server environments.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Bulk-loaded data into Oracle using a JDBC template.
  • Created Groovy scripts to load CSV files into Oracle tables.
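The CSV-to-Oracle loads described above follow a common parse-then-batch-insert pattern. A hypothetical Python sketch of that pattern, with SQLite standing in for Oracle and an invented table and file (the original work used Groovy and a JDBC template):

```python
import csv
import io
import sqlite3

# Hypothetical inbound file; a real job would read from a landing directory.
inbound = io.StringIO("id,name\n1,alice\n2,bob\n")

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle in this sketch
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(int(r["id"]), r["name"]) for r in csv.DictReader(inbound)]
# executemany batches the inserts, much like a JDBC batch/bulk load
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 2
```

Parameterized batch inserts keep the load fast and avoid SQL-injection issues that string-built statements would introduce.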

Environment: Cloudera, HDFS, Pig, Hive, MapReduce, Python, Sqoop, Storm, Kafka, Linux, HBase, Impala, Java, SQL, Cassandra, MongoDB, SVN.


Confidential, Boston, Massachusetts


  • Developed JSPs, JSF components and Servlets to dynamically generate HTML and display data on the client side.
  • Used the Hibernate framework for persistence to an Oracle database.
  • Wrote and debugged Ant scripts for building the entire web application.
  • Developed web services in Java; experienced with SOAP and WSDL, and used WSDL to publish services to other applications.
  • Implemented Java Message Service (JMS) messaging using the JMS API.
  • Involved in managing and reviewing Hadoop log files.
  • Installed and configured Hadoop, YARN, MapReduce, Flume and HDFS, and developed multiple MapReduce jobs in Java for data cleaning.
  • Coded Hadoop MapReduce jobs for energy generation and PS.
  • Coded Servlets, a SOAP client and Apache CXF REST APIs for delivering data from the application to external and internal consumers over the communication protocol.
  • Worked on the Cloudera distribution for running Hadoop jobs.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, Pig, Solr and Splunk.
  • Created a SOAP web service using JAX-WS to enable clients to consume it.
  • Experience importing and exporting data between HDFS and relational database management systems (RDBMS) using Sqoop.
  • Experienced in designing and developing multi-tier scalable applications using Java and J2EE design patterns.

Environment: MapR, Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Spring, Hibernate, Web Services, SOAP, SOA, JSF, JMS, JUnit, Oracle, Eclipse, SVN, XML, CSS, Log4j, Ant, Apache Tomcat.




  • Involved in projects utilizing Java and Java EE web applications to create fully integrated client management systems.
  • Developed the UI using HTML, JavaScript and JSP, and developed business logic and interfacing components using Business Objects, JDBC and XML.
  • Participated in user requirement sessions to analyze and gather business requirements.
  • Developed the user-facing site using Perl, back-end admin sites using Python and big data components using core Java.
  • Involved in development of the application using Spring Web MVC and other components of the Spring framework.
  • Elaborated use cases based on business requirements and was responsible for creating class diagrams and sequence diagrams.
  • Implemented object-relational mapping in the persistence layer using the Hibernate (ORM) framework.
  • Implemented REST web services with the Jersey API to handle customer requests.
  • Experienced in developing RESTful web services, both consumed and produced.
  • Used Hibernate for database connectivity and Hibernate Query Language (HQL) to add and retrieve information from the database.
  • Implemented Spring JDBC for connecting to the Oracle database.
  • Designed the application using the MVC framework for easy maintainability.
  • Provided bug fixing and testing for existing web applications.
  • Involved in the full system life cycle; responsible for developing, testing and implementing.
  • Involved in unit testing, integration testing and system testing.
  • Implemented form beans and their validations.
  • Wrote Hibernate components.
  • Developed client-side validations with JavaScript.

Environment: Spring, JSP, Servlets, REST, Oracle, AJAX, JavaScript, jQuery, Hibernate, WebLogic, Log4j, HTML, XML, CVS, Eclipse, SOAP Web Services, XSLT, XSD, UNIX, Maven, Mockito, JUnit, Jenkins, shell scripting, MVS, ISPF.
