
Hadoop/Big Data Developer Resume

Evansville, IN


  • 8 years of professional IT work experience in Analysis, Design, Administration, Development, Deployment and Maintenance of critical software and Big data applications.
  • Hands-on experience in developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark, Spark Streaming, Spark SQL, Storm, NiFi, Kafka, Oozie, and Cassandra.
  • Hands-on experience in using the MapReduce programming model for batch processing of data stored in HDFS.
  • Experience in design and development of ETL processes by using Apache NiFi.
  • Exposure to administrative tasks such as installing Hadoop and its ecosystem components, including Hive and Pig.
  • Installed and configured multiple Hadoop clusters of different sizes with ecosystem components such as Pig, Hive, Sqoop, Flume, HBase, Oozie, and ZooKeeper.
  • Worked on the major Hadoop distributions Cloudera and Hortonworks.
  • Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
  • Extensively used microservices and Postman to send requests to the Kubernetes DEV and Hadoop clusters.
  • Deployed various services such as Spark, MongoDB, and Cassandra in Kubernetes and Hadoop clusters using Docker.
  • Handled Data Movement, data transformation, Analysis and visualization across the lake by integrating it with various tools.
  • Defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the Data Lake.
  • Good expertise in planning, installing, and configuring Hadoop clusters based on business needs.
  • Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive and Pig scripts.
  • Experience working with different file formats such as Avro, Parquet, ORC, and SequenceFile, and compression techniques such as Gzip, LZO, and Snappy in Hadoop.
  • Experience in retrieving data from databases such as MySQL, Teradata, Informix, DB2, and Oracle into HDFS using Sqoop and ingesting it into HBase and Cassandra.
  • Experience writing Oozie workflows and Job Controllers for job automation.
  • Integrated Oozie with Hue and scheduled workflows for multiple Hive, Pig and Spark Jobs.
  • In-depth knowledge of Scala and experience building Spark applications using Scala.
  • Good experience working with Tableau and Spotfire, including enabling JDBC/ODBC data connectivity from both to Hive tables.
  • Designed neat and insightful dashboards in Tableau.
  • Adequate knowledge of Scrum, Agile and Waterfall methodologies.
  • Designed and developed multiple Model 2 MVC-based web applications using J2EE.
  • Worked with various tools and IDEs such as Eclipse, IBM Rational, Apache Ant, MS Office, PL/SQL Developer, and SQL*Plus.


Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, HBase, Spark, and Hadoop distributions

Programming Languages: Java (5, 6, 7), Python, Scala, C/C++, XML, Shell scripting, COBOL

Databases: MySQL, SQL/PL/SQL, MS SQL Server 2005, Oracle

Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell, jQuery, AJAX

ETL Tools: Cassandra, HBase, Elasticsearch, Alteryx.

Operating Systems: Linux, Windows XP/7/8

Software Life Cycles: SDLC, Waterfall and Agile models

MS Office, MS Project and Risk Analysis tools, Visio

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SoapUI, Ant, Maven, Automation, and MRUnit

Cloud Platforms: Amazon EC2

Version Control: CVS, TortoiseSVN

Visualization Tools: Tableau.

Servers: IBM WebSphere, WebLogic, Tomcat, and Red Hat Satellite Server


Confidential, Evansville, IN

Hadoop/Big Data Developer


  • Developed end-to-end data processing pipelines that begin with receiving data through the Kafka distributed messaging system and end with persisting the data in HBase.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
  • Wrote a Storm topology to emit data into Cassandra.
  • Wrote a Storm topology to accept data from a Kafka producer and process the data.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
  • Used the NoSQL databases HBase and MongoDB. Exported the result set from Hive to MySQL using shell scripts.
  • Processed the spend and goals data in Alteryx to make it suitable for reporting.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
  • Automated the generation of HQL, the creation of Hive tables, and the loading of data into Hive tables using Apache NiFi and Oozie.
  • Wrote Java and Groovy code to process XML, XSD, CSV, and JSON data and integrated it with NiFi processors to create Hive tables.
  • Developed a complex Alteryx analytic application to model Return on Advertising Spend using Alteryx, R, and Tableau.
  • Worked extensively with Cloudera Hadoop distribution components and custom packages.
  • Migrated an existing Java application to microservices using Spring Boot and Spring Cloud.
  • Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Wrote JUnit tests and integration test cases for the microservices.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Used cursors in DB2 to fetch multiple rows at a time to speed up database updates.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Experienced in implementing Spark RDD transformations and actions for business analysis.
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
  • Wrote automated HBase test cases for data quality checks using HBase command-line tools.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering, and some pre-aggregations.
  • Created UDFs to store specialized data structures in HBase and Cassandra.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
  • Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra.
  • Used the Tez framework for building high-performance jobs in Pig and Hive.
  • Configured Kafka to read and write messages from external programs.
  • Configured Kafka to handle real time data.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Worked extensively on creating end-to-end data pipeline orchestration using Oozie.
  • Involved in exploring new technologies such as AWS, Apache Flink, and Apache NiFi that can increase business value.

Environment: MapReduce, HDFS, Spring Boot, Alteryx, DB2, NiFi, MongoDB, Microservices, AWS, ETL, Hive, Pig, SQL, Sqoop, Oozie, PySpark, shell scripting, Apache Kafka, J2EE.

Confidential, Estates, IL

Hadoop/Spark Developer


  • Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic. Performed data ingestion from multiple internal clients using Apache Kafka. Developed Kafka Streams (KStreams) applications in Java for real-time data processing.
  • Good experience in handling data manipulation using Python scripts.
  • Imported and exported data into HDFS and Hive using Sqoop, and developed a POC on Apache Spark and Kafka. Proactively monitored performance and assisted in capacity planning.
  • Performed ETL using Pig, Hive and MapReduce to transform transactional data to de-normalized form.
  • Worked on the Oozie workflow engine for job scheduling. Imported and exported data into MapReduce and Hive using Sqoop.
  • Created and configured AWS RDS/Redshift to use the Hadoop ecosystem on AWS infrastructure.
  • Developed Hive scripts for performing transformation logic and also loading the data from staging zone to final landing zone.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS. Good understanding of performance tuning with NoSQL and SQL technologies.
  • Designed and developed a framework to leverage platform capabilities using MapReduce and Hive.
  • Used the NoSQL databases HBase and MongoDB. Exported the result set from Hive to MySQL using shell scripts.
  • Worked on data transformation pipelines such as Storm. Worked with operational analytics and log management using ELK and Splunk. Assisted teams with SQL and MPP databases such as Greenplum.
  • Worked on SaltStack automation tools. Helped teams working with batch processing and tools in the Hadoop technology stack (MapReduce, YARN, Pig, Hive, and HDFS).
  • Responsible for writing MapReduce jobs to perform operations such as copying data on HDFS and defining job flows on an EC2 server, and for loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Responsible for creating the mapping document from source fields to destination fields.
  • Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
  • Developed Oozie workflows for executing Sqoop and Hive actions.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Performed performance optimizations on Spark/Scala. Diagnosed and resolved performance issues.
  • Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow.
  • Developed a process for Sqooping data from multiple sources such as SQL Server, Oracle, and Teradata.
  • Developed scripts to run Oozie workflows, capture the logs of all jobs that run on the cluster, and create a metadata table that records the execution time of each job.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
  • Involved in loading transactional data into HDFS using Flume for Fraud Analysis.
  • Developed Python utility to validate HDFS tables with source tables.
  • Responded to and resolved access and performance issues. Used Spark API over Hadoop to perform analytics on data in Hive.
  • Designed and developed UDFs to extend the functionality of both Pig and Hive.
  • Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
  • Developed multiple Kafka producers and consumers from scratch per the software requirement specifications.

Environment: Hortonworks HDP 2.5, MapReduce, MongoDB, AWS, Cassandra, PySpark, HDFS, Hive, Pig, SQL, Ambari, Sqoop, Flume, Oozie, HBase, Java (JDK 1.6), Eclipse, MySQL, and Unix/Linux.

Confidential, Richardson, TX

Big Data Developer


  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Extracted data from various locations and loaded it into Oracle tables using SQL*Loader.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Developed the Pig Latin code for loading, filtering and storing the data.
  • Created, developed, modified, and maintained database objects, PL/SQL packages, functions, stored procedures, triggers, views, and materialized views to extract data from different sources.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Extracted data from Teradata into HDFS using Sqoop. Analyzed the data by running Hive queries and Pig scripts to understand user behavior, such as shopping enthusiasts, travelers, and music lovers.
  • Exported the analyzed patterns back into Teradata using Sqoop.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Extracted data from the Oracle database into HDFS using Sqoop.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Loaded data from MySQL to HBase where necessary using Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop. Worked hands-on with the ETL process.

Environment: Hive, Pig, Apache Hadoop, Cassandra, Sqoop, Big Data, HBase, ZooKeeper, Cloudera, CentOS, NoSQL, Sencha Ext JS, JavaScript, AJAX, Hibernate, JMS, WebLogic Application Server, Eclipse, Web services, Azure, Project Server, Unix, Windows.

Confidential, Highpoint, NC

Hadoop/Java Developer


  • Provided full SDLC application development services, including designing, integrating, testing, and deploying enterprise mission-critical billing solutions.
  • Responsible for analysis, documenting requirements, and architecting the application based on J2EE standards. Followed test-driven development (TDD) and participated in Scrum status reports.
  • Participated in designing Use Case, Class, and Sequence Diagrams for various Engine components and used IBM Rational Rose to generate the UML notations.
  • Developed Ant, Maven, and shell scripts to automatically compile, package, deploy, and test J2EE applications on a variety of WebSphere platforms.
  • Experience in developing business applications using JBoss, WebSphere, and Tomcat.
  • Used Perl scripting, shell scripting, and PL/SQL programming to resolve a variety of business problems.
  • Implemented client-side and server-side validations according to business needs. Wrote test cases, performed unit testing, and wrote and executed JUnit tests.
  • Wrote Ant scripts for project builds in a Linux environment.
  • Involved in production implementation and post-production support.
  • Supported MapReduce programs running on the cluster. Gained experience in managing and reviewing Hadoop log files. Involved in scheduling the Oozie (version 4.0.0) workflow engine to run multiple Pig jobs.
  • Automated all the jobs for pulling data from an FTP server and loading it into Hive tables using Oozie workflows. Used HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Computed various metrics that define user experience, revenue, etc. using Java MapReduce.
  • Implemented SQL and PL/SQL stored procedures. Actively involved in code review and bug fixing to improve performance.
  • Developed screens using JSP, DHTML, CSS, AJAX, JavaScript, Struts, Spring, Java, and XML.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, Kafka, Linux, Cloudera, Java APIs, Java collections, SQL, NoSQL, HBase, MongoDB


Java Developer


  • Involved in analysis, design and development of Expense Processing system.
  • Designed Use Case, Class, Sequence, and Object Diagrams to model the detailed design of the application using UML.
  • Wrote MapReduce jobs in Java, Pig, and Python.
  • Developed the application using the Spring MVC framework. Performed client-side validations using AngularJS and Node.js.
  • Developed the user interface using JSP, HTML, CSS, and JavaScript to simplify the complexities of the application.
  • Created a dynamic end-to-end REST API with the LoopBack Node.js framework.
  • Configured the Spring Framework for the entire business logic layer.
  • Developed code using various patterns such as Singleton, Front Controller, Adapter, DAO, MVC, Template, Builder, and Factory.
  • Used Hibernate's table-per-hierarchy inheritance and mapped polymorphic associations.
  • Developed one-to-many, many-to-one, one-to-one annotation based mappings in Hibernate.
  • Developed DAO service methods to populate the domain model objects using Hibernate.
  • Used the Spring Framework's BeanFactory for initializing services.
  • Used the Java Collections API extensively, including Lists, Sets, and Maps.
  • Wrote DAO classes using Spring and Hibernate to interact with the database for persistence.
  • Used Apache Log4J for logging and debugging.
  • Used Hibernate in data access layer to access and update information in the database.
  • Followed TDD and developed test cases using JUnit for all the modules developed.
  • Used Log4J to capture the log that includes runtime exceptions, monitored error logs and fixed the problems.
  • Created a Maven build file to build the application and deployed it on WebSphere Application Server.

Environment: Java, J2EE, JSP, JCL, DB2, Struts, SQL, Hibernate, PL/SQL, Eclipse, Oracle, Windows XP, HTML, CSS, JavaScript, and XML.
