Hadoop Consultant Resume

Chicago, IL


  • Results-oriented professional with 8+ years of progressive experience in software development, including application design and development, along with 3+ years of Big Data/Hadoop experience across the Hadoop ecosystem: HDFS, MapReduce, Hive, Pig, Flume, Sqoop, ZooKeeper, HBase, and Spark.
  • Big data development experience with Google Cloud.
  • Experience working with various Hadoop distributions: Cloudera and Hortonworks.
  • Experience in migrating the data using Sqoop from Hadoop to Relational Database System and vice-versa.
  • Expertise in Hadoop administration, such as managing clusters and reviewing Hadoop log files.
  • Experience leveraging Hadoop ecosystem components, including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling, and HBase as a NoSQL data store.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
  • Experience with the NoSQL databases MongoDB and Cassandra.
  • Proficient in configuring ZooKeeper, Cassandra, and Flume on an existing Hadoop cluster.
  • Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, along with CDH3, CDH4, and CDH5 clusters.
  • Familiarity with real-time streaming data using Spark and Kafka.
  • Experience in ETL analytics on ingested data using scripts built with Hive, Pig, Spark, and MapReduce that include interactive, batch, and real-time processing.
  • Expertise in Java/J2EE technologies such as Core Java, Spring, Hibernate, JDBC, JSON, HTML, Struts, Servlets, JSP, JBoss, and JavaScript.
  • Experience using integrated development environments such as Eclipse, NetBeans, JDeveloper, and MyEclipse.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Good experience in writing complex SQL queries with databases such as DB2, Oracle 10g, MySQL, and MS SQL Server 2005/2008.
  • Extensive experience in developing test cases and performing unit and integration testing, and in using source code management tools such as Git, SVN, and Perforce.
  • Strong team player who works well both independently and in a team, adapts to rapidly changing environments, and is committed to learning.
  • Ability to blend technical expertise with strong conceptual, business, and analytical skills to provide quality solutions, with result-oriented problem-solving techniques and leadership skills.


Big Data: Hadoop, MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Cassandra, MongoDB, Hortonworks, Kafka, Spark, ZooKeeper, BigQuery

Web development: HTML, JavaScript, XML, PHP, JSP, Servlets

Databases: DB2, MySQL, MS Access, MS SQL Server, Teradata, NoSQL, Vertica, Aster nCluster, SSAS, Oracle, Oracle Essbase.

Languages: Java / J2EE, HTML, SQL, Spring, Hibernate, JDBC, JSON, JavaScript

Operating Systems: Mac OS, Unix, Linux (Various Versions), Windows 2003/7/8/8.1/XP/Vista

Web/Application server: Apache Tomcat, WebLogic, WebSphere

Tools: Eclipse, NetBeans

Version Control: Git, SVN, Perforce

IDEs: IntelliJ, Eclipse, NetBeans, JDeveloper


Confidential - Chicago, IL

Hadoop Consultant


  • Installed and configured Cloudera Hadoop on multiple nodes using Cloudera Manager and CDH 4.X/5.X.
  • Prepared the low-level design document and estimated effort for the project.
  • Developed UNIX shell scripts for loading, filtering, and storing data.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Installed and configured Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Installed, upgraded, and managed Hadoop clusters on Hortonworks and within AWS.
  • Experience with NoSQL data modeling using Cassandra, HBase, and MongoDB.
  • Involved in loading data from UNIX file system to HDFS.
  • Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Ran process improvement initiatives to reduce defects, close production issues, and improve applications.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
  • Developed and implemented jobs on an MR2 Hortonworks cluster and a cluster within AWS.
  • Developed Pig code for loading, filtering, and storing data.
  • Developed Hive scripts (HQL) to automate joins across different sources.
  • Developed various Big Data workflows using Oozie.
  • Big data development experience in the cloud, preferably Google Cloud.
  • Developed MapReduce programs and migrated data from existing data sources using Sqoop.
  • Developed custom writable Python programs to load data into HBase.
  • Developed MapReduce programs using MRv1 and MRv2 (YARN).
  • Developed Spark SQL jobs that read data from the data lake using Hive transforms and save it to HBase.
  • Strong application DBA skills, with data modeling skills for NoSQL and relational databases.
  • Built a Java client that receives XML files via REST calls and publishes them to Kafka.
  • Built a Kafka + Spark Streaming job that reads XML messages from Kafka and transforms them into POJOs using JAXB.
  • Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Responsible for migrating from Hadoop to Spark frameworks, using in-memory distributed computing for real-time fraud detection.
  • Effectively used Oozie to develop automatic workflows of Sqoop, MapReduce and Hive jobs.
  • Involved in running Hadoop jobs that process millions of records of text data for batch and online processes, using tuned and modified SQL.
  • Responsible for designing a highly scalable big data cluster to support various data storage and computation needs across varied big data platforms: Hadoop, Cassandra, MongoDB, and Elasticsearch.
  • Designed and published workbooks and dashboards using Tableau Dashboard/Server 6.X/7.X.

Environment: Hadoop (HDFS), HBase, MapReduce, Hive, Spark, Kafka, Oozie, Flume, Cassandra, Hortonworks, UNIX Shell Scripting, MongoDB, MySQL, Eclipse, Toad, and HP Vertica 6.X/7.X.

Confidential - Hartford

Hadoop/Spark Developer


  • Reviewed business requirements documents with the test team to provide insights into data scenarios and test cases.
  • Analyzed the business requirements and verified the business requirements document and technical design document against them.
  • Experience in Extract, Transform, and Load (ETL) design, development, and testing.
  • Experience working with Spark and Scala.
  • Experience in scheduling and monitoring workflows. Provided proactive production support after go-live.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing.
  • Extracted and processed data from legacy systems and stored it on HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Created Hive tables and wrote complex Hive queries to populate them.
  • Generated user reports using HQL on data stored in HDFS.
  • Tuned HQL queries to improve performance.
  • Experienced in managing and reviewing Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Used Oozie as an automation tool for running the jobs.
  • Experience working with Hadoop and utilities such as HDFS, MapReduce, Sqoop, Hive, Oozie, Kafka, Impala, and Hue.
  • Experience in Unix scripting.
  • Experience in utilizing Teradata utilities FastLoad, MultiLoad, BTEQ scripting, TPT and FastExport.
  • Identified and performed field level compression on Teradata tables.
  • Experience in testing Data Marts, Data Warehouse/ETL Applications developed in mainframe/Teradata.
  • Experience in loading from various data sources like Teradata, Oracle, Fixed Width and Delimited Flat Files.
  • Involved in data extraction from Teradata and flat files using SQL Assistant.
  • Wrote several complex SQL queries for validating reports.
  • Tested several stored procedures.
  • Attended reviews and status meetings and participated in customer interactions.
  • Debugged SQL statements and stored procedures for business scenarios.
  • Performed extensive data validation and data verification against the data warehouse.
  • Analyzed bug reports by running SQL queries against the source system(s) to perform root-cause analysis.
  • Created SQL queries to generate ad-hoc reports for the business.
  • Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Created and validated the test data environment for the staging area, loading the staging area with data from multiple sources.
  • Created data masking rules to mask sensitive data before extracting test data from various sources and loading it into tables.
  • Created ETL test data for all transformation rules and covered all the scenarios required for implementing business logic.
  • Developed and tested various stored procedures as part of process automation in Teradata.
  • Tested the ETL process both before and after the data cleansing process.
  • Validated the data passed to downstream systems.
  • Generated the following Hadoop performance metrics using Cloudera Manager to portray overall cluster health status on a weekly basis for senior management at the bank:
  • CPU and Memory Utilization for all edge nodes and Data nodes
  • Disk Space Utilization on all mount points for all the edge nodes
  • Disk Space and Memory Utilization on Name Node
  • Edge Node Disk Utilization by application
  • JobTracker memory used
  • Average number of running map and reduce tasks
  • RPC (remote procedure call) average processing time
  • HDFS cluster disk usage by application
  • Healthy TaskTrackers
  • Block distribution across all PROD data nodes

Environment: Teradata, Hadoop, Unix, Spark, Scala, Subversion, Git, Bitbucket, DM Express, Mainframe, MS Visio, MS Office Suite, MS Outlook, HP Quality Center.

Confidential - Dallas, TX

Hadoop Consultant


  • Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Worked on debugging, performance tuning, and analyzing data using the Hadoop components Hive and Pig.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Developed and implemented jobs on an MR2 Hortonworks cluster.
  • Developed and executed Hive, Spark, and Pig queries for denormalizing data.
  • Created Hive tables from JSON data using a data serialization framework such as Avro.
  • Implemented generic export framework for moving data from HDFS to RDBMS and vice-versa.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Worked on loading data from the Linux file system to HDFS.
  • Created HBase tables to store various data formats of PII data coming from different portfolios.
  • Implemented MapReduce jobs for loading data from an Oracle database into a NoSQL database.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Moved data from Hadoop to Cassandra using Bulk output format class.
  • Automated all jobs that pull data from the FTP server and load it into Hive tables, using Oozie workflows.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Responsible for processing unstructured data using Pig and Hive.
  • Added nodes to clusters and decommissioned nodes for maintenance.
  • Created Pig script jobs while maintaining minimal query optimization.
  • Worked on various Business Objects reporting functionalities such as slice and dice, master/detail, the User Response function, and different formulas.
  • Strong experience with Apache server configuration.

Environment: Hadoop, HDFS, HBase, Pig, Hive, Spark, HortonWorks, Oozie, MapReduce, Sqoop, Cloudera, MongoDB, Cassandra, Kafka, LINUX, Java APIs, Java collection, Windows.

Confidential - Seattle, WA

Hadoop Admin/Developer


  • Supported MapReduce programs running on the cluster.
  • Used Pig Latin to analyze large-scale data.
  • Involved in loading data from UNIX file system to HDFS.
  • Interacted with business users on a regular basis to consolidate and analyze requirements, and presented them with design results.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Involved in data visualization, provided the files the team required by analyzing data in Hive, and developed Pig scripts for advanced analytics on the data.
  • Created many user-defined routines, functions, and before/after subroutines that helped implement some of the complex logical solutions.
  • Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
  • Worked on improving the performance by using various performance tuning strategies.
  • Managed the evaluation of ETL and OLAP tools and recommended the most suitable solutions depending on business needs.
  • Migrated jobs from development to test and production environments.
  • Created external tables with proper partitions for efficiency and loaded the structured data produced by MR jobs into HDFS.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Used shell scripts for loading, unloading, file validation, and record auditing.
  • Used the Teradata Aster bulk load feature to bulk load flat files into Aster.
  • Used Aster UDFs to unload data from staging tables and client data for SCD, which resided in the Aster database.
  • Extensively used SQL and PL/SQL for development of Procedures, Functions, Packages and Triggers.

Environment: Java, SQL, PL/SQL, Unix Shell Scripting, XML, Teradata Aster, Hive, Pig, Hadoop, MapReduce, Clear Case, HP Unix, Windows XP professional.


Java Developer / Hadoop Developer


  • Involved in requirements analysis, design, development, and testing.
  • Involved in developing the Group portal and Member portal applications.
  • Developed front end using Struts and JSP.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Involved in data visualization, provided the files the team required by analyzing data in Hive, and developed Pig scripts for advanced analytics on the data.
  • Created many user-defined routines, functions, and before/after subroutines that helped implement some of the complex logical solutions.
  • Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
  • Developed web pages using HTML, JavaScript, jQuery, and CSS.
  • Developed customized reports and performed unit testing using JUnit.
  • Used Java 1.6, Spring, Hibernate, and Oracle to build the product suite.
  • Responsible for building projects into deployable files (WAR and JAR files).
  • Coded Java Servlets to control and maintain the session state and handle user requests.
  • Involved in the development and testing phases of the project, following agile methodology.
  • Implemented the logging mechanism using the log4j framework.
  • Developed Web Services.
  • Verified software errors and interacted with developers to resolve technical issues.
  • Used Maven to build the J2EE application.
  • Wrote complex SQL queries and stored procedures.
  • Involved in maintenance of different applications.

Environment: Servlet, Enterprise JavaBeans, Custom Tags, Stored Procedures, JavaScript, Java, Spring Framework, Struts, Web Services, Oracle.


Java Developer


  • Involved in designing the project using UML.
  • Followed J2EE Specifications in the project.
  • Designed the user interface pages in JSP.
  • Used XML and XSL for mapping fields in the database.
  • Used JavaScript for client-side validations.
  • Created the stored procedures and triggers required for the project.
  • Created functions and views in Oracle.
  • Enhanced the performance of the whole application using stored procedures and prepared statements.
  • Responsible for updating database tables and designing SQL queries using PL/SQL.
  • Created bean classes for communicating with the database.
  • Involved in documentation of the module and project.
  • Prepared test cases and test scenarios as per business requirements.
  • Involved in bug fixing.
  • Prepared coded applications for unit testing using JUnit.

Environment: Java, JSP, Servlets, J2EE, EJB 3, Java Beans, Oracle, HTML, DHTML, XML, XSL, JavaScript, BEA WebLogic.
