
Hadoop Consultant Resume

Commerce, LA


  • 7 years of professional experience in design and development of Java, J2EE and Big Data technologies, with an in-depth understanding of the Hadoop distributed architecture and its components such as NodeManager, ResourceManager, NameNode, DataNode, HiveServer2, HBase Master and RegionServer.
  • Strong experience developing end-to-end data transformations using the Spark Core API.
  • Strong experience creating real time data streaming solutions using Spark Streaming and Kafka.
  • Worked extensively on fine-tuning Spark applications and worked with various memory settings in Spark.
  • Strong knowledge of real-time processing using Apache Storm.
  • Developed simple to complex MapReduce jobs using Java.
  • Expertise in writing end to end Data Processing Jobs to analyze data using MapReduce, Spark and Hive.
  • Experience in Kafka for collecting, aggregating and moving large volumes of data from various sources such as web servers and telnet sources.
  • Experience in ETL analytics on ingested data using scripts built with Hive, Pig, Spark, MapReduce that include interactive, batch and real time processing.
  • Expertise in Java/J2EE technologies such as Core Java, Spring, Hibernate, JDBC, JSON, HTML, Struts, Servlets, JSP, JBoss and JavaScript.
  • Experience using integrated development environments such as Eclipse, NetBeans, JDeveloper and MyEclipse.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Good experience in writing complex SQL queries against databases such as DB2, Oracle 10g, MySQL and MS SQL Server 2005/2008.
  • Extensive Experience in developing test cases, performing Unit Testing and Integration Testing using source code management tools such as GIT, SVN and Perforce.
  • Strong team player with the ability to work both independently and in a team, to adapt to a rapidly changing environment, and a commitment to learning.
  • Ability to blend technical expertise with strong conceptual, business and analytical skills to provide quality solutions, with result-oriented problem solving and leadership skills.
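As an illustration of the MapReduce pattern behind the jobs described above, here is a minimal, self-contained sketch of the three phases in plain Python (all names and sample data are illustrative, not from any specific project):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: sort and group intermediate pairs by key, as the framework does."""
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(count for _, count in values) for word, values in grouped}

lines = ["hadoop spark hive", "spark hive", "hive"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'hadoop': 1, 'hive': 3, 'spark': 2}
```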


Big Data: Hadoop, MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Cassandra, MongoDB, Hortonworks, Kafka, Spark, ZooKeeper, BigQuery

Web development: HTML, JavaScript, XML, PHP, JSP, Servlets

Databases: DB2, MySQL, MS Access, MS SQL Server, Teradata, NoSQL, Vertica, Aster nCluster, SSAS, Oracle, Oracle Essbase

Languages: Java/J2EE, HTML, SQL, Spring, Hibernate, JDBC, JSON, JavaScript

Operating Systems: Mac OS, Unix, Linux (Various Versions), Windows 2003/7/8/8.1/XP/Vista

Web/Application server: Apache Tomcat, WebLogic, WebSphere

Version Control: Git, SVN, Perforce

IDEs: IntelliJ, Eclipse, NetBeans, JDeveloper


Confidential, Commerce, LA

Hadoop Consultant

  • Cloudera Hadoop installation & configuration of multiple nodes using Cloudera Manager and CDH 4.X/5.X.
  • Prepared low level Design document and estimated efforts for the project.
  • Installed and configured the Kafka multi broker cluster.
  • Customized all the property files of Kafka multi broker servers as per the cluster requirements.
  • Created all the required topics on the brokers using the Kafka command-line scripts.
  • Configured the partitions and replication factor for all the topics.
  • Installed and maintained the Kafka multi broker cluster.
  • Started and monitored all the Kafka services to make the Kafka cluster reachable.
  • Maintained the retention period for all the Kafka events.
  • Developed various Kafka producer jobs to publish (produce) events to topics.
  • Developed consumer jobs to subscribe to data from the brokers via topics.
  • Developed encryption and decryption logic for the events.
  • Built Kafka + Spark streaming job that is responsible for reading JSON raw messages from Kafka and transforming the data into summarized data.
  • Developed the UNIX scripting code for loading, filtering and storing the data.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Installed and configured Hadoop Map Reduce, HDFS and Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Installed, upgraded and managed Hadoop clusters on Hortonworks and within AWS.
  • Experience with NoSQL data modeling in Cassandra, HBase and MongoDB.
  • Involved in loading data from UNIX file system to HDFS.
  • Performed data analysis in Hive by creating tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Ran process-improvement initiatives to reduce defects, close production issues and improve applications.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase NoSQL database and Sqoop.
  • Developed and implemented jobs in MR2 on the Hortonworks cluster and a cluster within AWS.
  • Developed Pig code for loading, filtering and storing the data.
  • Developed Hive Scripts (HQL) for automating the joins for different sources.
  • Developed various Big Data workflows using Oozie.
  • Big Data development in the cloud, primarily Google Cloud.
  • Development of MapReduce programs and data migration from existing data source using Sqoop.
  • Developed custom Python programs to load the data into HBase.
  • Developed Map Reduce Programs using MRv1 and MRv2 (YARN).
  • Developed Spark SQL jobs that read data from the data lake using Hive transforms and save it to HBase.
  • Strong application DBA skills with data modeling skills for NoSQL and relational databases.
  • Built Java client that is responsible for receiving XML file using REST call and publishing it to Kafka.
  • Built Kafka + Spark streaming job that is responsible for reading XML file messages from Kafka and transforming it to POJO using JAXB.
  • Load log data into HDFS using Flume. Worked extensively in creating MapReduce jobs to power data for search and aggregation.
  • Responsible for migrating from Hadoop to Spark, using in-memory distributed computing for real-time fraud detection.
  • Effectively used Oozie to develop automatic workflows of Sqoop, MapReduce and Hive jobs.
  • Involved in running Hadoop jobs that process millions of records of text data for batch and online processes, using tuned/modified SQL.
  • Responsible for designing highly scalable big data clusters to support varied data storage and computation needs across Hadoop, Cassandra, MongoDB and Elasticsearch.
  • Designed and published workbooks and dashboards using Tableau Desktop/Server 6.X/7.X.

Environment: Hadoop (HDFS), HBase, MapReduce, Hive, Spark, Kafka, Oozie, Flume, Cassandra, Hortonworks, UNIX Shell Scripting, MongoDB, MySQL, Eclipse, Toad, HP Vertica 6.X/7.X.
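The Kafka + Spark streaming job described above, which reads raw JSON messages and transforms them into summarized data, can be sketched at the level of a single micro-batch in plain Python; the event schema (`account_id`, `amount`) is hypothetical:

```python
import json
from collections import defaultdict

def summarize_batch(raw_messages):
    """Aggregate raw JSON events into per-account summaries, the way a
    streaming micro-batch transform would before persisting results."""
    totals = defaultdict(lambda: {"events": 0, "amount": 0.0})
    for raw in raw_messages:
        event = json.loads(raw)            # hypothetical event schema
        key = event["account_id"]
        totals[key]["events"] += 1
        totals[key]["amount"] += event.get("amount", 0.0)
    return dict(totals)

batch = [
    '{"account_id": "a1", "amount": 10.0}',
    '{"account_id": "a1", "amount": 5.0}',
    '{"account_id": "a2"}',
]
print(summarize_batch(batch))
# {'a1': {'events': 2, 'amount': 15.0}, 'a2': {'events': 1, 'amount': 0.0}}
```

In the real job the same per-key aggregation would run inside a Spark Streaming transformation over each Kafka micro-batch.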

Confidential, Chicago, IL

Hadoop/Spark Developer

  • Reviewed Business Requirements Documents with the test team to provide insights into the data scenarios and test cases.
  • Analyzed and understood the business requirements, verifying the Business Requirements Document and Technical Design Document against the requirements.
  • Experience in Extract, Transform, and Load (ETL) Design, development and Testing.
  • Experience working on Spark and Scala.
  • Experience in scheduling the Workflows and monitoring them. Provided Pro-Active Production Support after go-live.
  • Installed and configured Hadoop MapReduce, HDFS, Developed multiple MapReduce jobs for data cleaning and preprocessing.
  • Extracted and processed the data from Legacy systems and stored it on HDFS.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, writing complex Hive queries to populate Hive tables.
  • Generating user reports using HQL on the data stored on HDFS.
  • Experience in tuning the HQL queries to improve the performance.
  • Experienced in managing and reviewing Hadoop log files.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Used Oozie as an automation tool for running the jobs.
  • Experience working on Hadoop and utilities such as HDFS, MapReduce, Sqoop, Hive, Oozie, Kafka, Impala and Hue.
  • Experience in UNIX scripting.
  • Experience in utilizing Teradata utilities FastLoad, MultiLoad, BTEQ scripting, TPT and FastExport.
  • Identified and performed field level compression on Teradata tables.
  • Experience in testing Data Marts, Data Warehouse/ETL Applications developed in mainframe/Teradata.
  • Experience in loading from various data sources like Teradata, Oracle, Fixed Width and Delimited Flat Files.
  • Involved in Data Extraction from Teradata and Flat Files using SQL assistant.
  • Written several complex SQL queries for validating Reports.
  • Tested several stored procedures.
  • Attending reviews, status meetings and participated in customer interaction.
  • Debugging the SQL-Statements and stored procedures for business scenarios.
  • Performed extensive Data Validation, Data Verification against Data Warehouse.
  • Analyzed the bug reports running SQL queries against the source system(s) to perform root-cause analysis.
  • Created SQL queries to generate ad-hoc reports for the business.
  • Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Created and validated the test data environment for Staging area, loading the Staging area with data from multiple sources.
  • Created data masking rules to mask sensitive data before extracting of test data from various sources and loading of data into tables.
  • Created ETL test data for all transformation rules and covered all the scenarios required for implementing business logic.
  • Developed and tested various stored procedure as part of process automation in Teradata.
  • Tested the ETL process for both before and after data cleansing process.
  • Validating the data passed to downstream systems.
  • Generated the following Hadoop performance metrics using Cloudera Manager, portraying overall cluster health status on a weekly basis for senior management at the bank:
  • CPU and memory utilization for all edge nodes and data nodes
  • Disk space utilization on all mount points for all the edge nodes
  • Disk space and memory utilization on the NameNode
  • Edge node disk utilization by application
  • JobTracker memory used
  • Average map and reduce tasks running
  • Average RPC (remote procedure call) processing time
  • HDFS cluster disk usage by application
  • Healthy TaskTrackers
  • Block distribution across all PROD data nodes

Environment: Teradata, Hadoop, Unix, Spark, Scala, Subversion, Git, Bitbucket, DMExpress, Mainframe, MS Visio, MS Office Suite, MS Outlook, HP Quality Center.
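The data validation and verification work described above often reduces to comparing row counts and an order-insensitive checksum between the source system and the warehouse. A minimal sketch of that idea (the sample rows are made up):

```python
import hashlib

def table_fingerprint(rows):
    """Row count plus an order-insensitive XOR of per-row hashes, so a
    source table and its target copy can be compared cheaply."""
    rows = list(rows)
    digest = 0
    for row in rows:
        canonical = "|".join(str(v) for v in row)
        digest ^= int(hashlib.sha256(canonical.encode()).hexdigest(), 16)
    return len(rows), digest

source = [(1, "alice", 100), (2, "bob", 200)]
target = [(2, "bob", 200), (1, "alice", 100)]   # same rows, different order
print(table_fingerprint(source) == table_fingerprint(target))  # True
```

XOR-combining per-row hashes makes the fingerprint independent of row order, which matters when the two systems return rows in different sequences.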

Confidential, Dallas, TX

Hadoop Developer


  • Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
  • Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
  • Worked on debugging, performance tuning and Analyzing data using Hadoop components Hive & Pig.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Developed and implemented jobs in MR2 Horton Works Cluster.
  • Developed and executed Hive, Spark and Pig queries to de-normalize the data.
  • Created Hive tables from JSON data using serialization frameworks such as Avro.
  • Implemented generic export framework for moving data from HDFS to RDBMS and vice-versa.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Created Hive External tables and loaded the data in to tables and query data using HQL.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Worked on loading data from LINUX file system to HDFS.
  • Created HBase tables to store various formats of PII data coming from different portfolios; implemented MapReduce jobs for loading data from the Oracle database into the NoSQL database.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Moved data from Hadoop to Cassandra using the BulkOutputFormat class.
  • Automated all the jobs, for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Responsible for processing unstructured data using Pig and Hive.
  • Adding nodes into the clusters & decommission nodes for maintenance.
  • Created Pig script jobs while maintaining query optimization.
  • Worked on various Business Objects reporting functionalities such as slice and dice, master/detail, the user response function and different formulas.
  • Strong experience in Apache server configuration.

Environment: Hadoop, HDFS, HBase, Pig, Hive, Spark, Hortonworks, Oozie, MapReduce, Sqoop, Cloudera, MongoDB, Cassandra, Kafka, Linux, Java APIs, Java Collections, Windows.
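Loading event data into HBase through its REST API, as in the first bullet above, comes down to building a JSON body in which the row key, column qualifiers and cell values are all base64-encoded. A minimal sketch of the payload construction (the table row key and column names are hypothetical):

```python
import base64
import json

def b64(text):
    return base64.b64encode(text.encode()).decode()

def hbase_rest_payload(row_key, cells):
    """Build the JSON body the HBase REST gateway expects for a row PUT:
    the row key, column qualifiers and values are base64-encoded."""
    return json.dumps({
        "Row": [{
            "key": b64(row_key),
            "Cell": [{"column": b64(col), "$": b64(val)}
                     for col, val in cells.items()],
        }]
    })

# Hypothetical row key and column family/qualifiers.
payload = hbase_rest_payload("cust|0001", {"d:event": "login", "d:ts": "1700000000"})
print(payload)
```

The resulting string would be PUT to the gateway at `/<table>/<row>` with `Content-Type: application/json`; the sketch stops before the network call so it stays self-contained.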

Confidential, Deerfield, IL

Java Developer / Hadoop Developer

  • Supported MapReduce programs running on the cluster.
  • Involved in using Pig Latin to analyze large-scale data.
  • Involved in loading data from UNIX file system to HDFS.
  • Interacted with business users on regular basis to consolidate and analyze the requirements and presented them with design results.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in data visualization, provided the files required by the team by analyzing the data in Hive, and developed Pig scripts for advanced analytics on the data.
  • Created many user-defined routines, functions, before/after subroutines which facilitated in implementing some of the complex logical solutions.
  • Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
  • Worked on improving the performance by using various performance tuning strategies.
  • Managed the evaluation of ETL and OLAP tools and recommended the most suitable solutions depending on business needs.
  • Migrated jobs from development to test and production environments.
  • Created external tables with proper partitions for efficiency and loaded the structured data in HDFS resulted from MR jobs.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Used Shell Scripts for loading, unloading, validating and records auditing purposes.
  • Used Teradata Aster bulk load feature to bulk load flat files to Aster.
  • Used Aster UDFs to unload data from staging tables and client data for SCD which resided on Aster database.
  • Extensively used SQL and PL/SQL for development of Procedures, Functions, Packages and Triggers.

Environment: Java, SQL, PL/SQL, Unix Shell Scripting, XML, Teradata Aster, Hive, Pig, Hadoop, MapReduce, ClearCase, HP-UX, Windows XP Professional.
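The file validation and record auditing described above can be sketched in Python: split a delimited extract into good and bad records and report audit counts. The pipe delimiter, expected field count and sample data are assumptions for illustration:

```python
import csv
import io

def audit_records(text, expected_fields=3):
    """Split a pipe-delimited extract into good and bad records and
    return audit counts plus the rejected records."""
    good, bad = [], []
    for row in csv.reader(io.StringIO(text), delimiter="|"):
        (good if len(row) == expected_fields else bad).append(row)
    return {"total": len(good) + len(bad), "good": len(good), "bad": len(bad)}, bad

data = "1|alice|100\n2|bob\n3|carol|300\n"
summary, rejects = audit_records(data)
print(summary)   # {'total': 3, 'good': 2, 'bad': 1}
print(rejects)   # [['2', 'bob']]
```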


Software Developer

  • Coded the presentation layer using Java, JSP, the Spring Framework and JavaScript.
  • Developed components for the Business Layer using Java technologies.
  • Implemented a validation framework using AJAX DWR.
  • Designing Class diagrams, Sequence diagrams, and activity diagrams.
  • Measured code coverage and branch coverage of the application with JUnit test scripts for unit testing.
  • Fixed Checkstyle errors and CPD and PMD defects.
  • Support the Integration Testing phase with fixes for the defects.
  • Implemented a logging facility in the incentives application to increase performance and productivity.
  • Maintained the incentives application build, deployment, integration and system testing.
  • Developed the validation framework using the AJAX framework.

Environment: Java, Oracle, DB2, Spring, Tomcat, Servlets, JSP, JavaScript


Software Developer

  • The system is designed using J2EE technologies based on the MVC architecture.
  • The application uses the Struts framework: the Views are JSP pages programmed with the Struts tag library, the Model is a combination of EJBs and Java classes (Form and Action classes), and the Controllers are Action Servlets.
  • Form-level validations are provided using the Struts validation framework.
  • Used JSPs and Action Servlets for server-side transactions.
  • JSPs communicate with EJBs, with EJB serving as middleware in designing and developing a three-tier application.
  • The processed data is transferred to the database through container-managed persistence (CMP) entity beans.

Environment: JSF, Struts, Hibernate, Tomcat 5.x (development), BEA WebLogic 8.1 (production and deployment)
