
Lead Consultant - Hadoop/Big Data Resume

Charlotte, NC


  • Over 12 years of experience in Information Technology involving analysis, design, and implementation. Excellent skills in state-of-the-art client-server computing and distributed, scalable application development.
  • 5 years of work experience in Big Data analytics, with hands-on experience writing MapReduce jobs on the Hadoop ecosystem, including Hive and Pig.
  • Expert in understanding data and in designing and implementing enterprise platforms such as Hadoop data lakes and large data warehouses.
  • Good working experience on Hadoop architecture, Confidential, MapReduce, and other components in the Cloudera Hadoop ecosystem. Experience with various Hadoop distributions (Cloudera, Hortonworks) and cloud platforms (AWS, Microsoft Azure).
  • Experience with complex data structures that include nested objects multiple levels deep.
  • Good working experience on Hadoop architecture and various components such as Confidential, Job Tracker, Task Tracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop, MapReduce, Confidential, Hive, Sqoop, Pig, ZooKeeper, and Flume.
  • Used Apache Kafka to track data ingestion into the Hadoop cluster; implemented custom Kafka encoders for custom input formats to load data into Kafka partitions, and streamed data in real time using Spark with Kafka for faster processing.
  • Developed User Defined Functions (UDFs) for Apache Pig and Hive in Python and Java (a minimal UDF sketch appears after this list).
  • Familiar with Spark, Kafka, Storm, Talend, and Elasticsearch.
  • Knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Experience in developing Pig Latin and HiveQL scripts for data analysis and ETL, and in extending default functionality by writing User Defined Functions (UDFs) for data-specific processing.
  • Experience in job scheduling and monitoring through Oozie and Zookeeper.
  • Knowledge in Data warehousing and using ETL tools like Informatica and Pentaho.
  • Experience in migrating data to and from RDBMS and unstructured sources into Confidential using Sqoop & Flume.
  • Wrote business logic in C# code-behind files to read data from database stored procedures.
  • Very strong in Struts, Spring Framework, and Hibernate.
  • Strong technical background in C#.NET, Windows Azure (cloud), Windows Services, Entity Framework, LINQ, and SQL Server.
  • In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Spark MLlib.
  • Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core.
  • Expertise in developing Real-Time Streaming Solutions using Spark Streaming.
  • Proficient in big data ingestion and streaming tools like Flume, Sqoop, Spark, Kafka and Storm.
  • Experience working with Cloudera, Hortonworks, and Microsoft Azure HDInsight distributions.
  • Proficient in Data-Structures, Design Patterns in C++. Solid experience in building multi-threaded applications in C++, Python.
  • Experience in implementing OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Hands-on experience migrating complex MapReduce programs into Apache Spark RDD transformations.
  • Used various Ajax/JavaScript tools such as JavaScript, jQuery, and JSON.
  • Good Understanding of Design Patterns like MVC, Singleton, Session Facade, DAO, Factory.
  • Strong experience in software development using Java/J2EE technologies.
  • Expertise in back-end/server side java technologies such as: Web services, Java persistence API (JPA), Java Messaging Service (JMS), Java Database Connectivity (JDBC), Java Naming and Directory Interface (JNDI).
  • Expertise in J2EE and MVC architecture/implementation, Web Services, SOA, Analysis, Design, Object modeling, Data modeling, Integration, Validation, Implementation and Deployment.
  • Well experienced in building servers such as DHCP, PXE with Kickstart, DNS, and NFS, and in using them to build infrastructure in a Linux environment.
  • Rich experience in Agile methodologies such as Extreme Programming (XP), Scrum, and Test-Driven Development (TDD).
  • Expert-level skills in designing and implementing web server solutions and deploying Java application servers such as Tomcat, JBoss, WebSphere, and WebLogic on Windows and UNIX platforms.
  • Knowledge of Spark APIs to cleanse, explore, aggregate, transform, and store data.
  • Experience with RDBMS and writing SQL and PL/SQL scripts used in stored procedures.
  • Strengths include being a good team player; excellent communication, interpersonal, and analytical skills; flexibility to work with new technologies; and the ability to work effectively in a fast-paced, high-volume, deadline-driven environment.
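As an illustration of the Hive UDF work noted above, here is a minimal sketch in Java of the classic UDF pattern; the class and function names are hypothetical, and the old org.apache.hadoop.hive.ql.exec.UDF base class is assumed:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Minimal Hive UDF: trims and lower-cases a string column.
    // Registered in Hive with, e.g.:
    //   ADD JAR udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize AS 'com.example.NormalizeUDF';
    public class NormalizeUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;  // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }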


Languages: Java, PL/SQL, Python, C#, Unix Shell Scripting, and Scala.

Hadoop/Big Data: Apache Hadoop, YARN, Confidential, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Flume, Cassandra, ZooKeeper, Spark, Spark SQL, Microsoft Azure

File formats: compression formats, SequenceFile, Avro, ORC, and Parquet.

Java Technologies: Java EE, Servlets, JSP, JUnit, AJAX

NoSQL Databases: HBase, Cassandra and MongoDB

Web Technologies: JavaScript, HTML, XML, CSS3.

Frameworks: MVC, Hibernate, Spring Framework

IDEs: Eclipse (Galileo, Helios, Indigo, Mars), MyEclipse, NetBeans.


Web and Application Servers: WebLogic, JBoss, WebSphere, Apache Tomcat

Build Automation: Ant, Maven

RDBMS: Oracle, DB2, MySQL, SQL Server

Operating Systems: Linux (Red Hat, Ubuntu, Fedora), Mac OS X, Windows


Confidential, Charlotte, NC

Lead Consultant - Hadoop/Big Data


  • Good at working on Hadoop, MapReduce, and YARN/MRv2; developed multiple MapReduce jobs in Java for structured, semi-structured, and unstructured data.
  • Developed MapReduce programs in Java for parsing raw data and populating staging tables.
  • Experience developing custom input formats and data types to parse and process unstructured and semi-structured input data, mapping it into key-value pairs to implement business logic in MapReduce.
  • Experience implementing custom serializers, interceptors, sources, and sinks in Flume, as required, to ingest data from multiple sources.
  • Involved in developing Hive DDLs to create, alter, and drop Hive tables, and worked with Storm and Kafka.
  • Experience setting up fan-out workflows in Flume, designing a V-shaped architecture to take data from many sources and ingest it into a single sink.
  • Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that gets data from Kafka in near real time and persists it to Cassandra (see the streaming sketch after this list).
  • Consumed JSON messages using Kafka and processed the JSON files using Spark Streaming to capture UI updates.
  • Developed Spark code in Scala in the IntelliJ IDE using SBT.
  • Performance-tuned Sqoop, Hive, and Spark jobs.
  • Worked with .NET and C# to create dashboards according to client requirements.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
  • Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Wrote Azure PowerShell scripts to copy or move data from the local file system to Confidential Blob storage.
  • Imported and exported data into Confidential and Hive using Sqoop.
  • Experienced in analyzing data with Hive and Pig.
  • Used Kafka Streams to configure Spark Streaming to get information and then store it in Confidential.
  • Developed Pig Latin scripts to extract data from web server output files and load it into Confidential.
  • Integrated bulk data into the Cassandra file system using MapReduce programs.
  • Expertise in designing and data modeling for the Cassandra NoSQL database.
  • Experienced in managing and reviewing Hadoop log files.
  • Involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
  • Experienced in implementing High Availability using QJM and NFS to avoid single point of failure.
  • Developed custom mappers in Python scripts, as well as Hive UDFs and UDAFs, based on the given requirements.
  • Connected to an NFSv3 storage server supporting the AUTH_NONE or AUTH_SYS authentication methods.
  • Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experienced in querying data using SparkSQL on top of Spark engine.
  • Experience in managing and monitoring Hadoop cluster using Cloudera Manager.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
  • Implemented an analytical platform that used Hive functions and different kinds of join operations, such as map joins and bucketed map joins.
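A minimal sketch of the Kafka-to-Spark-Streaming pipeline described above, in Java against the spark-streaming-kafka-0-10 API; the broker address, topic, and group id are hypothetical, and the Cassandra write is indicated only as a comment since the exact connector setup is not given here:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class LearnerModelStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("LearnerModelStream");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "learner-model");          // hypothetical group id

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("learner-events"), kafkaParams));

            // Transform each micro-batch; in the real job each record would be
            // parsed and persisted to Cassandra (e.g. via the DataStax
            // spark-cassandra-connector) instead of printed.
            stream.map(ConsumerRecord::value)
                  .foreachRDD(rdd -> rdd.foreach(value -> System.out.println(value)));

            jssc.start();
            jssc.awaitTermination();
        }
    }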

Environment: CDH, Java (JDK 1.7), Hadoop, Azure, MapReduce, Confidential, Hive, Sqoop, Flume, NFS, Cassandra, Pig, Oozie, Kerberos, Scala, Spark SQL, Spark Streaming, Kafka, Linux, AWS, Shell Scripting, MySQL, Oracle 11g, SQL*Plus, C++, C#

Confidential, Ann Arbor, MI

Big Data/Hadoop Developer


  • Involved in cluster setup, monitoring, and administration tasks such as commissioning and decommissioning nodes.
  • Worked with business partners to gather business requirements.
  • Installed the NameNode, Secondary NameNode, ResourceManager, NodeManager, ApplicationMaster, and DataNodes using Cloudera.
  • Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Extensively used Pig for data cleansing.
  • Worked with the Teradata analysis team, using Big Data technologies, to gather business requirements.
  • Worked as a Hadoop developer creating MapReduce jobs in Java to analyze large amounts of data for regulatory reports.
  • Installed and configured a fully distributed, multi-node Hadoop cluster with a large number of nodes.
  • Provided Hadoop, OS, Hardware optimizations.
  • Set up machines with network control, static IPs, disabled firewalls, and swap memory.
  • Identified performance bottlenecks by analyzing the existing Hadoop cluster and provided performance tuning accordingly.
  • Performed configuration and troubleshooting of services like NFS, NIS, NIS+, DHCP, FTP, LDAP, Apache Web servers.
  • Experience in implementing applications on Spark frameworks using Scala.
  • Developed Spark code in Scala in the IntelliJ IDE using SBT.
  • Regularly commissioned and decommissioned nodes depending on the amount of data.
  • Installed and configured Hadoop components Confidential, Hive, HBase.
  • Communicating with the development teams and attending daily meetings.
  • Addressing and Troubleshooting issues on a daily basis.
  • Working with data delivery teams to setup new Hadoop users. This job includes setting up Linux users and setting up Kerberos principals.
  • Cluster maintenance as well as creation and removal of nodes.
  • Monitor Hadoop cluster connectivity and security.
  • Dumped data from one cluster to another using DistCp, and automated the dumping procedure with shell scripts (see the DistCp sketch after this list).
  • Designed shell scripts for backing up important metadata and rotating logs on a monthly basis.
  • Involved in implementing a dashboard using .NET and C#.
  • Developed highly efficient algorithms in C++ through both pair-programming and independent work.
  • Evaluation and troubleshooting of different NoSQL database systems and cluster configurations to ensure high-availability in various crash scenarios.
  • Implemented commissioning and decommissioning of DataNodes, killing unresponsive TaskTrackers and dealing with blacklisted TaskTrackers.
  • Implemented the open-source monitoring tool Ganglia for monitoring the various services across the cluster.
  • Dumped data from Confidential to a MySQL database and vice versa using Sqoop.
  • Provided the necessary support to the ETL team when required.
  • Integrated Nagios in the Hadoop cluster for alerts.
  • Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
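A hedged sketch of the cross-cluster copy mentioned above, using the Hadoop 2.x DistCp Java API rather than the shell wrapper; the cluster URIs and paths are hypothetical, and in practice this was driven from shell scripts as the bullet notes:

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.tools.DistCp;
    import org.apache.hadoop.tools.DistCpOptions;

    public class ClusterCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Copy /data/raw from cluster A to cluster B (hypothetical URIs).
            DistCpOptions options = new DistCpOptions(
                Arrays.asList(new Path("hdfs://clusterA:8020/data/raw")),
                new Path("hdfs://clusterB:8020/data/raw"));
            options.setSyncFolder(true);  // copy only what changed
            new DistCp(conf, options).execute();
        }
    }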

Environment: Linux, Confidential, MapReduce, Hive, Pig, Azure, KDC, Nagios, Ganglia, Oozie, Sqoop, Cloudera Manager.

Confidential, Carol Stream, IL

Java/Hadoop Developer


  • Involved in all phases of the Agile Scrum process, including stand-ups, retrospectives, and sprint planning meetings.
  • Designed and developed Enterprise Eligibility business objects and domain objects with Object Relational Mapping framework such as Hibernate.
  • Developed JSP pages for presentation layer (UI) using Struts with client side validations using Struts Validator framework/ JavaScript.
  • Used JMS in the project for sending and receiving the messages on the queue.
  • Developed UI panels using JSF, XHTML, CSS, Dojo, and jQuery.
  • Used AJAX and JavaScript for validations and for integrating business server-side components on the client side within the browser.
  • Established coding standards for Java, JEE, ExtJS, etc.
  • Wrote JavaScript functions to fetch dynamic data and perform client-side validation.
  • Created Oracle database tables, stored procedures, sequences, triggers, and views.
  • Developed the CRUD API for the POSEngine using RESTful web services.
  • Involved in the development of SQL and PL/SQL packages and stored procedures.
  • Implemented connectivity to the database server using JDBC (see the JDBC sketch after this list).
  • Consumed web services using the Apache CXF framework to retrieve remote information.
  • Developed REST-based web services to facilitate communication between clients and servers.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Used C# to design a dashboard for the client to view reports and other data. Extracted large sets of structured, semi-structured, and unstructured data from HBase through Sqoop and placed them in Confidential for further processing.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Managing and scheduling Jobs on a Hadoop cluster using Oozie.
  • Involved in creating Hive tables, loading data, and running Hive queries on that data.
  • Created web services using the WTP plugin for the Eclipse IDE, deployed as a separate application using Maven scripts.
  • Performed general SharePoint, IDE, ClearCase, and ClearQuest administration.
  • Managed multiple high-profile, cross-functional Agile program teams across various business units.
  • Identified requirements and carried out the design and development of use cases using UML.
  • Responsible for developing GUIs/user interfaces using JSP, CSS, and DHTML.
  • Designed and developed the web tier using HTML, JSPs, Servlets, Struts, and the Tiles framework.
  • Used Eclipse as the IDE; configured and deployed the application onto the WebLogic application server using Maven build scripts to automate the build and deployment process.
  • Implemented a prototype to integrate PDF documents into a web application using the iText PDF library.
  • Designed and developed client and server components of an administrative console for a business process engine framework using Java, Google Web Toolkit and Spring technologies.
  • Designer and architect of SOA governance (Oracle Enterprise Repository) and of Wiki plug-in development for the O2 UK repository search engine and SOA shop for services.
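A minimal sketch of the JDBC connectivity pattern referenced above; the connection URL, credentials, and table are hypothetical placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class OrderDao {
        // Hypothetical Oracle connection URL.
        private static final String URL = "jdbc:oracle:thin:@dbhost:1521:ORCL";

        public int countOrders(String status) throws Exception {
            // try-with-resources closes the connection and statement automatically
            try (Connection conn = DriverManager.getConnection(URL, "app", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                     "SELECT COUNT(*) FROM orders WHERE status = ?")) {
                ps.setString(1, status);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    return rs.getInt(1);
                }
            }
        }
    }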


Confidential, Chicago, IL

Mainframe Developer


  • Responsible for requirement gathering and analysis through interaction with end users.
  • Worked in production support as well as development and enhancements.
  • Involved in business requirement gathering, analysis, feasibility research and Technical Requirement Specifications preparation.
  • Involved in estimation and Preparation of high level and Detailed Level Design documents.
  • Involved in all project life cycle stages of D10 Rewrite project.
  • Coded batch and online programs in COBOL, JCL, VSAM and DB2 for extraction of data to produce results according to business requirements.
  • Conducted design and code reviews, code walkthroughs, unit testing, regression testing, acceptance testing, integration testing, and system testing.

Environment: IBM z/OS, COBOL, DB2, JCL, VSAM, Java, Unix, ISPF, QMF, SPUFI, File-AID, File-AID for DB2, Crystal Reports, PowerBuilder, MQSeries.
