
Big Data / Hadoop Lead Resume

Dallas, TX


  • Overall 8+ years of experience in data analysis, data modeling, and implementation of enterprise-class systems spanning Big Data, data integration, object-oriented programming, data warehousing, and advanced analytics.
  • 4 years of experience with Hadoop, HDFS, MapReduce, and the Hadoop ecosystem (Hive, Oozie, Kafka, Impala, Spark, AVRO, JSON).
  • Good knowledge of Hive optimization using ORC, partitioning, and bucketing.
  • Created scheduled data ingestion jobs using Sqoop and the Oozie scheduler.
  • Hands-on experience writing MapReduce jobs in Java.
  • Hands-on experience writing Pig Latin scripts, Pig commands, and Hive queries.
  • Good knowledge of and experience with Spark and Kafka.
  • Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, Sqoop, Pig, Scala, Hive, Impala, and Spark.
  • Experience in database development using SQL and PL/SQL on databases such as Oracle 9i/10g, Informix, and SQL Server.
  • Working experience with AWS (EC2, Redshift, CloudWatch, Route 53, EMR, CloudFront, S3, IAM).
  • Experience working with NoSQL databases, including HBase and MongoDB.
  • Experience using Sqoop to import data from RDBMS into HDFS and vice versa.
  • Experience in database design and development using relational databases (Oracle, MS SQL Server 2005/2008, MySQL) and NoSQL databases (MongoDB, Cassandra, HBase).
  • Effective team player with excellent communication skills and the insight to determine priorities, schedule work, and meet critical deadlines.
  • Good working experience with file formats such as AVRO, JSON, and Parquet in Hadoop tools using SerDes.
  • Experience analyzing data in Spark using Scala and PySpark.
  • Optimized Hive tables using partitioning and bucketing to improve HiveQL query performance.
  • Experience importing streaming data into HDFS using Flume sources and sinks, and transforming the data using Flume interceptors.
  • Experience using Java technologies in business, web, and client-server environments, including Java, JDBC, Servlets, JSP, the Struts framework, JasperReports, and SQL.
  • Proficient with IDEs such as Eclipse and NetBeans.
  • Experienced with scripting languages such as Python and shell.


Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Spark, Scala, Impala, Hive, Pig, Oozie, Sqoop, Flume, Kafka, CDH4, JSON, AVRO

Java Technologies: Java 5, Java 6, JAXP, AJAX, I18N, JFC Swing, Log4j, JavaHelp API

Methodologies: Agile, UML, Design Patterns

Database: Oracle 10g, DB2, MySQL, NoSQL (MongoDB, HBase, Cassandra)

Cloud: AWS (EC2, Redshift, CloudWatch, Route 53, EMR, CloudFront, S3, IAM)

Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0

Web Tools: HTML, JavaScript, XML, DTD, Schemas, XSL, XSLT, XPath, DOM, XQuery

Tools: SQL Developer, DbVisualizer, Hortonworks

IDE / Testing Tools: NetBeans, Eclipse, WSAD, RAD, MATLAB

Operating Systems: Windows, Linux

Scripts: Bash, Python, ANT

Testing API: JUnit


Confidential, Dallas, TX

Big Data / Hadoop Lead


  • Involved in the project from the POC stage onward, working from data staging through population of the data mart and reporting.
  • Worked in an onsite-offshore environment.
  • Fully responsible for creating the data model for storing and processing data and for generating and reporting alerts. This model is being implemented as the standard across all regions as a global solution.
  • Participated in discussions with, and guided, other regional teams on the Citi Big Data platform and the AML cards data model and strategy.
  • Responsible for the technical design and review of the data dictionary (business requirements).
  • Responsible for providing technical solutions and workarounds.
  • Migrated the required data from the data warehouse and product processors into HDFS using Talend and Sqoop, and imported flat files in various formats into HDFS.
  • Analyzed and developed jobs using the Spark Cassandra Connector to load data from flat files into Cassandra.
  • Used Spark Streaming to bring all credit card transactions into the Hadoop environment.
  • Involved in the design of the overall Citigroup Big Data architecture.
  • Worked with source system teams on data quality (DQ) issues.
  • Integrated the Hive warehouse with Spark and Impala; Impala was later replaced with Spark due to security issues with Impala.
  • Comfortable with Scala functional programming idioms and very familiar with Iteratee/Enumerator streaming patterns. Almost all DQ checks and end-to-end reconciliation were done in Scala and Spark.
  • Implemented partitioning, dynamic partitions, indexing, and bucketing in Hive.
  • Created custom UDFs in Java to overcome Hive limitations on Cloudera CDH5.
  • Used Hive for data processing and batch data filtering, and Spark/Impala for other value-centric data filtering.
  • Supported and monitored MapReduce programs running on the cluster.
  • Monitored logs and responded accordingly to any warning or failure conditions.
  • Responsible for preserving code and design integrity using SVN and SharePoint.
  • Gave a demo to business users on using Datameer for analytics.

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Talend, Spark, Cassandra, Impala, Scala, Sqoop, Cloudera CDH5 Platform, SVN, SharePoint, Datameer, and Maven.

Confidential, New York City, NY

Big Data / Hadoop Developer


  • Developed templates and screens in HTML and JavaScript.
  • Used the Struts validation framework for form-level validation.
  • Wrote JUnit test cases for unit testing of classes.
  • Used Spring to develop modules that help the product handle different requirements.
  • Implemented CDH3 Hadoop cluster on CentOS.
  • Implemented POCs to configure DataStax Cassandra with Hadoop.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Launched and set up a Hadoop cluster, including configuring its components.
  • Hands-on experience loading data from the UNIX file system into HDFS.
  • Performed Cassandra query operations using the Thrift API for real-time analytics.
  • Set up cluster coordination services through ZooKeeper.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Involved in creating Hive tables, loading data, and running Hive queries on that data.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
  • Wrote optimized Pig scripts and developed and tested Pig Latin scripts.
  • Working knowledge of writing Pig Load and Store functions.

Environment: Apache Hadoop 1.0.1, MapReduce, HDFS, CentOS, Zookeeper, Sqoop, Cassandra, Hive, Pig, Oozie, Java, Eclipse, Amazon EC2, JSP, Servlets.

Confidential, Somerset, NJ

Java/J2EE Developer


  • Coded the business methods according to the IBM Rational Rose UML model.
  • Extensively used Core Java, Servlets, JSP and XML.
  • Used Struts 1.2 in presentation tier.
  • Generated the Hibernate XML and Java mappings for the schemas.
  • Used a DB2 database to store the system data.
  • Used Rational Application Developer (RAD) as the integrated development environment (IDE).
  • Unit-tested all components using JUnit.
  • Used the Apache Log4j logging framework for trace logging and auditing.
  • Used Asynchronous JavaScript and XML (AJAX) for a faster, more interactive front end.
  • Used IBM WebSphere as the application server.
  • Used IBM Rational ClearCase for version control.

Environment: Java 1.6, Servlets, JSP, Struts 1.2, IBM Rational Application Developer (RAD) 6, WebSphere 6.0, iText, AJAX, Rational ClearCase, Rational Rose, Oracle 9i, log4j.


Java Developer


  • Involved in the complete software development life cycle (SDLC) of the application, from requirement analysis to testing.
  • Developed the modules based on the Struts MVC architecture.
  • Developed the UI using JavaScript, JSP, HTML, and CSS for interactive, cross-browser functionality and a complex user interface.
  • Created business logic using Servlets and session beans and deployed them on WebLogic Server.
  • Used the Struts MVC framework for application design.
  • Created complex SQL queries, PL/SQL stored procedures, and functions for the back end.
  • Prepared the functional, design, and test case specifications.
  • Wrote stored procedures in Oracle to perform database-side validations.
  • Performed unit testing, system testing, and integration testing.
  • Developed unit test cases and used JUnit for unit testing of the application.
  • Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing fixes. Resolved higher-priority defects according to schedule.
  • Used Eclipse IDE for all coding in Java, Servlets and JSPs.
  • Coordinated with the QA lead on developing the test plan, test cases, and test code and on actual testing; responsible for allocating defects and ensuring that they were resolved.
  • Used Flex styles and CSS to manage the look and feel of the application.
  • Deployed the application on WebSphere Application Server.

Environment: Java 6, Eclipse, Apache Tomcat Web Server, JSP, JavaScript, AWT, Servlets, JDBC, HTML, FrontPage 2000, Oracle, CVS.
