
Sr. Hadoop/Big Data Developer Resume



  • 9+ years of programming experience spanning all phases of the Software Development Life Cycle (SDLC).
  • 5+ years of Big Data architecture experience developing Hadoop applications.
  • Experienced with Hadoop ecosystem components such as MapReduce, HBase, Oozie, Hive, Sqoop, Pig, Flume, Kafka, Storm, Spark, MongoDB, and Cassandra on the Cloudera and Hortonworks distributions.
  • Proven expertise in performing analytics on Big Data using MapReduce, Hive, and Pig.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala (see the sketch after this list).
  • Developed Apache Spark jobs in Scala in a test environment for faster data processing, and used Spark SQL for querying.
  • Hands-on experience developing and debugging YARN (MRv2) jobs to process large datasets.
  • Data processing: processed data using MapReduce and YARN; worked on Kafka as a proof of concept for log processing.
  • Experienced in performing real-time analytics on NoSQL databases such as HBase and Cassandra.
  • Good knowledge of Impala, Storm, and Kafka.
  • Experienced with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
  • Experience with web services, data modeling, content processing, and data replication.
  • Worked with the Oozie workflow engine to schedule time-based jobs performing multiple actions.
  • Experienced in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Familiarity with the distributed coordination service ZooKeeper.
  • Designed and developed test plans and test cases based on functional and design specifications.
  • Analyzed large data sets using Pig and Hive scripts.
  • Experience in data warehousing with the ETL tool Oracle Warehouse Builder (OWB).
  • Hands-on experience working with databases such as Oracle and MySQL, and with PL/SQL.
  • Good working knowledge of batch application processing.
  • Experience with streaming data using IBM Streams Processing Language (SPL).
  • Experience capturing and analyzing data in motion using InfoSphere Streams.
  • Experienced in writing MapReduce programs and UDFs for both Hive and Pig in Java.
  • Experienced in developing web services with the Python programming language.
  • Involved in design and development of technical specifications using Hadoop ecosystem tools.
  • Cutting-edge experience with Splunk (a log-based performance monitoring tool).
  • Experience with configuration of Hadoop Ecosystem components: Hive, HBase, Pig, Sqoop and Flume.
  • Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro.
  • Experience in using different file formats - Avro, Sequence Files, ORC, JSON and Parquet.
  • Experience in Performance Tuning, Optimization and Customization.
  • Experience unit testing MapReduce programs with MRUnit and JUnit.
  • Experience in active development as well as onsite coordination in web-based, client/server, and distributed architectures using Java/J2EE, including Web Services, Spring, Struts, Hibernate, and JSP/Servlets within an MVC architecture.
  • Good working knowledge of servers such as Tomcat and WebLogic 8.0.
  • Ability to work in teams as well as individually; quick learner, able to meet deadlines.
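
As a flavor of the MapReduce-to-Spark migration POC noted above, here is a minimal sketch in Scala: the canonical word count, rewritten from its map/reduce phases into RDD transformations. The paths and app name are illustrative placeholders, not from an actual project.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: a word count, the classic MapReduce job, expressed as
// Spark RDD transformations. Paths and app name are placeholders.
object WordCountMigration {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wordcount-poc"))
    sc.textFile("hdfs:///data/input")        // the old job's input split
      .flatMap(_.split("\\s+"))              // map phase: emit tokens
      .map(word => (word, 1L))               // emit (key, 1) pairs
      .reduceByKey(_ + _)                    // reduce phase: sum per key
      .saveAsTextFile("hdfs:///data/output")
    sc.stop()
  }
}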


Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Spark Streaming, Spark SQL, Kafka, Cloudera CDH4/CDH5, Hortonworks, Hadoop Streaming, Splunk, ZooKeeper, Oozie, Sqoop, Flume, Impala, Solr, and Ranger.

NoSQL: HBase, MongoDB, Couchbase, Neo4j, Cassandra

Languages: Java/J2EE, SQL, Shell Scripting, C/C++, Python, Scala

Web Technologies: HTML, JavaScript, CSS, XML, Servlets, SOAP, Amazon AWS, Google App Engine

Web/Application Servers: Apache Tomcat, LDAP, JBoss, IIS

Operating system: Windows, Macintosh, Linux and Unix

Frameworks: Spring, MVC, Hibernate, Swing

DBMS / RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL

IDE: Eclipse, Microsoft Visual Studio (2008, 2012), NetBeans, Spring Tool Suite

Version Control: SVN, CVS, Rational ClearCase Remote Client, GitHub, Visual Studio

Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, ETL, DWH, JUnit, Oracle SQL Developer, WinSCP, Tahiti, Cygwin, Pentaho


Sr. Hadoop/Big data Developer

Confidential, TEXAS


  • Experience in design and deployment of Hadoop clusters and various Big Data analytic tools, including Hive, HBase, Oozie, Sqoop, Impala, Kafka, and Spark.
  • Performed real-time streaming of data using Kafka, processed the data using Spark Streaming, and loaded it into Kudu tables.
  • Developed a solution in Scala for handling Kafka topic offsets, persisting offsets in HBase to minimize data loss and reprocessing time (see the sketch after this list).
  • Implemented near real-time aggregations by joining multiple topics with intermediate storage (Kudu); the aggregations run in near real time at a 10-minute interval.
  • Implemented a solution for data archiving process with configurable intervals.
  • Good understanding of Kudu partitioning and primary keys.
  • Configured workflows for scheduling Spark jobs with Oozie.
  • Good understanding of integrating multiple ecosystem components, such as Spark with HBase and Spark with Kudu.
  • Experience in integrating Kafka with Spark Streaming for real time data processing.
  • Extracted complex structured streaming data from Kafka, processed it using Spark Streaming, and loaded it into Kudu tables.
  • Worked on conversion of Hive/SQL queries into Spark transformations using Spark RDDs and DataFrames.
  • Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
  • Configured Zookeeper for coordinating the cluster to maintain data consistency.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data into HDFS.
  • Implemented dynamic partitioning and bucketing in Hive.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
  • Worked on Cluster Monitoring, troubleshooting and Disk topology.
  • Processed different file formats like AVRO, PARQUET, Sequence files and ORC.
  • Involved in Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Worked with data scientists to design and develop solutions for data analysis.
  • Designed the ETL process from various sources into Hadoop/HDFS for analysis and further processing.
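
The offset-handling approach described above can be sketched as follows. This is a minimal, hypothetical Scala example assuming the Spark Streaming Kafka 0.10 integration and the HBase client API; the table name stream_offsets and column family o are placeholders.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.streaming.kafka010.OffsetRange

object OffsetStore {
  // Persist the offsets of one micro-batch so a restarted job resumes from
  // the last committed position instead of reprocessing the whole topic.
  def save(groupId: String, ranges: Array[OffsetRange]): Unit = {
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("stream_offsets"))
    try {
      ranges.foreach { r =>
        // Row key: one row per consumer group / topic / partition.
        val put = new Put(Bytes.toBytes(s"$groupId:${r.topic}:${r.partition}"))
        put.addColumn(Bytes.toBytes("o"), Bytes.toBytes("until"),
          Bytes.toBytes(r.untilOffset))
        table.put(put)
      }
    } finally { table.close(); conn.close() }
  }
}

In the streaming job, each batch would call OffsetStore.save from foreachRDD after its output action succeeds, using the offsetRanges exposed by HasOffsetRanges, and the read path would seed the direct stream's starting offsets from the same table.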

Confidential, Two-Destiny Way, TEXAS

Sr Hadoop Developer


  • Assisted in the development of high-quality analytical reports on weather and GIS data.
  • Used Talend to generate optimized code to load, transform, enrich, and cleanse data inside Hadoop.
  • Moved relational database data into the Data Lake and Hive dynamic-partition tables using Sqoop as the ETL tool.
  • Built Hadoop clickstream workflows using Apache Hive and Pig for extraction, transformation, and loading of data.
  • Imported unstructured data like logs from different web servers to HDFS using Flume and developed MapReduce jobs for log analysis, recommendations and analytics.
  • Involved in real-time data processing using Storm.
  • Expertise in real-time analytics, machine learning and continuous monitoring of operations using Storm.
  • Converted and loaded local data files into HDFS through the UNIX shell.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Created HBase tables to store variable data formats coming from different portfolios.
  • Worked with Hive and the NoSQL database HBase to create tables and store data.
  • Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.
  • Worked with Apache NiFi to uncompress JSON files and move them from local storage to HDFS.
  • Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
  • Used Kafka along with HBase for data streaming.
  • Developed a Cassandra data model to match the business requirements.
  • Involved in administration of the Cassandra cluster along with Hadoop, Pig, and Hive.
  • Analyzed customer behavior by performing clickstream analysis, using Flume to ingest the data.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Created views from Hive tables on top of data residing in the Data Lake.
  • Built advanced ETL logic on clickstream, log, and tax data based on complex technical and business requirements.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing (see the sketch after this list).
  • In the data exploration stage, used Hive and Impala to get insights about the customer data.
  • Evaluated Spark's performance versus Impala on transactional data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Involved in configuring batch jobs to ingest the source files into the Data Lake.
  • Wrote and executed various MySQL database queries from Python using the Python MySQL connector and the MySQLdb package.
  • Worked on GitHub to check in and check out source code.
  • Wrote scripts in Python for extracting data from HTML files.
  • Worked with NoSQL Cassandra to store, retrieve, update, and manage all details for Ethernet provisioning and customer order tracking.
  • Implemented Spark for fast, interactive analysis of datasets loaded as RDDs.
  • Analyzed the data by performing Hive queries (HiveQL), running Pig scripts, and using Spark SQL and Spark Streaming.
  • Developed tools using Python, shell scripting, and XML to automate some of the menial tasks.
  • Used Pig for aggregation, cleansing, and incremental ETL, and developed UDFs for filtering.
  • Experience creating and maintaining MySQL databases, setting up users, and backing up cluster metadata databases with cron jobs.
  • Designed Redshift data models and Tableau Server configurations to provide guaranteed response times for reports.
  • Used a variety of AWS computing and networking services to meet application needs.
  • Migrated an existing on-premises application to AWS.
  • Developed Pig UDFs to specifically preprocess and filter data sets for analysis.
  • Used Teradata database management system to manage the warehousing operations and parallel processing.
  • Validated data sets graphically with Excel and did touch-ups in Photoshop.
  • Designed and scheduled workflows for updating system reports using Oozie.
  • Ability to work in a team environment and solve problems.
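
As an illustration of the Hive-to-Spark conversion work above, here is a minimal, hypothetical Scala sketch: the same aggregation expressed once as a Hive-style SQL query and once as DataFrame transformations. The clicks table and its columns are placeholders.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")
      .enableHiveSupport() // read tables registered in the Hive metastore
      .getOrCreate()

    // Original HiveQL, run unchanged through Spark's SQL engine:
    spark.sql(
      """SELECT page, COUNT(*) AS hits
        |FROM clicks
        |WHERE event_date = '2017-01-01'
        |GROUP BY page""".stripMargin).show()

    // The same logic converted to DataFrame transformations:
    spark.table("clicks")
      .filter(col("event_date") === "2017-01-01")
      .groupBy("page")
      .agg(count("*").as("hits"))
      .show()

    spark.stop()
  }
}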

Environment: Cloudera, Avro, HBase, HDFS, Hive, Pig, Java (JDK 1.7), SQL, Sqoop, Flume, Oozie, Eclipse, Splunk, YARN, SQL Server, Spark, Python, Hortonworks, ZooKeeper, SVN, Talend

Confidential, Dallas, Texas

Hadoop Developer


  • Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
  • Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Worked on setting up Pig, Hive, and HBase on multiple nodes and developed using Pig, Hive, HBase, and MapReduce.
  • Involved in data acquisition, data pre-processing, and data exploration for a telecommunication project in Scala.
  • Experience with MapReduce coding.
  • Solved the small-file problem using SequenceFile processing in MapReduce (see the sketch after this list).
  • Wrote various Hive and Pig scripts.
  • Experience upgrading CDH and HDP clusters.
  • Used Flume, Sqoop, Hadoop, Spark, and Oozie to build data pipelines.
  • Created HBase tables to store variable data formats coming from different portfolios.
  • Experience upgrading the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
  • Performed real-time analytics on HBase using the Java API and REST API.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
  • Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
  • Set up Flume for different sources to bring log messages from outside into HDFS.
  • Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
  • Worked on compression mechanisms to optimize MapReduce jobs.
  • Real-time experience with analytics and BI.
  • Wrote Python scripts to parse XML documents and load the data into a database.
  • Experienced with working on Avro Data files using Avro Serialization system.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Unit tested and tuned SQL and ETL code for better performance.
  • Monitored performance and identified bottlenecks in ETL code.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
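
The small-file fix mentioned in this section (packing many tiny inputs into SequenceFiles so MapReduce reads a few large, splittable files instead of one block per small file) could look roughly like the following Scala sketch using the Hadoop SequenceFile API; all paths are placeholders.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.{BytesWritable, SequenceFile, Text}

object SmallFilePacker {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    // One SequenceFile keyed by original filename, valued by file contents.
    val writer = SequenceFile.createWriter(conf,
      SequenceFile.Writer.file(new Path("/data/packed/part-00000.seq")),
      SequenceFile.Writer.keyClass(classOf[Text]),
      SequenceFile.Writer.valueClass(classOf[BytesWritable]))
    try {
      for (status <- fs.listStatus(new Path("/data/small-files"))) {
        val in = fs.open(status.getPath)
        try {
          val bytes = new Array[Byte](status.getLen.toInt) // small files only
          in.readFully(0, bytes)
          writer.append(new Text(status.getPath.getName), new BytesWritable(bytes))
        } finally in.close()
      }
    } finally writer.close()
  }
}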

Environment: MapReduce, HBase, HDFS, Hive, Pig, Java, SQL, Cloudera Manager, Sqoop, Flume, ZooKeeper, YARN, Oozie, Eclipse

Confidential, Wilmington, DE

Big Data/Hadoop Developer


  • Created Hive tables and worked on them using HiveQL.
  • Involved in installing Hadoop ecosystem components.
  • Validated NameNode and DataNode status in an HDFS cluster.
  • Imported and exported data between HDFS and RDBMS using Sqoop.
  • Experienced in developing Hive queries on different data formats such as text and CSV files.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
  • Installed and configured Hadoop cluster in Test and Production environments
  • Performed both major and minor upgrades to the existing CDH cluster
  • Reviewed code per the customer's coding standards.
  • Tested and provided valid test data to users as per requirements.
  • Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
  • Responsible for managing data coming from different sources.
  • Supported HBase architecture design with the Hadoop architect team to develop a database design in HDFS.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Developed UDFs for Pig data analysis (see the sketch after this list).
  • Involved in managing and reviewing Hadoop log files.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Handled importing of data from various sources and performed transformations using Hive and MapReduce.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
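
As an illustration of the Pig UDF work above, here is a minimal eval-function sketch. The UDFs on this project were presumably written in Java; any JVM language works, so this hypothetical example uses Scala for consistency with the other sketches.

import org.apache.pig.EvalFunc
import org.apache.pig.data.Tuple

// Minimal Pig eval UDF: normalizes a string field by trimming whitespace
// and upper-casing it. The class name is a placeholder.
class NormalizeField extends EvalFunc[String] {
  override def exec(input: Tuple): String = {
    if (input == null || input.size() == 0 || input.get(0) == null) return null
    input.get(0).toString.trim.toUpperCase
  }
}

Once packaged into a jar, a Pig script would REGISTER the jar, DEFINE an alias for the class, and invoke it like any built-in function.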

Environment: Java, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, ZooKeeper, Linux, XML, Eclipse, Cloudera


Java Developer


  • Performed requirements analysis and design, using the Rational Unified Process for analysis, design, and documentation.
  • Interacted with the client to understand the requirements thoroughly in a short span of time.
  • Developed Use cases, traceability matrices, design specification and test documents including UATs, training and implementation manuals.
  • Involved in design and development of the architecture of the web applications in JSP.
  • Developed various Java classes, SQL queries and procedures to retrieve and manipulate the data from backend Oracle database using JDBC.
  • Involved in developing programs for parsing the XML documents using XML Parser.
  • Involved in tuning and profiling the application for better data transactions and performance, using JProbe.
  • Designed and developed Enterprise Stateless and Stateful Session beans to communicate with the Container Managed Entity Bean backend services.
  • Involved in Unit Testing and Integration Testing.
  • Designed and developed various modules of the application with J2EE design architecture and frameworks such as Spring MVC and Spring Bean Factory, using IoC and AOP concepts.
  • Followed agile software development with Scrum methodology.
  • Wrote the application front end with HTML, JSP, JSF, Ajax/jQuery, Spring Web Flow, and XHTML.
  • Used jQuery for UI-centric Ajax behavior.
  • Implemented Java/J2EE design patterns such as Factory, DAO, Session Façade, and Singleton.
  • Used Hibernate in the persistence layer and developed POJOs and Data Access Objects (DAOs) to handle all database operations (see the sketch after this list).
  • Implemented features like logging, user session validation using Spring-AOP module.
  • Developed server-side services using Java, Spring, and Web Services (SOAP, WSDL, JAXB, JAX-RPC).
  • Worked on Oracle as the backend database.
  • Used JMS for messaging.
  • Used Log4j to assign, track, report and audit the issues in the application.
  • Developed and executed unit test plans using JUnit, ensuring that results were documented and reviewed with the Quality Assurance teams responsible for integrated testing.
  • Worked in a deadline-driven environment with immediate feature release cycles.
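
As a flavor of the persistence-layer work above, here is a minimal, hypothetical sketch of the DAO pattern with Hibernate. The original code was Java; the API is the same on the JVM, so this sketch uses Scala for consistency with the other examples, and it assumes Hibernate 5's typed Session.get. The User entity and its mapping are placeholders.

import org.hibernate.SessionFactory

// `User` stands in for a Hibernate-mapped entity (mapping config not shown).
class User { var id: java.lang.Long = _; var name: String = _ }

// DAO: every database operation for User goes through this class, keeping
// persistence details out of the business layer.
class UserDao(factory: SessionFactory) {
  def findById(id: Long): Option[User] = {
    val session = factory.openSession()
    try Option(session.get(classOf[User], java.lang.Long.valueOf(id)))
    finally session.close()
  }

  def save(user: User): Unit = {
    val session = factory.openSession()
    val tx = session.beginTransaction()
    try { session.save(user); tx.commit() }
    catch { case e: Exception => tx.rollback(); throw e }
    finally session.close()
  }
}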

Environment: Java, Spring, Hibernate, JSP, HTML, CSS, XML, JavaScript, jQuery, JUnit, AJAX, Multi-Threading, Oracle, Web Services (SOAP), WebSphere, MySQL


Java Developer


  • Participated in various phases of the Software Development Life Cycle (SDLC).
  • Developed user interfaces using the JSP framework with AJAX, JavaScript, HTML, XHTML, and CSS.
  • Performed the design and development of various modules using the CBD Navigator Framework.
  • Deployed J2EE applications on WebSphere Application Server, building and deploying the EAR file using an ANT script.
  • Created tables and stored procedures in SQL for data manipulation and retrieval.
  • Used technologies such as JSP, JavaScript, and Tiles for the presentation tier.
  • Used CVS for version control of code and project documents.

Environment: JSP, Servlets, JDK, JDBC, XML, JavaScript, HTML, Spring MVC, JSF, Oracle 8i, Sun Application Server, UML, JUnit, JTest, NetBeans, Windows 2000.
