Hadoop Developer Resume

Houston, TX


  • 8+ years of work experience in IT, involved in all phases of the software development lifecycle across different projects
  • Very strong experience in processing and analyzing large sets of structured, semi-structured and unstructured data and in supporting systems application architecture
  • Extensive experience in writing Hadoop jobs for data analysis as per the business requirements using Hive and Pig
  • Expertise in creating Hive internal/external tables and views using a shared metastore
  • Developed custom UDFs in Pig and Hive to extend their core functionality
  • Hands-on experience in transferring incoming data from various application servers into HDFS, Hive and HBase using Apache Flume
  • Experience working with Snowflake and Vertica data warehouses
  • Worked extensively with Sqoop to import and export data from RDBMS to HDFS and vice versa
  • Proficient in big data ingestion and streaming tools like Sqoop, Kafka and Spark
  • Experience working with data formats like Avro and Parquet
  • Hands-on experience with Sequence files, RC files, Combiners, Counters, Dynamic Partitions and Bucketing for best practice and performance improvement
  • Experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, Kafka, Spark Streaming and Apache Storm
  • Worked on Oozie to manage and schedule the jobs on Hadoop cluster
  • Used AWS computing and networking services to meet the needs of applications
  • Knowledge of developing analytical components using Scala
  • Experience in managing and reviewing Hadoop log files
  • Worked on analyzing Hadoop clusters and different Big Data analytic tools including Pig, Hive, HBase, Spark and Sqoop
  • Experience in setting up Hive, Pig, HBase and Sqoop on the Ubuntu operating system
  • Strong experience in relational database design and development with multiple RDBMSs including Oracle 10g, MySQL and MS SQL Server, and with PL/SQL
  • Proficient in using data visualization tools like Tableau, Raw and MS Excel
  • Experience in developing web interfaces using technologies like XML, HTML, DHTML and CSS
  • Implemented functions, stored procedures, triggers using PL/SQL
  • Good understanding of ETL processes and Data warehousing
  • Strong experience in writing UNIX shell scripts
  • Working on different projects provided exposure to and a good understanding of the different phases of the SDLC
  • Deployed code via Jenkins
  • Merged validated code to master in Bitbucket (a Git-based repository host)


Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Spark, Storm, Zookeeper, Scala, Oozie, Ambari, Tez, R

Development Tools: Eclipse, IBM DB2 Command Editor, TOAD, SQL Developer, VMware

Programming/Scripting Languages: Java, C++, Unix Shell Scripting, Python, SQL, Pig Latin, HiveQL

Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008, PostgreSQL and DB2

NoSQL Databases: HBase, Cassandra, MongoDB

ETL: Informatica

Visualization: Tableau, Raw and MS Excel

Frameworks: Hibernate, JSF 2.0, Spring

Version Control Tools: Subversion (SVN), Concurrent Versions System (CVS) and IBM Rational ClearCase

Methodologies: Agile/Scrum, Waterfall

Operating Systems: Windows, Unix, Linux and Solaris


Confidential, Houston, TX

Hadoop Developer


  • Experienced in development using the Cloudera distribution
  • Managed the data pipelines and data lake as a Hadoop Developer
  • Experience working with the Snowflake data warehouse
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
  • Designed custom Spark REPL application to handle similar datasets
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
  • Performed Hive test queries on local sample files and HDFS files
  • Used AWS services like EC2 and S3 for small data sets
  • Developed the application on Eclipse IDE
  • Developed Hive queries to analyze data and generate results
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation
  • Used Scala to write code for all Spark use cases
  • Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL
  • Assigned names to each of the columns using the case class option in Scala
  • Developed multiple Spark SQL jobs for data cleaning
  • Created Hive tables and worked on them using HiveQL
  • Assisted in loading large sets of data (structured, semi-structured and unstructured) to HDFS
  • Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them
  • Developed analytical components using Scala, Spark and Spark Streaming
  • Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports

Environment: Linux, Apache Hadoop Framework, HDFS, YARN (MR2.X), Hive, HBase, AWS (S3, EMR), Scala, Spark, Sqoop

Confidential, Charlotte, NC

Hadoop Developer


  • Imported data into HDFS using Sqoop and vice versa.
  • Worked on loading and transformation of large sets of structured, semi-structured and unstructured data into the Hadoop system.
  • Responsible for managing data coming from different data sources.
  • Loaded data from various data sources into HDFS using Flume.
  • Implemented Partitioning, Dynamic Partitions and Buckets in Hive.
  • Created Hive tables, loaded data and wrote Hive UDFs.
  • Created Hive tables and provided analytical queries for business user analysis.
  • Extensive knowledge of Pig scripts using bags and tuples.
  • Created tables in Hive with partitioning and bucketing for granularity and optimization of HiveQL.
  • Involved in identifying job dependencies to design workflows for Oozie and resource management for YARN.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Involved in loading data from the UNIX file system to HDFS.
  • Installed and configured Pig and Hive, and wrote Pig and Hive UDFs.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.

Environment: Cloudera, HBase, Java, Hive, Pig, Sqoop, Oozie, Oracle, SVN, Kafka, GitHub, JIRA, Talend.


Hadoop Developer


  • Interacted with the Business users to identify the process metrics and various key dimensions and measures and involved in the complete life cycle of the project.
  • Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
  • Developed MapReduce jobs in Java for data cleaning and preprocessing.
  • Good knowledge of using Apache NiFi to automate data movement.
  • Used MapReduce to ingest customer behavioral data and financial histories into HDFS.
  • Used Pig as an ETL tool for transformations and pre-aggregations before storing data in HDFS.
  • Defined the data flow within the Hadoop ecosystem and directed the team in implementing it; exported the result set from Hive to MySQL using shell scripts.
  • Handled importing of data from various data sources and performed transformations.
  • Involved in creating, partitioning and bucketing tables.
  • Configured Flume agents on different data sources to capture the streaming log data.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon EC2, with storage on Amazon S3.
  • Experience with different data formats like Avro, Parquet, ORC and compressions like Snappy and Z-zip.
  • Implemented a POC for persisting clickstream data with Apache Kafka.
  • Optimized existing algorithms in Hadoop using Spark SQL.
  • Troubleshot and resolved migration and production issues.

Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, AWS, Hortonworks, Kafka, Cassandra, UNIX, Tableau.


Java Developer


  • Extensively involved in different stages of the Agile development cycle including detailed analysis, design, development and testing.
  • Implemented the Back-End Business Logic using Core Java technologies including Collections, Generics, Exception Handling, Java Reflection and Java I/O.
  • Wrote and specified Spring Annotation Configuration to define Beans and View Resolutions to configure Spring beans, dependencies and the services needed by beans.
  • Used Spring IoC to implement dynamic dependency injection and Spring AOP to implement crosscutting concerns such as transaction management.
  • Wrote mapping configuration files to implement ORM mappings in the persistence layer.
  • Extended the DAO implementation using Hibernate DAO support.
  • Wrote Hibernate configuration files to connect to the Oracle database and fetch data.
  • The Hibernate Query Cache was implemented using EhCache to improve the performance.
  • Implemented web services with RESTful standards with the support of JAX-RS APIs.
  • Sent registration confirmations and monthly statements to users by integrating the JavaMail API.
  • Manipulated database data with SQL queries, including setting up stored procedures and triggers.
  • Implemented front-end features such as web page design, data binding and single-page applications using HTML/CSS, JavaScript, jQuery and AJAX.
  • Used jQuery libraries to simplify front-end programming. Performed user input validation using JavaScript and jQuery.
  • Utilized Node.js and MongoDB to generate trend charts of the application for Payment History.
  • Wrote JUnit test cases to test the service layers of the application.
  • Used JIRA to track the projects and Git for version control.

Environment: Java, Spring, JavaMail, JavaScript, HTML, CSS, AJAX, jQuery, JUnit, JIRA, Oracle DB, MongoDB, Git.
