
Hadoop Developer Resume



  • Over 7 years of IT experience, involved in all phases of the software development lifecycle across different projects
  • Very strong experience in processing and analyzing large sets of structured, semi-structured and unstructured data, and in supporting system application architecture
  • Extensive experience in writing Hadoop jobs for data analysis as per the business requirements using Hive and Pig
  • Expertise in creating Hive internal/external tables and views using a shared metastore
  • Developed custom UDFs in Pig and Hive to extend their core functionality
  • Hands-on experience in transferring incoming data from various application servers into HDFS, Hive and HBase using Apache Flume
  • Stored data in the Vertica enterprise data warehouse (EDW)
  • Experience working with the Snowflake and Vertica data warehouses
  • Worked extensively with Sqoop to import data from RDBMS into HDFS and export it back
  • Performed data ingestion from multiple disparate sources and systems using Kafka
  • Proficient in big data ingestion and streaming tools like Apache Flume, Sqoop, Kafka, Storm and Spark.
  • Experience working with data formats such as Avro and Parquet
  • Hands-on experience with SequenceFiles, RCFiles, combiners, counters, dynamic partitions and bucketing for best practices and performance improvement
  • Good experience with AWS Elastic Block Store (EBS), its volume types, and choosing the appropriate EBS volume type for each requirement
  • Experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, Spark Streaming, Kafka and Apache Storm
  • Worked on Oozie to manage and schedule the jobs on Hadoop cluster
  • Implemented solutions on AWS, using its computing and networking services to meet application needs
  • Knowledge of developing analytical components using Scala
  • Experience in managing and reviewing Hadoop log files
  • Worked with the NoSQL database HBase to create tables and store data
  • Experience in setting up Hive, Pig, HBase and Sqoop on Ubuntu
  • Strong experience in relational database design and development with multiple RDBMSs, including Oracle 10g, MySQL and MS SQL Server, using SQL and PL/SQL
  • Proficient in using data visualization tools like Tableau, Raw and MS Excel
  • Developed applications using Java, RDBMS and UNIX Shell scripting
  • Experience with Servlets, JSP, JSF, Spring, Hibernate, JPA and JDBC
  • Experience in developing web interfaces using technologies like XML, HTML, DHTML and CSS
  • Implemented functions, stored procedures and triggers using PL/SQL
  • Good understanding of ETL processes and Data warehousing
  • Strong experience in writing UNIX shell scripts
  • Work on different projects provided exposure to, and a good understanding of, the various SDLC phases


Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Spark, Storm, ZooKeeper, Scala, Oozie, Ambari, Tez, R

Development Tools: Eclipse, IBM DB2 Command Editor, TOAD, SQL Developer, VMware

Programming/Scripting Languages: Java, C++, Unix shell scripting, Python, SQL, Pig Latin, HiveQL

Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008, PostgreSQL and DB2

NoSQL Databases: HBase, Cassandra, MongoDB

ETL: Informatica

Visualization: Tableau, Raw and MS Excel

Frameworks: Hibernate, JSF 2.0, Spring

Version Control Tools: Subversion (SVN), Concurrent Versions System (CVS) and IBM Rational ClearCase

Methodologies: Agile/Scrum, Waterfall

Operating Systems: Windows, Unix, Linux and Solaris


Confidential, Ashburn

Hadoop Developer


  • Used Sqoop to move data between Oracle/DB2 and HDFS for analysis
  • Migrated existing MapReduce programs to Spark using Python (PySpark)
  • Migrated data from the data lake (Hive) into an S3 bucket
  • Validated data between the data lake and the S3 bucket
  • Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data
  • Designed batch processing jobs in Apache Spark that ran up to ten times faster than the equivalent MapReduce jobs
  • Worked with the Snowflake data warehouse
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
  • Designed a custom Spark REPL application to handle similar datasets
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
  • Performed Hive test queries on local sample files and HDFS files
  • Used Kafka for real-time data ingestion
  • Created multiple Kafka topics and read data from them
  • Moved data from S3 buckets to the Snowflake data warehouse for report generation
  • Wrote Hive queries for data analysis to meet business requirements
  • Migrated an existing on-premises application to AWS.
  • Created Hive tables and worked on them using HiveQL
  • Assisted in loading large sets of structured, semi-structured and unstructured data into HDFS
  • Extensively used Pig for data cleaning and optimization
  • Developed Spark SQL jobs to load tables into HDFS and run SELECT queries on them
  • Used AWS services like EC2 and S3 for small data sets.
  • Developed the application in the IntelliJ IDE
  • Created DataFrames using Scala
  • Developed Hive queries to analyze data and generate results
  • Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Used Scala to write code for all Spark use cases.
  • Analyzed user request patterns and implemented performance optimizations, including partitioning and bucketing in Hive
  • Assigned names to columns using Scala case classes
  • Developed multiple Spark SQL jobs for data cleaning
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Created many Spark and Hive UDFs and UDAFs for functions not available in Hive and Spark SQL
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Implemented performance optimization techniques such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins
  • Good knowledge of Spark configuration parameters such as memory, cores and executors
  • Used ZooKeeper in the cluster to provide concurrent access to Hive tables with shared and exclusive locking
  • Developed analytical components using Scala, Spark and Spark Streaming
  • Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports
  • Worked with the NoSQL databases HBase and MongoDB

Environment: Linux, Apache Hadoop Framework, HDFS, YARN (MR2.x), Hive, HBase, AWS (S3, EMR), Scala, Spark, Sqoop

Confidential, Plano

Spark Developer


  • Developed on the Cloudera distribution of Hadoop
  • Responsible for managing the data pipelines and the data lake
  • Performed Hadoop ETL using Hive on data at different stages of the pipeline
  • Worked in an Agile environment using Scrum
  • Imported data from different source systems with Sqoop and automated the imports with Oozie workflows
  • Generated business reports from the data lake using Impala (Hadoop SQL) according to business needs
  • Automated business reports with Bash scripts on UNIX in the data lake, delivering them to business owners
  • Developed Spark Scala code to cleanse data and perform ETL at different stages of the data pipeline
  • Worked across environments such as DEV, QA, Data Lake and Analytics Cluster as part of Hadoop development
  • Transferred the cleansed data to the Analytics Cluster for business reporting
  • Developed Pig scripts and Python streaming jobs, and created Hive tables on top of their output
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark and SQL
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Developed Oozie workflows to run multiple Hive, Pig, Sqoop and Spark jobs
  • Handled importing of data from various data sources, performed transformations using Hive, Spark and loaded data into HDFS.
  • Developed Pig, Hive, Sqoop, Hadoop Streaming and Spark actions in Oozie workflows
  • Supported MapReduce programs running on the cluster
  • Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume and Kafka.
  • Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
  • Good understanding of the workflow management process and its implementation

Environment: Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse and Cloudera

Confidential, Chicago

Hadoop Developer


  • Responsible for managing data coming from different sources, HDFS maintenance, and loading of structured and unstructured data
  • Developed MapReduce programs to parse the raw data and store the refined data in tables
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables
  • Moved log files generated from various sources into HDFS through Flume for further processing
  • Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports
  • Responsible for analyzing and cleansing raw data by running Hive queries and Pig scripts
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS that were further used for analysis
  • Used CloudWatch Logs to move application logs to S3 and to create alarms based on exceptions raised by applications
  • Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs
  • Used Oozie operational services for batch processing and scheduling workflows dynamically
  • Developed and updated social media analytics dashboards on a regular basis
  • Performed data mining investigations to find new insights related to customers
  • Managed and reviewed Hadoop log files
  • Used Vertica as the enterprise data warehouse
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day
  • Generated insights from brand conversations, which helped drive brand awareness, engagement and traffic to social media pages
  • Involved in identifying topics and trends and building context around the brand
  • Identified and analyzed defects, questionable functions, errors and inconsistencies observed in the output

Environment: HBase, Hadoop, HDFS, Map Reduce, Hive, Sqoop, Flume 1.3, Oozie, Zookeeper, MySQL, and Eclipse


Java Developer


  • Used XML for ORM mappings between the Java classes and the database
  • Worked on analysis, design and coding for client development with the J2EE stack on the Eclipse platform
  • Involved in creating web-based Java components such as client applets and a client-side UI using JFC in Eclipse
  • Developed PL/SQL stored procedures to perform complex database operations
  • Used Struts in presentation tier
  • Used Subversion as the version control system
  • Played key role in the design and development of application using J2EE, Struts, Spring
  • Involved in various phases of Software Development Life Cycle
  • Configured the Struts framework to implement the MVC design pattern
  • Designed and developed GUI using JSP, HTML, DHTML and CSS
  • Generated the Hibernate XML and Java Mappings for the schemas
  • Used Rational Application Developer (RAD) as Integrated Development Environment (IDE)
  • Extensively used Core Java, Servlets, JSP and XML
  • Used Oracle WebLogic workshop to generate the web service artifacts from the given WSDL for JAX-WS specification

Environment: Java, Struts, Servlets, Spring, Tomcat, Hibernate, HTML, JSP, XML, SQL, J2EE, JUnit, Oracle 11g, Windows


Java Developer


  • Implemented server-side programs using Servlets and JSP
  • Designed, developed and validated the user interface using HTML, JavaScript, XML and CSS
  • Implemented MVC using Struts Framework
  • Implemented Controller Servlet to handle the access to database
  • Participated in code walkthroughs, debugging and defect fixing
  • Involved in coordinating the end-to-end production release process
  • Used SVN for version control
  • Used JDBC prepared statements, called from Servlets, for database access
  • Designed and documented the stored procedures
  • Wrote JUnit test cases and performed unit testing of various components
  • Worked on the database interaction layer for inserting, updating and retrieving data from the Oracle database by writing stored procedures
  • Used Spring Framework for Dependency Injection and integrated with Hibernate
  • Used Log4j to log errors in the application

Environment: Java, J2EE, JSP, Servlets, HTML, DHTML, XML, JavaScript, Struts, Eclipse, WebLogic, PL/SQL and Oracle
