
Sr. Big Data Hadoop Developer Resume



  • 6+ years of strong experience in the IT industry across the complete Software Development Life Cycle (SDLC), including business requirements gathering, system analysis and design, data modeling, development, testing, and implementation of projects.
  • 4+ years of experience in developing, implementing, and configuring Big Data solutions using Hadoop ecosystem tools such as HDFS, MapReduce, Hive, Oozie, Sqoop, ZooKeeper, RabbitMQ, Kafka, Knox, Ranger, Cassandra, HBase, MongoDB, Spark, and Spark Streaming.
  • Experience in installing, configuring, deploying, and managing different Hadoop distributions such as Cloudera and Hortonworks.
  • Configured Oozie, ZooKeeper, Spark, Kafka, Cassandra, and HBase on the existing Hadoop cluster.
  • Experience in importing and exporting data with Sqoop between the Hadoop Distributed File System and relational database systems.
  • Experience in handling various file formats such as Avro, SequenceFile, text, XML, and Parquet.
  • Imported data from source HDFS into Spark RDDs for in-memory computation to generate the output response.
  • Experience with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
  • Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and performed data transformations using Spark Core.
  • Experience building real-time streaming data pipelines from different sources using Kafka and storing the data in HDFS.
  • Extended Hive core functionality with custom User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs).
  • Experience in troubleshooting errors in HBase Shell, Pig, Hive, and MapReduce.
  • Experience with NoSQL databases such as HBase and Cassandra, integrated with the Hadoop cluster.
  • Implemented a cluster for the NoSQL tool HBase as part of a POC to address HBase limitations.
  • Improved performance and optimized existing Hadoop algorithms using Spark, including SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Strong expertise in using ETL Workflow Manager, Repository Manager, Data Quality, and ETL concepts.
  • Experience with continuous data ingestion using Kafka, Spark, and various NoSQL databases.
  • Knowledge of using NiFi to automate data movement between different Hadoop systems.
  • Implemented Hadoop security using Knox and Ranger, integrating an LDAP store with a Kerberos KDC.
  • Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
  • Experience with cloud integration using Amazon Elastic MapReduce (EMR), Amazon EC2, Amazon S3, and Microsoft Azure.
  • Experienced with relational database management systems such as Teradata, PostgreSQL, Oracle, and SQL Server.
  • Experienced in scheduling and monitoring production jobs using Oozie and Azkaban.
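The Spark RDD transformations and actions summarized above follow a common pattern: record-level logic is written as plain functions and passed to RDD operations. A minimal sketch in Python, assuming a simple comma-separated input; the field names (`user_id`, `amount`) and the HDFS path are hypothetical placeholders, not details from an actual project:

```python
# Minimal sketch of record-level transformations for a Spark RDD pipeline.
# Field names and paths are hypothetical placeholders.

def parse_line(line):
    """Split one CSV line into a (user_id, amount) pair."""
    user_id, amount = line.split(",", 1)
    return (user_id.strip(), float(amount))

def is_valid(record):
    """Keep only records with a positive amount."""
    return record[1] > 0

# With a live SparkContext these functions would plug in as:
#   sc.textFile("hdfs:///data/input") \
#     .map(parse_line) \
#     .filter(is_valid) \
#     .reduceByKey(lambda a, b: a + b)
```

Keeping the transformation logic in named, pure functions (rather than inline lambdas) makes it straightforward to unit-test the pipeline without a running cluster.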


Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, YARN, Oozie, ZooKeeper, Spark, Impala, Spark SQL, Spark Streaming, Hue, Kafka, RabbitMQ, Solr, Sqoop, NiFi, Knox, Ranger, and Kerberos.

Cloud Services: Amazon Elastic MapReduce (EMR), Amazon EC2, Amazon S3, and Microsoft Azure.

Languages: Java, Scala, Python, PL/SQL, Unix Shell Scripting.

Java Technologies: Spring MVC, JDBC, JSP, JSON, Applets, Swing, JNDI, JSTL, RMI, JMS, Servlets, EJB, JSF.

UI Technologies: HTML5, JavaScript, CSS3, Angular, XML, JSP, JSON, AJAX.

Development Tools: Eclipse, IntelliJ, Ant, Maven, Insomnia, Postman, Scala IDE.

Frameworks/Web Servers: Spring, JSP, Hibernate, Hadoop, WebLogic, WebSphere, Tomcat.

SQL/ NoSQL Databases: Teradata, PostgreSQL, Oracle, HBase, MongoDB, Cassandra, CouchDB, MySQL and DB2.

Version Control/Build Tools: GitHub, Bitbucket, SVN, JIRA, SourceTree, Maven


Confidential, TX

Sr. Big Data Hadoop Developer


  • Involved in installing and configuring Hadoop clusters using the Hortonworks Data Platform (HDP) distribution.
  • Involved in requirements gathering and writing user stories; reviewed and merged development team code into Dev repositories, following Agile SDLC methodologies.
  • Imported data from various sources, performed transformations using Hive, loaded data into HDFS, and extracted data from SQL Server into HDFS using Sqoop.
  • Exported the analyzed data to relational databases for visualization and to generate reports for the BI team.
  • Worked with different data formats such as Avro, Parquet, and XML.
  • Analyzed the Hadoop cluster using big data analytics tools including Pig, Hive, and MapReduce.
  • Worked on batch and streaming data ingestion into the Cassandra database.
  • Integrated Kafka with Spark Streaming for high-speed data processing.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications.
  • Implemented a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Created data pipelines as per the business requirements and scheduled them using Oozie coordinators.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and RDDs.
  • Good understanding of machine learning and data mining.
  • Developed and performed unit testing using the JUnit framework in a Test-Driven Development (TDD) environment.
  • Involved in integration testing and production support.
  • Coordinated with offshore and onsite teams to understand requirements and prepare high-level and low-level design documents from the requirements specification.
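The log-producer work above (watching application logs and shipping them to a Kafka-based platform) centers on one transform step: turning a raw log line into a structured message. A hedged Python sketch, where the log format, field names, and topic name are hypothetical, and the actual producer wiring is shown only as a comment:

```python
import json

# Hypothetical log format: "<timestamp> <LEVEL> <free-text message>"
def log_line_to_message(line):
    """Transform one raw log line into JSON bytes ready to send to Kafka."""
    timestamp, level, message = line.split(" ", 2)
    record = {"ts": timestamp, "level": level, "msg": message}
    return json.dumps(record).encode("utf-8")

# With the kafka-python client this would be sent as (sketch only):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="broker:9092")
#   producer.send("app-logs", log_line_to_message(line))
```

Separating the parse/serialize step from the producer makes the transform testable without a running broker.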

Environment: Spark, HDFS, Hive, Hadoop, Oozie, Sqoop, Spark SQL, Kafka, Linux, HBase, Python, Tableau.


Confidential, CA

Sr. Big Data Hadoop Developer


  • Installed and configured distributed data solution using Cloudera Distribution of Hadoop.
  • Involved in complete big data flow of the application data ingestion from upstream to HDFS, processing the data in HDFS and analyzing the data.
  • Imported data in various formats such as JSON, SequenceFile, Avro, and Parquet to the HDFS cluster.
  • Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
  • Configured Hive and wrote Hive UDFs and UDAFs.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
  • Imported and exported data into HDFS and Hive using Sqoop and Kafka.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time and persists it into HBase.
  • Analyzed the performance of Spark Streaming and batch jobs using Spark tuning parameters.
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
  • Developed Spark scripts using Python as per requirements.
  • Used Hive join queries to join multiple source-system tables and load them into Elasticsearch tables.
  • Experience in managing and reviewing huge Hadoop log files.
  • Involved in cluster maintenance, cluster monitoring, and troubleshooting.
  • Created data pipelines as per the business requirements and scheduled them using Oozie coordinators.
  • Maintained technical documentation for every step of the development environment and for launching Hadoop clusters.
  • Used IntelliJ IDEA to develop and debug the code.
  • Worked with BI tools such as Tableau to create weekly, monthly, and daily dashboards using Tableau Desktop and published them to the HDFS cluster.
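The Hive UDF/UDAF work above was presumably written in Java (Hive's native UDF interface); Hive can also invoke scripting-language logic through its TRANSFORM clause. As an illustrative sketch only, here is a Python script with a hypothetical masking rule, not the actual UDFs used on the project:

```python
import sys

def mask_account(value):
    """Hypothetical UDF logic: mask all but the last 4 characters."""
    if len(value) <= 4:
        return value
    return "*" * (len(value) - 4) + value[-4:]

def main():
    # When Hive invokes the script, e.g.
    #   SELECT TRANSFORM(account_no) USING 'python mask.py' AS masked FROM accounts;
    # it streams rows in as tab-separated lines on stdin and reads stdout.
    for line in sys.stdin:
        print(mask_account(line.rstrip("\n")))

# Hive would drive main() when running the script; it is not called here.
```

The core rule lives in a pure function, so it can be unit-tested outside Hive.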

Environment: Hadoop, HDFS, Hive, Oozie, Sqoop, Spark, Kafka, Elasticsearch, Linux, HBase, Scala, Python, Tableau, MySQL.

Confidential - TX

Big Data/Hadoop Developer


  • Involved in requirements gathering and writing user stories; worked with complete Software Development Life Cycle (SDLC) methodologies based on Agile.
  • Involved in installing, configuring, supporting, and managing Hadoop clusters using Hortonworks Distribution (HDP) and Cloudera Distribution of Hadoop (CDH).
  • Worked on Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Experienced in managing and reviewing Hadoop log files and documenting issues daily to the resolution portal.
  • Implemented dynamic partitions and buckets in Hive.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Configured spouts and bolts in various Storm topologies and validated data in the bolts.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Experience loading and transforming large sets of structured, semi-structured, and unstructured data using Sqoop between the Hadoop Distributed File System and relational database systems.
  • Used Spark Streaming to receive real-time data from Kafka and stored the streamed data in HDFS using Scala, as well as in databases such as HBase.
  • Established and implemented firewall rules and validated the rules with vulnerability scanning tools.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Implemented Storm builder topologies to perform cleansing operations before moving data into HBase.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Developed a custom file system plugin for Hadoop so it can access files on the data platform; this plugin allows Hadoop MapReduce programs, Cassandra, Pig, and Hive to work unmodified and access files directly.
  • Used Spark to create APIs in Java and Python for big data analysis.
  • Experience in troubleshooting errors in Cassandra, Hive, and MapReduce.
  • Used version control tools such as GitHub and SourceTree to pull changes from upstream into local branches, check for conflicts, and clean up and review other developers' code.
  • Involved with development teams to discuss JIRA stories and understand the requirements.
  • Actively involved in the complete Agile life cycle to design, develop, deploy, and support solutions.
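The Oozie bullet above describes chaining interdependent jobs (Sqoop, Hive, Pig, MapReduce). Oozie workflows are defined in XML; the fragment below is a hedged sketch of a two-step Sqoop-then-Hive chain, where every name, path, and connection string is a hypothetical placeholder, not a detail from the actual project:

```xml
<!-- Sketch of an Oozie workflow chaining a Sqoop import into a Hive step.
     All names, paths, and the JDBC URL are hypothetical placeholders. -->
<workflow-app name="ingest-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect jdbc:mysql://db/src --table orders --target-dir /data/raw/orders</command>
    </sqoop>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_orders.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Ingest failed</message></kill>
  <end name="end"/>
</workflow-app>
```

The `ok`/`error` transitions are what make the jobs interdependent: the Hive step runs only if the Sqoop import succeeds.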

Environment: Hadoop, Hive, Pig, Cassandra, Sqoop, Oozie, Java, MapReduce, MySQL, Maven.

Confidential - TX

Hadoop Developer


  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Devised procedures that solve complex business problems with due considerations for hardware/software capacity and limitations, operating times and desired results.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Provided quick responses to ad hoc internal and external client requests for data; experienced in creating ad hoc reports.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files. Worked hands-on with the ETL process.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior like shopping enthusiasts, travelers, music lovers etc.
  • Created lookup Hive tables, JSON Format Hive Tables.
  • Used Maven extensively for building JAR files of MapReduce programs and deployed them to the cluster.
  • Modeled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
  • Exported the patterns analyzed back into Teradata using Sqoop.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
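The Hive partitioning work above (dynamic partitions, partition modeling for faster processing) follows a standard HiveQL pattern. A hedged sketch, where the table and column names are hypothetical placeholders and only the dynamic-partition settings are standard Hive configuration:

```sql
-- Hypothetical table/column names; the SET lines are standard Hive settings
-- required for fully dynamic partition inserts.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

CREATE TABLE user_events (
  user_id STRING,
  action  STRING
)
PARTITIONED BY (event_date STRING)
STORED AS PARQUET;

-- The partition column comes last in the SELECT; Hive routes each row
-- to the matching event_date partition automatically.
INSERT OVERWRITE TABLE user_events PARTITION (event_date)
SELECT user_id, action, event_date
FROM staging_events;
```

Partitioning by a date column like this lets queries that filter on the partition key prune entire directories instead of scanning the full table.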

Environment: Java, Oracle, HTML, SQL, J2EE, JUnit, JDBC, Tomcat, MongoDB, GitHub, SourceTree, NetBeans.

Confidential - TX

Java Developer


  • Analyzed requirements and prepared the Requirements Analysis Document.
  • Deploying the Application to the JBOSS Application Server.
  • Requirement gatherings from various stakeholders of the project.
  • Effort-estimation and estimating timelines for development tasks.
  • Used J2EE and EJB to handle the business flow and functionality.
  • Interact with Client to get the confirmation on the functionalities and implementation.
  • Involved in the complete SDLC of the Development with full system dependency.
  • Actively coordinated with deployment manager for application production launch.
  • Provided support and updates during the warranty period.
  • Produced detailed low-level designs from high-level design specifications for components of low complexity.
  • Developed, built, and unit-tested components of low complexity from detailed low-level designs.
  • Developed user and technical documentation.
  • Monitoring of test cases to verify actual results against expected results.
  • Performed Functional, User Interface test and Regression Test.
  • Carried out regression testing to track problems.
  • Implemented Model View Controller (MVC) architecture at the web tier to isolate each layer of the application, avoiding integration complexity and easing maintenance, along with a validation framework.

Environment: Java, JEE, CSS, HTML, SVN, EJB, UNIX, XML, Workflow, MyEclipse, JMS, JIRA, Oracle, JBoss.
