
Sr. Big Data/Hadoop Developer Resume


Houston, TX


  • 9+ years of experience as a Big Data/Hadoop Developer, designing and developing applications using Big Data, Hadoop, and Java/J2EE open-source technologies.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and the MapReduce programming paradigm.
  • Experienced with major components of the Hadoop ecosystem, including Hive, Sqoop, and Flume, and knowledgeable about the MapReduce/HDFS framework.
  • Hands-on programming experience with Java, J2EE, HTML, and XML.
  • Expertise in loading data from sources such as Teradata and DB2 into HDFS using Sqoop and then into partitioned Hive tables.
  • Experienced in using Apache Flume to collect, aggregate, and move large volumes of data from sources such as web servers and telnet sources.
  • Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to implement and leverage new Hadoop features.
  • Strong working knowledge of front-end technologies, including JavaScript frameworks such as AngularJS.
  • Strong experience working in Unix/Linux environments and writing shell scripts.
  • Excellent knowledge of and working experience with Agile and Waterfall methodologies.
  • Expertise in web page development using JSP, HTML, JavaScript, jQuery, and Ajax.
  • Hands-on experience with NoSQL databases such as HBase and Cassandra and relational databases such as Oracle and MySQL.
  • Extensively used Apache Ant to build common components, automation scripts, and code instrumentation scripts.
  • Proficient in developing MVC-based web applications with Struts, building forms with Struts Tiles and validating them with the Struts Validator framework.
  • Responsible for committing scripts to GitHub repositories and deploying the code using Jenkins.
  • Experience deploying applications on application servers such as Apache Tomcat and WebSphere.
  • Primarily involved in the data migration process using Azure, integrated with GitHub repositories and Jenkins.
  • Architected, designed, and developed Big Data solutions for various implementations.
  • Experience with Amazon Web Services (AWS), including EC2, S3, EBS, ELB, AMI, Route 53, Auto Scaling, CloudFront, CloudWatch, and Security Groups.
  • Strong experience developing and deploying applications using WebLogic, Apache Tomcat, and JBoss.
  • Worked with HDFS, NameNode, JobTracker, DataNode, TaskTracker, and MapReduce concepts.
  • Experience working with developer toolkits such as the Force.com IDE, Force.com Ant Migration Tool, Eclipse IDE, and Maven.
  • Experience with front-end technologies such as HTML, CSS, HTML5, CSS3, and Ajax.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, and Apache Storm.
  • Extensive hands-on experience importing data from relational database systems such as MySQL and Oracle into HDFS and Hive, and exporting it back, using Sqoop.
  • Installed Kafka on a Hadoop cluster and wrote Java producer and consumer code to stream tweets with popular hashtags from a Twitter source into HDFS.
  • Experience with web-based UI development using jQuery, Ext JS, CSS, HTML, HTML5, XHTML, and JavaScript.
  • Experience with version control systems such as ClearCase, CVS, and SVN.


Hadoop Ecosystem: Hadoop 3.0, HDFS, MapReduce, Hive 2.3, Impala 2.10, Apache Pig 0.17, Sqoop 1.4, Oozie 4.3, ZooKeeper 3.4, Flume 1.8, Kafka 1.0.1, Spark, Spark SQL, Spark Streaming, AWS, Azure Data Lake, NoSQL.

Application Server: WebSphere, WebLogic, JBoss, Apache Tomcat

Databases: HBase 1.2, Cassandra 3.11, MongoDB 3.6, MySQL 8.0, SQL Server 2016, Oracle 12c

IDE: Eclipse, NetBeans, MySQL Workbench.

Agile Tools: Jira, Jenkins, Scrum

Build Management Tools: Maven, Apache Ant

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, JavaBeans

Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting

Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MRUnit

Version Control & CI: GitHub, Jenkins

Methodology: RAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.


Confidential, Houston, TX

Sr. Big Data/Hadoop Developer


  • Gathered requirements from the client and estimated timelines for developing complex Hive queries for a logistics application.
  • Identified data sources, created source-to-target mappings, estimated storage requirements, and supported Hadoop cluster setup and data partitioning.
  • Worked with the cloud provisioning team on capacity planning and sizing of the master and slave nodes for an AWS EMR cluster.
  • Responsible for cluster maintenance and monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
  • Used Amazon EMR to process data directly in S3, copying data from S3 to HDFS on the EMR cluster when needed, and set up Spark Core for analysis work.
  • Involved in the complete SDLC and daily Scrum (Agile) meetings, including system architecture design and development of system use cases based on the functional requirements.
  • Analyzed the existing data flow into the warehouses and took a similar approach to migrate the data into HDFS.
  • Optimized Hive queries using partitioning, bucketing, map-side joins, and parallel execution, reducing execution time from hours to minutes.
  • Defined the application architecture and design for the Big Data Hadoop initiative to maintain structured and unstructured data, and created a reference architecture for the enterprise.
  • Responsible for creating an Amazon EC2 instance and deploying the application on it.
  • Gained exposure to Spark architecture and RDD internals by processing data from local files, HDFS, and RDBMS sources as RDDs and optimizing them for performance.
  • Built data pipelines using Pig and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Imported data from MySQL into HDFS and exported it back using Sqoop, and configured the Hive metastore on MySQL to store the metadata for Hive tables.
  • Worked with the NoSQL database HBase, creating tables to load large sets of semi-structured data coming from various sources.
  • Responsible for loading customer data and event logs from Kafka into HBase using a REST API.
  • Created custom Spark and Kafka UDFs in Scala for the production environment, including reimplementing some non-working custom UDF functionality.
  • Developed Oozie workflows and scheduled jobs on mainframes, and prepared the data refresh strategy and capacity planning documents required for project development and support.
  • Designed Oozie workflows using various action types, including Sqoop, Pig, Hive, and shell actions.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Implemented Kafka consumers to move data from Kafka partitions into Cassandra for near-real-time analysis.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Ingested structured and unstructured data in various formats, including logs, transactions, and relational database records, into HDFS using Sqoop and Flume.
  • Implemented Flume custom interceptors to perform cleansing operations before moving data into HDFS.

Environment: AWS S3, EMR, Python, PySpark, Scala, Hadoop, MapReduce, Hive, Impala, Sqoop, Spark SQL, Spark Streaming, Airflow, Jenkins, Git, Bitbucket, R, and Tableau.
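The Hive optimizations listed above (partitioning, bucketing, and map-side joins) share one idea: lay the data out so a query can skip most of it. The Python sketch below illustrates that idea only and is not Hive's implementation. The `shipments` table, its columns, and the bucket count are hypothetical, and integer keys are bucketed as the non-negative key modulo the bucket count, which is a simplification of Hive's per-type hashing.

```python
from collections import defaultdict

# Hypothetical rows for a "shipments" table partitioned by dt
# and bucketed on customer_id.
ROWS = [
    {"dt": "2018-01-01", "customer_id": 7, "amount": 120.0},
    {"dt": "2018-01-01", "customer_id": 12, "amount": 80.0},
    {"dt": "2018-01-02", "customer_id": 7, "amount": 45.5},
]
NUM_BUCKETS = 4


def bucket_for(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Non-negative integer key modulo the bucket count (simplified hashing)."""
    return (key & 0x7FFFFFFF) % num_buckets


def layout(rows):
    """Group rows as {partition directory: {bucket id: [rows]}}."""
    files = defaultdict(lambda: defaultdict(list))
    for row in rows:
        part_dir = f"shipments/dt={row['dt']}"
        files[part_dir][bucket_for(row["customer_id"])].append(row)
    return files


def scan(files, dt, customer_id=None):
    """Partition pruning: open only the dt directory.
    Bucket pruning: with a key, open only the bucket that can hold it."""
    buckets = files.get(f"shipments/dt={dt}", {})
    if customer_id is None:
        return [row for rows in buckets.values() for row in rows]
    candidates = buckets.get(bucket_for(customer_id), [])
    return [row for row in candidates if row["customer_id"] == customer_id]


def bucket_map_join(left, right, key, num_buckets=NUM_BUCKETS):
    """Join two inputs bucketed on the same key with the same bucket count.
    Bucket i on the left only ever needs bucket i on the right."""
    left_b, right_b = defaultdict(list), defaultdict(list)
    for row in left:
        left_b[bucket_for(row[key], num_buckets)].append(row)
    for row in right:
        right_b[bucket_for(row[key], num_buckets)].append(row)
    joined = []
    for b, left_rows in left_b.items():
        # Small per-bucket lookup table, built in memory as a map-side join would.
        lookup = defaultdict(list)
        for row in right_b.get(b, []):
            lookup[row[key]].append(row)
        for lrow in left_rows:
            for rrow in lookup[lrow[key]]:
                joined.append({**lrow, **rrow})
    return joined


if __name__ == "__main__":
    files = layout(ROWS)
    print(scan(files, "2018-01-01"))  # touches one partition only
    customers = [{"customer_id": 7, "name": "Acme"}, {"customer_id": 12, "name": "Globex"}]
    print(bucket_map_join(ROWS, customers, "customer_id"))
```

Because both inputs to `bucket_map_join` are bucketed the same way, each bucket pair can be joined independently with a small in-memory lookup, which is the property a bucketed map-side join relies on to avoid a full shuffle.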

Confidential, Columbus, OH

Big Data/Hadoop Developer


  • Participated in Agile practices, including daily Scrum meetings and sprint planning.
  • Wrote scripts to distribute queries for performance test jobs in the Amazon data lake.
  • Created Hive tables, loaded transactional data from Teradata using Sqoop, and worked with 2 petabytes of highly unstructured and semi-structured data.
  • Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Developed strategies for distributing web log data across the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
  • Developed Pig Latin scripts to replace the existing legacy process with Hadoop, feeding the output data to AWS S3.
  • Responsible for building scalable distributed data solutions using Cloudera Hadoop.
  • Designed and developed automation test scripts using Python.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Analyzed the SQL scripts and designed a solution to implement them using PySpark.
  • Implemented Hive GenericUDFs to incorporate business logic into Hive queries.
  • Developed a data pipeline on AWS to extract data from web logs and store it in HDFS.
  • Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
  • Supported data analysis projects using Elastic MapReduce (EMR) on AWS, and exported and imported data to and from S3.
  • Worked with MongoDB CRUD (Create, Read, Update, and Delete) operations, indexing, replication, and sharding.
  • Designed HBase row keys for tables storing text and JSON values, structuring the keys so rows can be retrieved with get/scan operations in sorted order.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Worked on custom Talend jobs to ingest, enrich, and distribute data in the Cloudera Hadoop ecosystem.
  • Created Hive tables and worked on them using HiveQL.
  • Designed and implemented static and dynamic partitioning and bucketing in Hive.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in the end-to-end implementation of ETL logic.
  • Developed syllabus/curriculum data pipelines from syllabus/curriculum web services to HBase and Hive tables.
  • Worked on cluster coordination services through ZooKeeper.
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Built applications using Maven and integrated them with CI servers such as Jenkins to run build jobs.
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
  • Worked collaboratively with business stakeholders at all levels to architect, implement, and test Big Data analytical solutions built from disparate sources.
  • Created cubes in Talend to build different types of data aggregations and visualize them.

Environment: Hive, Teradata, MapReduce, HDFS, Sqoop, AWS, Hadoop, Pig, Python, Kafka, Apache Storm, SQL scripts, data pipeline, HBase, JSON, Oozie, ETL, Zookeeper, Maven, Jenkins, RDBMS
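The incremental Sqoop loads mentioned above import only the rows added since the previous run. Sqoop's append mode does this with `--incremental append`, `--check-column`, and `--last-value`. The Python sketch below mimics that bookkeeping against an in-memory SQLite table; the `transactions` table and its columns are hypothetical, and the code does not invoke Sqoop.

```python
import sqlite3


def incremental_append(conn, table, check_column, last_value):
    """Fetch rows whose check column is greater than last_value.

    Returns (rows, new_last_value); the caller keeps new_last_value for the
    next run, the way a saved Sqoop job tracks --last-value.
    """
    # Table and column names cannot be bound as SQL parameters, so they are
    # assumed to come from trusted configuration, not user input.
    query = f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}"
    cur = conn.execute(query, (last_value,))
    rows = cur.fetchall()
    columns = [desc[0] for desc in cur.description]
    idx = columns.index(check_column)
    # If nothing new arrived, the watermark stays where it was.
    new_last_value = rows[-1][idx] if rows else last_value
    return rows, new_last_value


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)])
    first, last = incremental_append(conn, "transactions", "id", 0)
    conn.execute("INSERT INTO transactions VALUES (4, 40.0)")
    second, last = incremental_append(conn, "transactions", "id", last)
    print(first, second, last)
```

In Sqoop itself, the second run above would correspond to an import with `--incremental append --check-column id --last-value 3`; a saved Sqoop job can record the updated last value between runs so it does not have to be passed by hand.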

Confidential, Tampa, FL

Sr. Hadoop Developer


  • Wrote Hive queries for data analysis to meet business requirements.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data coming from UNIX systems, NoSQL stores, and a variety of portfolios.
  • Migrated data between RDBMS and HDFS/Hive with Sqoop.
  • Used partitioning and bucketing in Hive and designed both managed and external Hive tables for optimized performance.
  • Used Sqoop to import and export data among HDFS, MySQL, and Hive.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Loaded data from the UNIX/Linux file system into HDFS.
  • Analyzed the data by running Hive queries and Pig scripts.
  • Worked on an implementation using Spark Framework, a Java-based web framework.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Wrote Spark code using Scala and Spark SQL for faster dataset processing and testing.
  • Processed web server logs by developing multi-hop Flume agents using the Avro sink, and loaded them into MongoDB for further analysis.
  • Extracted and restructured the data into MongoDB using the MongoDB import and export command-line utilities.
  • Managed and reviewed Hadoop and HBase log files, and created HBase tables to load large sets of semi-structured data coming from various sources.
  • Performed data analysis with HBase using Hive external tables.
  • Exported the analyzed data to HBase using Sqoop to generate reports for the BI team.
  • Imported data from relational databases into the Hadoop cluster using Sqoop.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Responsible for building scalable distributed data solutions using Hadoop. Created, dropped, and altered tables at run time without blocking updates and queries, using HBase and Hive.
  • Wrote Flume configuration files to import streaming log data into HBase.
  • Imported web server logs into HDFS with Flume, and loaded data from the local file system into HDFS using a Flume spooling directory source.
  • Installed and configured Pig, and wrote Pig Latin scripts to convert data from text files to Avro format.
  • Created partitioned Hive tables and worked on them using HiveQL.
  • Loaded data into HBase using both bulk and non-bulk loads.
  • Integrated Hive with Tableau Desktop, built reports, and published them to Tableau Server.
  • Developed MapReduce programs in Java to parse the raw data and populate staging tables.
  • Developed Unix shell scripts to load large numbers of files into HDFS from the Linux file system.
  • Set up the whole application stack, and set up and debugged Logstash to send Apache logs to Elasticsearch.
  • Used ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
  • Worked in an Agile development environment using the Kanban methodology. Actively involved in daily Scrum and other design-related meetings.
  • Used Oozie operational services for batch processing and dynamic workflow scheduling.
  • Continuous coordination with QA team, production support team and deployment team.

Environment: MapReduce, Pig Latin, Hive, Apache Crunch, Spark, Scala, HDFS, HBase, Core Java, J2EE, Eclipse, Sqoop, Impala, Flume, Oozie, MongoDB, Jenkins, Agile Scrum methodology
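Generating data cubes, as in the Hive work above, means aggregating a measure over every combination of dimensions; in HiveQL this is typically written with `GROUP BY ... WITH CUBE`. The following Python sketch shows the same computation on a few hypothetical sales rows, reporting rolled-up dimensions as `None` (the counterpart of `NULL` in SQL cube output). It assumes no dimension value is itself `None`, otherwise different grouping sets could produce the same key.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`, like GROUP BY ... WITH CUBE.

    Each result key has one slot per dimension; a slot holds None when that
    dimension is rolled up in the grouping set.
    """
    totals = defaultdict(float)
    # Grouping sets of every size: (), (d1,), (d2,), ..., (d1, d2, ...).
    for size in range(len(dims) + 1):
        for grouping_set in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in grouping_set else None for d in dims)
                totals[key] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sales rows.
    sales = [
        {"region": "East", "product": "A", "sales": 10},
        {"region": "East", "product": "B", "sales": 5},
        {"region": "West", "product": "A", "sales": 7},
    ]
    for key, total in cube(sales, ["region", "product"], "sales").items():
        print(key, total)
```

With two dimensions there are four grouping sets, so the output holds the grand total, per-region totals, per-product totals, and per-(region, product) totals in a single result, which is what makes cube output convenient to feed into a visualization tool.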

Confidential, Brentwood, TN

Sr. Java/Hadoop Developer


  • Developed a business solution for making data-driven decisions on the best ways to acquire customers and provide them with business solutions.
  • Actively participated in every stage of the Software Development Lifecycle (SDLC) of the project.
  • Designed and developed the user interface using JSP, HTML, and JavaScript for a better user experience.
  • Exported analyzed data to downstream RDBMS systems using Sqoop to generate end-user reports, business analysis reports, and payment reports.
  • Participated in developing UML diagrams such as class diagrams, use case diagrams, and sequence diagrams.
  • Developed the UI using HTML, CSS, JSP, jQuery, Ajax, and JavaScript.
  • Designed dynamic client-side JavaScript code to build web forms and handle web application processes, page navigation, and form validation.
  • Ran Sqoop import and export jobs to copy data to and from HDFS.
  • Integrated data received from various providers into HDFS using Sqoop for analysis and data processing.
  • Managed the cluster environment on the Hadoop platform.
  • Worked with Pig, the NoSQL database HBase, and Sqoop to analyze big data on the Hadoop cluster.
  • Managed data ingestion using Kafka.
  • Wrote and implemented Apache Pig scripts to load data from and store data into Hive.
  • Assisted the administrator in setting up and adding nodes to the cluster.
  • Implemented HBase as the NoSQL database and managed the other tools and processes running on YARN.
  • Wrote Hive UDFs to extract data from staging tables, and analyzed web log data using HiveQL.
  • Used multi-threading and clustering concepts for data processing.
  • Managed the cluster design and debugged any issues that arose.
  • Created Hive tables, loaded data, and wrote Hive queries that run as MapReduce jobs in the backend, applying partitioning and bucketing where required.
  • Used Zookeeper for various types of centralized configurations.
  • Tested the data coming from the source before processing and resolved the problems found.
  • Committed and pushed sample code to GitHub.
  • Ingested the raw data, populated staging tables, and stored the refined data.
  • Developed programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables.
  • Created Hive queries for the market analysts to analyze the emerging data and compare it with fresh data in reference tables.
  • Involved in regular Hadoop cluster maintenance, such as patching security holes and updating system packages.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for the administration of Hive and Pig.
  • Used Apache Tomcat to deploy and test the application.
  • Worked with different file formats, including text files, SequenceFiles, and Avro.
  • Wrote Java programs to retrieve data from HDFS and expose it through REST services.
  • Used build automation tools such as Maven.
  • Used the Spring Framework to provide RESTful services.
  • Provided design recommendations and thought leadership to stakeholders that improved review processes and resolved technical problems.

Environment: Java, Eclipse, Hadoop, Hive, HBase, Linux, MapReduce, Pig, HDFS, Oozie, Shell Scripting, MySQL.
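Parsing raw web log data into structured records and then analyzing it, as described in the bullets above, can be sketched in a few lines of Python. The sketch assumes logs in the Apache Common Log Format and treats the client host as the visitor identity, both simplifying assumptions; the work above did this with Hive UDFs and HiveQL, not Python, and the function names here are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Apache Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2018:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Turn one raw log line into a structured record, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def daily_stats(lines):
    """Per-day unique visitors and page views, plus the most visited path."""
    visitors = defaultdict(set)
    views = Counter()
    pages = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # a real pipeline might route these to a reject table
        day = record["ts"].date().isoformat()  # date in the log's own offset
        visitors[day].add(record["host"])  # host stands in for visitor identity
        views[day] += 1
        pages[record["path"]] += 1
    per_day = {
        day: {"unique_visitors": len(visitors[day]), "page_views": views[day]}
        for day in views
    }
    top_page = pages.most_common(1)[0][0] if pages else None
    return per_day, top_page
```

Visit duration, also mentioned elsewhere in this resume, would additionally require grouping each visitor's requests into sessions, which this sketch leaves out.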

Confidential, Mount Laurel, NJ

Sr. Java/J2EE Developer


  • Involved in the complete Software Development Life Cycle (SDLC) including Requirement Analysis, Design, Implementation, Testing and Maintenance.
  • Worked on designing and developing the Web Application User Interface and implemented its related functionality in Java/J2EE for the product.
  • Used JSF framework to implement MVC design pattern.
  • Developed and coordinated complex, high-quality solutions for clients using J2SE, J2EE, Servlets, JSP, HTML, Struts, Spring MVC, SOAP, JavaScript, jQuery, JSON, and XML.
  • Wrote JSF managed beans, converters, and validators following framework standards, and used explicit and implicit navigation for page flow.
  • Designed and developed persistence-layer components using the Hibernate ORM tool.
  • Designed the UI using JSF tags, Apache Tomahawk, and RichFaces.
  • Used Oracle 10g as the backend to store and fetch data.
  • Used IDEs such as Eclipse and NetBeans, integrated with Maven.
  • Created real-time reporting systems and dashboards using XML, MySQL, and Perl.
  • Worked on RESTful web services that enforce a stateless client-server model and support JSON (with some services changed from SOAP to REST).
  • Involved in detailed analysis based on the requirement documents.
  • Involved in the design, development, and testing of web application and integration projects using object-oriented technologies such as Core Java, J2EE, Struts, JSP, JDBC, Spring Framework, Hibernate, JavaBeans, web services (REST/SOAP), XML, XSLT, XSL, and Ant.
  • Designed and implemented SOA-compliant management and metrics infrastructure for the Mule ESB, using the SOA management components.
  • Used Node.js for server-side rendering, and implemented Node.js modules to integrate with designs and requirements.
  • Used JAX-WS for communication between the front-end and back-end modules, which run on two different servers.
  • Responsible for offshore deliverables, providing design and technical help to the team and reviewing work to meet quality standards and timelines.
  • Migrated existing Struts application to Spring MVC framework.
  • Provided and implemented numerous solution ideas to improve the performance and stabilize the application.
  • Extensively used LDAP with Microsoft Active Directory for user authentication at login.
  • Developed unit test cases using JUnit.
  • Created the project from scratch using AngularJS for the front end and Node.js with Express for the back end.
  • Developed Perl scripts and other scripts, such as JavaScript.
  • Used Tomcat as the web server to deploy the OMS web application.
  • Used the SOAP::Lite Perl module to communicate with different web services based on the given WSDL.
  • Prepared technical reports and documentation manuals during program development.
  • Tested the application functionality with JUnit Test Cases.

Environment: JDK 1.5, JSF, Hibernate 3.6, JIRA, Node.js, CruiseControl, Log4j, Tomcat, LDAP, JUnit, NetBeans, Windows/Unix.


Java Developer


  • Involved in prototyping, proof of concept, design, Interface Implementation, testing and maintenance.
  • Created use case diagrams, sequence diagrams, and preliminary class diagrams for the system using UML/Rational Rose.
  • Used various Core Java concepts such as Multi-Threading, Exception Handling, Collection APIs to implement various features and enhancements.
  • Developed reusable core Java validation utility classes used across all modules.
  • Actively designed, developed and integrated the Metrics module with all other components.
  • Involved in Software Development Life Cycle (SDLC) of the application: Requirement gathering, Design Analysis and Code development.
  • Implemented Struts framework based on the Model View Controller design paradigm.
  • Designed the application by implementing Struts based on MVC Architecture, simple Java Beans as a Model, JSP UI Components as View and Action Servlets as a Controller.
  • Used JNDI to perform lookup services for the various components of the system.
  • Involved in designing and developing dynamic web pages using HTML and JSP with Struts tag libraries.
  • Used HQL (Hibernate Query Language) to query the Database System and used JDBC Thin Driver to connect to the database.
  • Developed Hibernate entities, mappings, and custom Criteria queries for interacting with the database.
  • Designed rich user interface applications using JavaScript, CSS, HTML, and AJAX, and developed web services using SoapUI.
  • Used JPA to persist large amounts of data to the database.
  • Implemented modules using Java APIs, Java collections, threads, and XML, and integrated the modules.
  • Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller, and MVC.
  • Used JPA for the management of relational data in application.
  • Designed and developed business components using Session and Entity Beans in EJB.
  • Developed the EJBs (Stateless Session beans) to handle different transactions such as online funds transfer, bill payments to the service providers.
  • Developed XML configuration files and properties files used in the Struts Validator framework to validate form inputs on the server side.
  • Extensively used AJAX technology to add interactivity to the web pages.
  • Developed JMS senders and receivers for loose coupling between modules, and implemented asynchronous request processing using message-driven beans.
  • Used JDBC for data access from Oracle tables.
  • Used JUnit to implement test cases for beans.
  • Successfully installed and configured IBM WebSphere Application Server and deployed the business tier components using an EAR file.
  • Deployed the application on WebLogic Application Server in the development and QA environments.
  • Used Log4j with external configuration files for logging and debugging.

Environment: JSP 1.2, Servlets, Struts 1.2.x, JMS, EJB 2.1, Java, OOP, Spring, Hibernate, JavaScript, Ajax, HTML, CSS, JDBC, Eclipse, WebSphere, DB2, JPA, Ant.
