
Hadoop (Big Data) Developer Resume

Richmond, VA


  • Over 7 years of industry experience in application development and maintenance, data management, programming, data analysis and data visualization.
  • Experience with Apache Hadoop ecosystem components and related tools such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Mahout, Spark, Storm, Cassandra and MongoDB, along with Python, for big data analytics.
  • Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and Secondary NameNode, and of MapReduce concepts.
  • Experienced in managing NoSQL databases on large Hadoop distributions such as Cloudera, Hortonworks HDP and the MapR M series.
  • Experienced in developing Hadoop integrations for data ingestion, data mapping and data processing.
  • Experienced in building analytics for structured and unstructured data and in managing large-scale data ingestion using technologies like Kafka, Avro and Thrift.
  • Software development experience in Java application development, client/server applications and Internet/intranet-based database applications, including developing, testing and implementing application environments using C++, J2EE, JDBC, JSP, Servlets, Web Services, Oracle, PL/SQL and relational databases.
  • Exceptional ability to quickly master new concepts and capable of working in groups as well as independently.
  • Excellent interpersonal skills and the ability to work as a part of a team.
  • Experience in debugging, troubleshooting production systems, profiling and identifying performance bottlenecks.
  • Good knowledge of virtualization, with experience on VMware Virtual Center.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • In-depth understanding of data structures and algorithms.
  • Experience in managing and troubleshooting Hadoop related issues.
  • Expertise in setting up standards and processes for Hadoop based application design and implementation.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
  • Experience in managing Hadoop clusters using Cloudera Manager.
  • Experience using Impala for high-performance SQL queries.
  • Solid experience across the complete project life cycle (design, development, testing and implementation) of client/server and web applications.
  • Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
  • Hands-on experience in application development using Java, RDBMS and Linux shell scripting.
  • Performed data analysis using MySQL, SQL Server Management Studio and Oracle.
  • Expertise in creating conceptual data models, process/data flow diagrams, use case diagrams and state diagrams.
  • Experience with cloud computing platforms like Amazon Web Services (AWS).
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.


Hadoop Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, Cassandra

NoSQL: HBase, Cassandra, MongoDB

Databases: MS SQL Server 2000/2005/2008/2012, MySQL, Oracle 9i/10g

Languages: Java (JDK 1.4, 1.5, 1.6), C/C++, SQL, PL/SQL

Operating Systems: Windows Server 2000/2003/2008, Windows XP/Vista, Mac OS, UNIX, Linux

Java Technologies: Servlets, JavaBeans, JDBC, JNDI

Frameworks: JUnit and Jtest

IDEs & Utilities: Eclipse, Maven, NetBeans

SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS)

Web Development Technologies: ASP.NET, HTML, XML


Confidential - Richmond, VA

Hadoop (Big Data) Developer


  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Developed Spark code using Java and Spark SQL/Streaming for faster data processing.
  • Implemented test scripts to support test driven development and continuous integration.
  • Managed data coming from different sources and consolidated it into a JSON file.
  • Wrote custom Hive UDFs to pull customized data.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Optimized existing algorithms in Hadoop using SparkContext, Hive SQL and DataFrames.
  • Performed advanced procedures such as text analytics and processing in Scala, using Spark's in-memory computing capabilities.
  • Handled large datasets during the ingestion process using partitioning, Spark's in-memory capabilities, broadcast variables, efficient joins and other transformations.
  • Wrote shell scripts to load data from Salesforce into the Hadoop raw region and the validated region.
  • Extracted data from the Hadoop validated region and created output JSON files in the refined region for consumption by the AML team.
  • Wrote complex SQL to optimize Hive queries.
  • Analyzed the data to join multiple sources on their primary keys.
  • Implemented daily cron jobs that automatically start jobs once the upstream jobs have run, using the Control-M setup.
  • Verified the project results thoroughly with unit testing and black-box testing.
  • Designed the data pipeline from sources to Hadoop.
  • Prepared the mapping document specifying which fields are used from the Hive database and which transformations are performed on them.
  • Loaded data from the UNIX file system into HDFS and AWS S3.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Imported data from various sources such as AWS S3 and MongoDB, performed transformations using Hive, MapReduce and Spark, and loaded the data into HDFS.
  • Provided production support throughout the project.
  • Developed UNIX shell scripts to send an email notification when a job completes, indicating success or failure.
  • Ran Hadoop streaming jobs to process terabytes of data from AWS S3.
  • Developed a RabbitMQ-based messaging system to create cases in Salesforce through the Salesforce API.
  • Prepared documentation for the Audit work.
  • Wrote test scenarios for regression testing of the functionality.
  • Performed effectiveness testing of customer data from the source output database.
  • Used JIRA for bug tracking and for updating bug reports.

Environment: Hadoop, MapReduce, HDFS, Hive, Spark, UNIX Shell Scripting, Eclipse

Confidential, Woonsocket, RI

Hadoop Developer/Admin


  • Analyzed the Hadoop cluster and various big data analytics tools, including Pig, HBase and Sqoop.
  • Built scalable distributed data solutions using Hadoop.
  • Installed and configured various components of the Hadoop ecosystem, such as JobTracker, TaskTracker, NameNode and Secondary NameNode.
  • Configured a cluster by editing config files such as core-site.xml, mapred-site.xml, hdfs-site.xml and masters/slaves.
  • Installed and configured Flume, Hive, Pig, Sqoop, and HBase on the Hadoop Cluster.
  • Monitored block scanner reports on DataNodes.
  • Performed clustering and classification of delivery documents in Hadoop.
  • Implemented a 30-node CDH3/CDH4 Hadoop cluster on CentOS.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
  • Managed Hadoop cluster resources, including adding and removing cluster nodes for maintenance and capacity needs.
  • Loaded data from the UNIX file system into HDFS.
  • Implemented best offer logic using Pig scripts and Pig UDFs.
  • Implemented test scripts to support test driven development and continuous integration.
  • Managed data coming from different sources.
  • Installed and configured Hive and wrote Hive UDFs.
  • Managed and reviewed Hadoop log files.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
  • Set up and benchmarked Hadoop/HBase clusters.

Environment: Hadoop, HDFS, Pig, Sqoop, Storm, VPN, MapReduce, CentOS

Confidential, Atlanta, GA

Hadoop Developer


  • Involved in review of functional and non-functional requirements.
  • Facilitated knowledge transfer sessions.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Defined job flows.
  • Managed and reviewed Hadoop log files.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
  • Ran Hadoop streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Managed data coming from different sources.
  • Gained good experience with NoSQL databases.
  • Supported MapReduce programs running on the cluster.
  • Loaded data from the UNIX file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Gained solid business knowledge of health insurance, claims processing, fraud suspect identification, the appeals process, etc.
  • Developed a custom FileSystem plugin for Hadoop so that it can access files on the Data Platform.
  • This plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
  • Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.

Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, UNIX Shell Scripting.

Confidential, Oklahoma City, OK

Java Developer


  • Developed MapReduce programs in Java for parsing the raw data and populating staging
  • Worked on both WebLogic Portal 9.2 for portal development and WebLogic 8.1 for data services programming.
  • Used Eclipse 6.0 as the IDE for application development.
  • Wrote test cases using sets of conditions to test the application.
  • Configured the Struts framework to implement the MVC design pattern.
  • Built SQL queries to fetch the required columns and data from the database.
  • Used Subversion as the version control system.
  • Managed SVN-related responsibilities and maintained versions accordingly.
  • Performed SVN check-ins and check-outs.
  • Used Hibernate for handling database transactions and persisting objects.
  • Used AJAX for interactive user operations and client-side validations.
  • Developed Ant scripts for compilation and deployment.
  • Performed unit testing using JUnit.
  • Used Log4j extensively for logging.

Environment: Java/J2EE, SQL, PL/SQL, JSP, EJB, Struts, SVN, JDBC, XML, XSLT, UML, JUnit


System Engineer


  • Involved in requirement analysis, development and documentation.
  • Participated in developing the form beans and action mappings required for the Struts implementation, and the validation framework using Struts.
  • Developed front-end screens with JSP using Eclipse.
  • Contributed to development of the Medical Records module.
  • Used XML and XSDs to define data formats.
  • Involved in bug fixing and functionality enhancements.
  • Designed and developed a logging mechanism for each order process using Log4j.
  • Wrote Oracle SQL queries.
  • Handled check-in and check-out using CVS.
  • Developed additional functionality in the software as per business requirements.
  • Involved in requirement analysis and complete development of client side code.
  • Followed Sun's coding and documentation standards.
  • Participated in project planning with business analysts and team members to analyze business requirements and translate them into working software.
  • Developed software application modules using a disciplined software development process.

Environment: Java, J2EE, JSP, EJB, Ant, Struts 1.2, Log4j, WebLogic 7.0, JDBC, MyEclipse, Windows XP, CVS, Oracle.


Java Developer


  • Designed and developed the application using Java; developed SQL queries and stored procedures for the application.
  • Analyzed System Requirements and prepared System Design document.
  • Developed dynamic User Interface with HTML and JavaScript using JSP and Servlet Technology.
  • Designed and developed a subsystem in which Java Message Service (JMS) applications communicate with MQ to exchange data between different systems.
  • Designed an ER diagram for all the databases using DB Designer, an open-source tool.
  • Designed the class diagrams and the use case diagram using the open-source tool.
  • Created and executed test plans using Quality Center (TestDirector).
  • Developed database schema and SQL queries for querying database on Oracle 9i.
  • Reviewed and edited data forms using Microsoft Excel.
  • Interacted and communicated with key stakeholders to understand business problems and define the analytical approach to resolve them.
  • Involved in all facets of application development, including system design, implementation, maintenance, support, testing and documentation.
  • Helped other team members resolve technical issues with application integration and configuration.

Environment: Java, UNIX Shell Scripting, Eclipse
