Hadoop Developer Resume

Minneapolis, MN


  • 8+ years of IT experience across the complete software development life cycle, applying object-oriented analysis and design with Big Data technologies (Hadoop ecosystem), SQL, Java, and J2EE.
  • Around 5 years of experience in Big Data and Data Science, building advanced customer insight and product analytics platforms using Big Data and open source technologies.
  • Broad experience in data mining, real-time analytics, business intelligence, machine learning, and web development.
  • Strong skills in developing applications with Big Data technologies such as Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro, and Scala.
  • Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
  • Strong experience designing and implementing MapReduce jobs for distributed processing of large data sets on Hadoop clusters.
  • Experience implementing an inverted indexing algorithm using MapReduce.
  • Extensive experience creating Hive tables, loading them with data, and writing Hive queries that execute internally as MapReduce jobs.
  • Hands-on experience migrating complex MapReduce programs to Apache Spark RDD transformations.
  • Experience setting up standards and processes for Hadoop-based application design and implementation.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, and HDFS.
  • Developed ETL processes that load data from multiple sources into HDFS using Flume and Sqoop, perform structural modifications using MapReduce and Hive, and analyze the data using visualization and reporting tools.
  • Experience writing Pig UDFs (Eval, Filter, Load, and Store) and macros.
  • Experience developing custom UDFs in Java to extend Hive and Pig Latin functionality.
  • Exposure to using Apache Kafka to develop log data pipelines as streams of messages using producers and consumers.
  • Experience integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
  • Developed and maintained operational best practices for smooth operation of Cassandra/Hadoop clusters.
  • Very good understanding of NoSQL databases such as MongoDB, Cassandra, and HBase.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Experience coordinating cluster services through ZooKeeper.
  • Hands-on experience setting up Apache Hadoop, MapR, and Hortonworks clusters.
  • Good knowledge of Apache Hadoop cluster planning, including choosing the hardware and operating systems to host an Apache Hadoop cluster.
  • Experience with Hadoop distributions and platforms such as Cloudera, Hortonworks, BigInsights, MapR, and Windows Azure, as well as Impala.
  • Experience using integrated development environments such as Eclipse, NetBeans, JDeveloper, and MyEclipse.
  • Excellent understanding of relational databases for application development with several RDBMSs, including IBM DB2, Oracle 10g, MS SQL Server 2005/2008, and MySQL, and strong database skills including SQL, stored procedures, and PL/SQL.
  • Working knowledge of J2EE development with the Spring, Struts, and Hibernate frameworks across various projects, and expertise in web services development (JAXB, SOAP, WSDL, RESTful).
  • Experience writing tests using Specs2, ScalaTest, Selenium, TestNG, and JUnit.
  • Ability to work with diverse application servers such as JBoss, Apache Tomcat, and WebSphere.
  • Worked on operating systems including UNIX/Linux, Windows XP, and Windows.
  • A passion for learning new things (new languages or new implementations) has kept me up to date with the latest trends and industry standards.
  • Adapts quickly to new work environments and technologies.
  • Quick learner and self-motivated team player with excellent interpersonal skills.
  • Focused and able to meet deadlines on schedule.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.


Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Avro, Hadoop Streaming, Cassandra, Oozie, ZooKeeper, Spark, Storm, Kafka

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

IDEs: Eclipse, NetBeans, WSAD, Oracle SQL Developer

Languages: C, C++, Java, Python, Linux shell scripts, SQL

Databases: Cassandra, MongoDB, HBase, Teradata, Oracle, MySQL, DB2

Web Servers: JBoss, WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, CSS, AJAX, JSON, Servlets, JSP


Hadoop Developer

Confidential, Minneapolis, MN


  • Developed MapReduce programs to parse the raw data and create intermediate data that is then loaded into partitioned Hive tables.
  • Ingested data into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS, and data APIs.
  • Ran multiple MapReduce jobs through Pig and Hive for data cleaning and pre-processing.
  • Developed MapReduce programs to join data from different data sources using optimized joins, implementing bucketed joins or map joins depending on the requirement.
  • Implemented the business logic layer for MongoDB services.
  • Executed Hive queries on tables stored in Hive to perform data analysis that met business requirements.
  • Created Kafka topics and partitions and wrote custom partitioner classes.
  • Worked on Big Data Integration and Analytics based on Hadoop, Spark and Kafka.
  • Used Kafka in a proof of concept for log processing on a distributed system.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Performed real-time analysis of incoming data using the Kafka consumer API, Kafka topics, and Spark Streaming in Scala.
  • Used DataStax OpsCenter for maintenance operations and for keyspace and table management.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
  • Developed Spark code using Python and Spark-SQL/Streaming for faster processing of data.
  • Streamed data in real time using Spark with Kafka.
  • Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
  • Developed Kafka producers and consumers, Cassandra clients, and Spark components, along with components on HDFS and Hive.
  • Processed large data sets in parallel across the Hadoop cluster for pre-processing.
  • Developed code for importing data into and exporting data from HDFS using Sqoop.
  • Imported data from structured data sources into HDFS using Sqoop incremental imports.
  • Implemented custom Kafka partitioners to send data to different categorized topics.
  • Implemented a Storm topology with stream groupings to perform real-time analytical operations.
  • Implemented Kafka spouts for streaming data and various bolts to consume it.
  • Created Hive tables and partitions and implemented incremental imports to support ad hoc queries on structured data.
  • Wrote shell scripts that run multiple Hive jobs to incrementally update Hive tables, which feed Tableau reports for business use.

Environment: Hadoop, Hive, Flume, Linux, Shell Scripting, Java, Eclipse, MongoDB, Kafka, Spark, ZooKeeper, Sqoop, Ambari.

Hadoop Developer

Confidential, Austin, TX


  • Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analysis algorithms.
  • Created Hive generic UDFs to process business logic with HiveQL.
  • Optimized Hive queries and improved performance by tuning Hive query parameters.
  • Used Cassandra Query Language (CQL) to perform analytics on time series data.
  • Worked with the HBase shell, CQL, the HBase API, and the Cassandra Hector API as part of a proof of concept.
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
  • Ran Hadoop Streaming jobs to process terabytes of XML data.
  • Developed Oozie workflows for orchestrating and scheduling the ETL process.
  • Implemented Avro, ORC, and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
  • Wrote Unix shell scripts in combination with Talend data maps to process the source files and load them into the staging database.
  • Retrieved transaction data from RDBMS into HDFS, computed the total transacted amount per user using MapReduce, and saved the output in a Hive table.
  • Implemented Kafka consumers and producers using the Kafka high-level API in Java, ingesting data into HDFS or HBase depending on the context.
  • Created workflows to run multiple Hive and Pig jobs that trigger independently based on time and data availability.
  • Developed Spark SQL scripts to handle different data sets and compared their performance against MapReduce jobs.
  • Converted MapReduce programs into Spark transformations using Spark RDDs and Python.
  • Worked with AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
  • Responsible for maintaining and expanding AWS cloud infrastructure using SNS and SQS.
  • Developed Spark scripts using Python shell commands as required.
  • Experience implementing machine learning techniques in Spark using Spark MLlib.
  • Moved data from Hive tables into Cassandra for real-time analytics.
  • Used Hadoop benchmarks for monitoring and testing the Hadoop cluster.
  • Implemented test cases and tested MapReduce programs using MRUnit and other mocking frameworks.
  • Performed cluster maintenance, including adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing and managing data backups and Hadoop log files.
  • Implemented Maven build scripts for Maven projects and integrated them with Jenkins.

Environment: Hadoop, AWS, MapReduce, Hive, Spark, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Cassandra, Git, XML, Scala, Java, Maven, Eclipse, Oracle.

Hadoop Developer

Confidential, Miami, FL


  • Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Moved log files generated by various sources to HDFS through Flume for further processing.
  • Developed algorithms for identifying influencers within specified social network channels.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Analyzed data with Hive, Pig, and Hadoop Streaming.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
  • Used Oozie operational services for batch processing and dynamic workflow scheduling.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Experienced in working with Apache Storm.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Performed data mining investigations to find new insights related to customers.
  • Contributed to forecasting based on current results and insights derived from data analysis.
  • Developed domain-specific sentiment analysis systems using supervised machine learning.
  • Configured the Hadoop environment with Kerberos authentication, NameNodes, and DataNodes.
  • Designed source-to-target mappings from SQL Server, Excel, and flat files to Oracle using Informatica PowerCenter.
  • Created data marts and loaded the data using Informatica.
  • Developed and generated insights based on brand conversations, which in turn helped drive brand awareness, engagement, and traffic to social media pages.
  • Identified topics and trends and built context around the brand.
  • Developed different formulas for calculating engagement on social media posts.
  • Identified and analyzed defects, questionable function errors, and inconsistencies in output.

Environment: Java, NLP, HBase, Machine Learning, Hadoop, HDFS, MapReduce, Hortonworks, Hive, Apache Storm, Sqoop, Flume, Oozie, Apache Kafka, ZooKeeper, MySQL, and Eclipse

Java Developer

Confidential, Houston, TX


  • Developed high-level design documents, use case documents, detailed design documents, and unit test plans; created use cases, class diagrams, and sequence diagrams using UML.
  • Extensively involved in database design and development, including coding stored procedures, DDL and DML statements, functions, and triggers.
  • Used Hibernate for object/relational mapping to provide transparent persistence to SQL Server.
  • Developed a portlet-style user experience using AJAX and jQuery.
  • Used Spring IoC to create beans injected at run time.
  • Modified the existing JSP pages using JSTL.
  • Expertise in web design using HTML5, XHTML, XML, CSS3, JavaScript, jQuery, AJAX, and AngularJS.
  • Integrated Spring dependency injection across the layers of an application, together with Hibernate O/R mapping, for rapid development and ease of maintenance.
  • Designed and developed the web-tier using HTML5, CSS3, JSP, Servlets, Struts and Tiles framework.
  • Used AJAX and JavaScript for validations and integrating business server side components on the client side within the browser.
  • Developed the UI panels using JSF, XHTML, CSS, and jQuery.
  • Wrote extensive object-relational mapping code using Hibernate and developed Hibernate mapping files to configure Hibernate POJOs for relational mapping.
  • Developed RESTful web services using Spring IoC to give users a way to run jobs and generate daily status reports.
  • Developed and exposed SOAP web services using JAX-WS, WSDL, Axis, JAXP, and JAXB.
  • Developed business components using EJB session beans and persistence using EJB entity beans.
  • Implemented connectivity to the database server using JDBC.
  • Consumed web services using the Apache CXF framework to retrieve remote information.
  • Used Eclipse as the IDE; configured and deployed the application on the WebLogic application server.
  • Used Maven build scripts to automate the build and deployment process.
  • Used JMS in the project for sending and receiving the messages on the queue.

Environment: Java, J2EE, Spring, Hibernate, HTML5, XHTML, XML, JavaScript, jQuery, AJAX, AngularJS, Oracle SQL, SOAP, REST

Java developer



  • Developed the UI using HTML, CSS, JavaScript, and AJAX.
  • Used Oracle IDE to create web services for the EI application using a top-down approach.
  • Created a basic framework for a Spring and web services enabled environment for EI applications acting as web service providers.
  • Created SOAP Handler to enable authentication and audit logging during Web Service calls.
  • Created service layer APIs and domain objects using Struts.
  • Designed, developed, and configured the applications using the Struts framework.
  • Created Spring DAO classes to call the database through the Spring JPA ORM framework.
  • Wrote PL/SQL queries, created stored procedures, and invoked them using Spring JDBC.
  • Used exception handling and multithreading to optimize application performance.
  • Used core Java concepts to implement the business logic.
  • Created the high-level design document for web services and the EI common framework, and participated in review meetings with the client.
  • Deployed and configured the database data source on the WebLogic application server, used log4j for error tracking and debugging, and maintained the source code using Subversion.
  • Used ClearCase for build management and Ant for application configuration and integration.
  • Created, executed, and documented the tests necessary to ensure that an application and/or environment meets performance requirements (technical, functional, and user interface).

Environment: Windows, Linux, Rational ClearCase, Java, JAX-WS, SOAP, WSDL, JSP, JavaScript, AJAX, Oracle IDE, log4j, Ant, Struts, JPA, XML, HTML5, CSS3, Oracle WebLogic.

Software Developer - Intern



  • Worked as a Development Team Member.
  • Coordinated with business analysts to gather requirements and prepare data flow diagrams and technical documents.
  • Identified use cases and generated class, sequence, and state diagrams using UML.
  • Used JMS for the asynchronous exchange of critical business data and events among J2EE components and legacy systems.
  • Designed, coded, and maintained entity beans and session beans using the EJB 2.1 specification.
  • Involved in the development of the web interface using the Struts MVC framework.
  • Developed the user interface using JSP and tags, CSS, HTML, and JavaScript.
  • Configured database connections using properties files.
  • Used a session filter to implement timeouts for idle users.
  • Used stored procedures to interact with the database.
  • Implemented the persistence layer using DAOs and the Hibernate framework.

Environment: J2EE, Struts 1.0, JavaScript, Swing, CSS, HTML, XML, XSLT, DTD, JUnit, EJB 2.1, Oracle, Tomcat, Eclipse, WebLogic 7.0/8.1.
