
Data Analyst / Hadoop Developer / Cassandra Developer Resume

Virginia Beach, VA


I have 8 years of experience as a BIG DATA ANALYST / HADOOP DEVELOPER / CASSANDRA DEVELOPER designing, developing, deploying, and supporting Java-based software applications, including 3 years of experience in data analysis. I also have 3 years of experience in Big Data analytics and technologies such as Cassandra (DataStax), the Apache Hadoop framework (HDFS/MapReduce), PIG, HIVE, and HBASE, coordinating their usability using customized Java APIs and managing applications using OOZIE. I am knowledgeable in Scala and SPARK, as well as machine learning algorithms via Apache MAHOUT.


  • Experienced with data modeling, Hadoop MapReduce architecture, and distributed systems.
  • Used Cloudera CDH4 Manager for ad-hoc queries for development purposes.
  • Designed and implemented client-specific data-migration plans to transfer very large amounts of data between HDFS and RDBMSs such as MySQL and SQL Server using SQOOP and FLUME. Implemented PARTITIONING and BUCKETING schemes for quick data access in HIVE.
  • Developed data-analysis implementations in PIG and HIVE. Executed workflows using OOZIE.
  • Created PIG and HIVE UDFs to implement functionality not natively available in PIG and HIVE. Also used the Piggybank user-defined-function repository.
  • Developed Java-based MapReduce code, UDFs, and User Defined Aggregate Functions (UDAFs). Deployed JAR files to implement custom functions otherwise unavailable in PIG and HIVE.
  • Identified, designed, and developed statistical data-analysis routines to process data for the extraction of business-intelligence insights. Implemented Apache MAHOUT-based machine learning algorithms as needed.
  • Knowledgeable in SPARK and Scala; explored both frameworks for a transition from Hadoop/MapReduce to SPARK.
  • Installed and configured Cassandra. In-depth knowledge of Cassandra architecture, querying, and the read and write paths.
  • Knowledgeable in the installation, configuration, and monitoring of Hadoop clusters, as well as performance tuning of the cluster.
  • Designed, developed, and tested core Java applications as part of the software development life cycle.
  • Significant knowledge of J2EE, including JSP, servlets, EJB, JMS, and the Spring and Hibernate frameworks for building client-server applications.
  • Experienced working on Linux and UNIX (CentOS, Ubuntu); also worked with C++, SQL, and HTML.
  • Team player with good communication and presentation skills, great interpersonal skills, and an excellent work ethic.
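As an illustrative sketch of the HIVE partitioning and bucketing scheme referred to above (the table, column names, and bucket count are hypothetical, not taken from any specific project):

```sql
-- Hypothetical clickstream table: partitioned by ingest date so queries
-- filtering on dt prune whole directories, and bucketed by user_id so
-- joins and sampling on user_id read fewer files.
CREATE TABLE clickstream (
  user_id STRING,
  url     STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

-- A filter on the partition column scans only the matching partition:
SELECT COUNT(*) FROM clickstream WHERE dt = '2014-06-01';
```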


Advanced Big Data Technologies: Cassandra, SPARK, Apache MAHOUT, Cloudera CDH4, Impala; knowledgeable in Apache Solr

Big Data Technologies: Apache Hadoop (HDFS/MapReduce), PIG, HIVE, HBASE, SQOOP, FLUME, OOZIE, HUE

Web Technologies: JSP, JDBC, HTML, JavaScript

Languages: Java, Scala (for Spark), J2EE, SQL, UNIX shell, C++

Machine Learning / Data Analysis / Statistics: Hidden Markov Model, Random Forest, Decision Tree, Support Vector Machine, Neural Network.

Operating Systems: Windows, UNIX and Linux

Frame Works: Spring, Hibernate

Version Control: VSS (Visual Source Safe), CVS

Testing Technologies: JUnit


Confidential, Virginia Beach, VA


Roles & Responsibilities:

  • Monitored the Hadoop cluster using Cloudera CDH4.
  • Collected business requirements from subject-matter experts such as data scientists and business partners.
  • Developed Java and PIG scripts to structure incoming data before piping it out for analysis. Designed appropriate partitioning/bucketing schemas to allow faster data access during analysis using HIVE.
  • Involved in NoSQL (DataStax Cassandra) database design, integration, and implementation.
  • Created ALTER, INSERT, and DELETE queries involving lists, sets, and maps in DataStax Cassandra.
  • Worked with Spark on parallel computing to deepen knowledge of RDDs over DataStax Cassandra.
  • Worked with Scala to demonstrate to management the flexibility of Scala on Spark and Cassandra.
  • Implemented "Hadoop Streaming" to extend the usability of existing analysis procedures written in different languages.
  • Analyzed the customers' clickstream data and web logs. Loaded data into HBase using Pig scripts and analyzed it using Hive scripts.
  • Used Apache MAHOUT clustering techniques to identify customers from different locations who may need screening for a particular disease, and to find relations between their account activity and the services they should be offered.
  • Used the OOZIE workflow scheduler to schedule different MapReduce jobs.
  • Involved in managing and reviewing Hadoop log files.
  • Transferred the analyzed data from HDFS to relational databases using Sqoop, enabling the BI team to visualize analytics.
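The Cassandra collection queries mentioned above typically follow the standard CQL pattern; the sketch below is illustrative only (the table, columns, and values are hypothetical):

```sql
-- Hypothetical table using CQL collection types.
CREATE TABLE patient_profile (
  patient_id uuid PRIMARY KEY,
  phones     list<text>,
  conditions set<text>,
  visits     map<text, timestamp>
);

-- ALTER adds a new collection-typed column.
ALTER TABLE patient_profile ADD tags set<text>;

INSERT INTO patient_profile (patient_id, phones, conditions)
VALUES (123e4567-e89b-12d3-a456-426655440000,
        ['555-0100'], {'screening-due'});

-- Remove a single set element without rewriting the whole collection.
UPDATE patient_profile
SET conditions = conditions - {'screening-due'}
WHERE patient_id = 123e4567-e89b-12d3-a456-426655440000;

-- Delete one map entry by key.
DELETE visits['2014-03'] FROM patient_profile
WHERE patient_id = 123e4567-e89b-12d3-a456-426655440000;
```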

Tools & Technologies: Cassandra, Hadoop, Cloudera CDH4, Hive, HBase, MAHOUT, SPARK, statistical/clickstream analysis.

  Confidential, Harrisburg, PA


Roles & Responsibilities:

  • Observed the setup and monitoring of a scalable distributed system based on HDFS to gain a better understanding of it, and worked closely with the team to understand business requirements and add new support features.
  • Gathered business requirements to determine feasibility and convert them into technical tasks in the design document.
  • Installed and configured Hadoop MapReduce and HDFS, developed multiple MapReduce jobs in Java, and used different UDFs for data cleaning and processing.
  • Involved in loading data from the Linux file system into HDFS.
  • Used PIG Latin and PIG scripts to process data.
  • Experienced in importing and exporting data into HDFS and assisted in exporting analyzed data to RDBMSs using SQOOP.
  • Extracted data from various SQL Servers into HDFS using SQOOP. Developed custom MapReduce code, generated JAR files for user-defined functions, and integrated them with HIVE to extend the accessibility of statistical procedures to the entire analysis team.
  • Implemented partitioning, dynamic partitioning, and bucketing in HIVE using internal and external tables for more efficient data access.
  • Used HIVE queries for aggregating the data and mining information, sorted by volume and grouped by vendor and product.
  • Performed statistical data-analysis routines using Java APIs.
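The dynamic-partitioning and aggregation work described above usually combines the two in a single HiveQL insert, along the lines of this sketch (table and column names are hypothetical; `sales_summary` is assumed to be declared with `PARTITIONED BY (vendor STRING)`):

```sql
-- Dynamic partitioning must be enabled before the INSERT.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Aggregate staged rows into a table partitioned by vendor; the
-- partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE sales_summary PARTITION (vendor)
SELECT product, SUM(volume) AS total_volume, vendor
FROM   sales_staging
GROUP BY vendor, product;
```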

Tools & Technologies: Hadoop (HDFS/MapReduce), PIG, HIVE, SQOOP, SQL, Linux, Statistical analysis.



Roles & Responsibilities:

  • Responsible for design, deployment and testing of the multi-tier application on J2EE.
  • Implemented the business logic validations using Spring Validation framework.
  • Developed unit test cases; used JUnit for testing the application.
  • Explored, used and configured Hadoop ecosystem features and architecture.
  • Involved in setting up and monitoring a scalable and distributed system based on Hadoop HDFS.
  • Configured Hadoop cluster in Local (standalone), Pseudo-Distributed & Fully Distributed Mode.
  • Worked closely with the business team to gather their requirement and new support features.
  • Configured SQOOP and developed scripts to extract data from staging database into HDFS.
  • Wrote MapReduce code with different user defined functions based on the requirements and integrated it with HIVE to extend the accessibility of statistical procedures within the entire analysis team.
  • Used FLUME to extract clickstream data from the web server.
  • Improved the website design by using clickstream analysis which provided actionable feedback to help improve product visibility and customer service.
  • Wrote programs in scripting languages such as Pig to manipulate data.
  • Reviewed the HDFS usages and system design for future scalability and fault-tolerance.
  • Prepared extensive shell scripts to extract the required information from the logs.
  • Performed white box testing and monitoring all the logs in Development and Production environments.
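Extracting clickstream data from a web server with FLUME, as described above, is typically driven by an agent configuration file; this is a minimal sketch with hypothetical agent, path, and component names:

```properties
# Hypothetical Flume agent shipping web-server access logs into HDFS.
agent1.sources  = weblog
agent1.channels = mem
agent1.sinks    = hdfs-sink

# Tail the access log as the event source.
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

# Buffer events in memory between source and sink.
agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

# Write events into date-partitioned HDFS directories.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /flume/clickstream/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-sink.channel = mem
```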

Tools & Technologies: J2EE, Spring, JUnit, Hadoop (HDFS/MapReduce), PIG, HIVE, SQOOP, FLUME, SQL, statistical clickstream analysis.



Roles & Responsibilities:

  • Analyzed the application's current detailed functionality and interfaces; assessed and built business knowledge of the various application functions that require functional and regression testing.
  • Developed the web interface using Spring MVC, Java Script, HTML and CSS.
  • Extensively used Spring controller component classes for developing the applications.
  • Involved in developing business tier using stateless session bean and message driven beans.
  • Used JDBC and Hibernate to connect to the Oracle database.
  • Data sources were configured in the app server and accessed from the DAOs through Hibernate.
  • Used the Business Delegate, Service Locator, and DTO design patterns for designing the web module of the application.
  • Developed SQL stored procedures and prepared statements for updating and accessing data from database.
  • Involved in developing database specific data access objects (DAO).
  • Used CVS for the source code control and JUNIT for unit testing.
  • Used Eclipse to develop entity and session beans.
  • The entire application was deployed on WebSphere Application Server.
  • Followed coding and documentation standards.

Tools & Technologies: Java, J2EE, JDK, Java Script, XML, Spring MVC, Servlets, JDBC, EJB, Hibernate, Web services, JUnit, CVS, IBM Web Sphere, Eclipse.



Roles & Responsibilities:

  • Responsible for analyzing requirement by coordinating with the Lead and team members.
  • Involved in the complete software development life cycle (SDLC) of the application, from requirement analysis to testing.
  • Used knowledge of design tools for building use cases and class diagrams.
  • Utilized programming skills for developing the application.
  • Created stored procedures and triggers to access information.
  • Developed the module based on Struts MVC Architecture.
  • Developed User Interface using java script, JSP, HTML for interactive cross browser functionality and complex UI.
  • Created business logic using servlets and session beans and deployed them on WebLogic server.
  • Created complex SQL Queries, PL/SQL, Stored procedures and functions for back end.
  • Involved in developing code, utilizing the Object Oriented Programming design principles, Unit testing using JUnit and integration testing.
  • Provided technical support for production environments: resolving issues, analyzing defects, and providing and implementing fixes. Resolved higher-priority defects as per the schedule.

Tools & Technologies: Java, JSP, Struts, Servlets, WebLogic, Oracle, JUnit, SQL.



Roles & Responsibilities:

  • End-to-end design of application components using Java Collections framework and providing concurrent database access using multithreading.
  • Developed a program to notify the operational team of downtime for any of the 300 pharmacies on the network.
  • Performance tuned the application to prevent memory leaks and to boost its performance and reliability.
  • Developed and maintained various unit tests on the applications.
  • Created an interface using JSP, Servlet and MVC Struts architecture to resolve stuck orders in different pharmacies.
  • Analysis of different database schemas (using SQL) to gather sales metrics/trends and develop business reports.
  • Closely worked with the data-analysis team to discuss business insights and recommend actionable tasks. Insights included identifying high-demand medicines through correlations with various spatial/temporal factors and minimizing unused stock.
  • Developed analyses that increased regional sales by 20%, achieved by routing product coupons and offers to targeted consumers.
