
Big Data Hadoop Developer / Java Developer Resume


Alexandria, VA

SUMMARY

  • 10 years of experience as a BIG DATA SME / HADOOP DEVELOPER / CASSANDRA DEVELOPER / ETL DEVELOPER / JAVA DEVELOPER / SQL DEVELOPER / DATA SCIENTIST designing, developing, deploying and supporting Java-based software applications, including 5 years of experience in data analysis.
  • 4 years of experience in Big Data analytics and technologies such as Cassandra (DataStax), the Apache Hadoop framework (HDFS/MapReduce), PIG, HIVE and HBASE, coordinating their use through customized Java APIs and managing applications with OOZIE.
  • Experience in core Apache SPARK and its extended APIs such as Spark Streaming, as well as machine learning libraries such as Apache MAHOUT.
  • Experience working with BI tools such as Tableau and Highcharts.
  • Experienced with data modeling, Hadoop MapReduce architecture and distributed systems.
  • Used Cloudera CDH4 and CDH5 Manager for ad-hoc queries during development.
  • Designed and implemented client-specific data-migration plans to transfer data to HDFS from RDBMS such as MySQL and Oracle using SQOOP, and used FLUME for very large volumes of log data. Implemented PARTITIONING and BUCKETING schemes in HIVE for quick data access.
  • Developed data-analysis implementations in PIG and HIVE. Executed workflows using OOZIE.
  • Created PIG and HIVE UDFs to implement functionality not available out of the box; also used the Piggybank user-defined function repository.
  • Worked with MongoDB data modeling, versioning, BPM. Used MongoDB GridFS to store large files.
  • Developed Java-based MapReduce code, UDFs and User Defined Aggregate Functions (UDAFs). Deployed JAR files to implement custom functions otherwise unavailable in PIG and HIVE (see the UDF sketch at the end of this summary).
  • Identified, designed and developed statistical data-analysis routines to extract business intelligence insights. Implemented Apache MAHOUT-based machine learning algorithms according to the requirements.
  • Created SPARK Streaming applications and explored the framework for a transition from Hadoop/MapReduce to SPARK in-memory computation using RDDs.
  • Configured and developed Solr search. Experienced with configuring the Solr schema and XML files, Solr indexing and querying. Hands-on work with Alfresco-Solr.
  • Worked on data migration, data warehousing, data marts and master data management using Informatica. Designed source-to-target ETL mappings and created session tasks and workflows using PowerCenter.
  • Created and maintained databases and tables in MySQL, Oracle and MariaDB. Able to write complex SQL queries for required use cases. Experienced in data modeling using Oracle SQL Developer Data Modeler and TOAD Data Modeler.
  • Created different types of charts for visualization using BI tools such as Tableau and Highcharts. Experienced in creating executive dashboards with Tableau.
  • Experienced with statistical modeling in R, including building word clouds and performing sentiment analysis with a Naive Bayes model.
  • Experienced with integrating R with Tableau to leverage the processing power of R and the visualization power of Tableau.
  • Experienced with processing and extracting data from PDFs using Apache Tika, GhostScript, XPDF and OCR (Optical Character Recognition) tools such as Tesseract.
  • Experienced with Python and Unix shell scripting for data ingestion and processing, and with ingestion automation using cron jobs.
  • Installed and configured DataStax Cassandra. Worked with the DataStax object-mapping API. In-depth knowledge of Cassandra architecture, querying, and read and write paths.
  • Installed, configured and monitored a small Hadoop cluster. Knowledgeable about the equivalent procedures for large clusters.
  • Designed, developed and tested core Java applications as part of the software development life cycle.
  • Experienced in J2EE, including RESTful web services, JSP, servlets, EJB, JMS, and the Spring and Hibernate frameworks, for building client-server applications. Able to create test-harness pages using JS, HTML and CSS.
  • Experienced working on Linux and UNIX (CentOS, Ubuntu, RHEL) and with C++ and JS.
  • Team player with good communication and presentation skills, great interpersonal skills and an excellent work ethic.
  • Able to work on different task orders across multiple projects simultaneously.
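
As a rough illustration of the custom HIVE UDFs mentioned above, a minimal sketch in Java is shown below; the package, class and function names are hypothetical, and it targets the classic org.apache.hadoop.hive.ql.exec.UDF API used by CDH4/CDH5-era Hive.

```java
package com.example.hive.udf; // hypothetical package name

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple string-normalizing UDF; Hive resolves evaluate() by reflection.
public final class TrimLower extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null; // null in, null out, as Hive expects
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Packaged into a JAR, such a function would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.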

TECHNICAL SKILLS

ADVANCED Big Data Technologies: Cassandra, SPARK, Apache MAHOUT, Cloudera CDH5, Impala, Apache Solr, MongoDB

Big Data Technologies: Apache Hadoop (HDFS/MapReduce), PIG, HIVE, HBASE, SQOOP, FLUME, OOZIE, HUE

Web Technologies: JSP, JDBC, HTML, JavaScript, REST

Languages: Java (J2EE), Scala (for Spark), SQL, C++, Unix shell scripting

Application Server: JBoss EAP

Machine Learning / Data Analysis / Statistics: Naïve Bayes, Linear Regression.

Operating Systems: Windows, UNIX and Linux

Frameworks: Spring, Hibernate

Version Control: VSS (Visual SourceSafe), CVS

Testing Technologies: JUnit, Pytest

BI Tools: Tableau, Highcharts

Data Processing Tools: R, Tesseract, Tika, GhostScript, XPDF

Search Tool: Apache Solr

PROFESSIONAL EXPERIENCE

Confidential, Alexandria, VA

BIG DATA HADOOP DEVELOPER / JAVA DEVELOPER

Responsibilities:

  • Ingested Office Action data from different sources into PTO internal VM.
  • Created a prototype data model in MongoDB for the Biosequence project, implementing versioning and a business process model. Used MongoDB GridFS to store large files.
  • Used Java RESTful APIs to create different service layers in the Biosequence project.
  • Processed Office Action text data using R and SQL to capture different sections under rejections in CSV format.
  • Created data marts on MariaDB in AWS and on the PTO internal VM to store OA data.
  • Worked on source-to-target ETL mapping using Informatica and data modeling using Oracle Data Modeler for OA data in the OA data mart.
  • Integrated R with Tableau in AWS to apply statistical algorithms such as linear/non-linear regression and Naive Bayes models to forecast the nature and quality of the patent process, allowances and rejections. Used R to find traits and anomalies in the data.
  • Created executive dashboard with Tableau to monitor patent quality.
  • Ingested PTAB data into the BDR (Big Data Reservoir) environment for pre-processing using Python and Unix shell scripting.
  • Processed PTAB PDF files to convert them into text and JSON format using Apache Tika, GhostScript, XPDF and the optical character recognition tool Tesseract, and sent the output to AWS S3 and Solr (see the Tika sketch after this list).
  • Worked with data extraction from different Patent & Trademark data sources for data mining and predictive analysis on Patent quality.
  • Analyzed Trademark call center data with R to create word clouds and perform sentiment analysis for a customer-experience use case.
  • Worked on Java-based web application development for the CMS project using JBoss EAP.
  • Worked with Apache Cassandra to store metadata and application data access objects.
  • Configured search in Alfresco Share, e.g. controlling permission checking on search results and controlling which results are returned in Alfresco Share.
  • Configured the Solr directory structure for the Alfresco workspace SpacesStore and archive SpacesStore.
  • Configured Solr configuration files including repository.properties, schema.xml, solr.xml and solrconfig.xml.
  • Worked with Solr indexing and search query.
  • Tools & Technologies: Java, python, Shell scripting, Jboss eap, R, MariaDB, Oracle DB, Tableau, Apache Tika, GhostScript, XPDF, Tesseract, Solr, Alfresco, Cassandra.

Confidential, Alexandria, VA

BIG DATA SENIOR SME / DATA SCIENTIST

Responsibilities:

  • Responsible for data ingestion from different servers to HDFS on AWS using Apache Flume and Cloudera CDH5.
  • Used Java regular expressions and a Hive SerDe to transform unstructured Apache common logs into a structured format.
  • Responsible for creating external tables in Hive, with table metadata kept in the default Derby metastore database.
  • Loaded historical log data into Hive table to analyze and explore useful data for business using Hive query language.
  • Used the Cloudera Hive ODBC connector to connect BI tools such as Tableau and Jaspersoft to the database.
  • Used Tableau and Jaspersoft to explore different patterns in the data.
  • Created Spark Streaming jobs to process dynamic data (clickstream analysis) ingested through a Flume agent and stored in HDFS (see the streaming sketch after this list).
  • Created a Spring framework application to pull processed data from HDFS and push it to Highcharts.
  • Created live spline and map charts with HTML, CSS and JS using Highcharts APIs such as Highstock and Highmaps.
  • Set up, configured and monitored Cloudera CDH5 for a three-node cluster in an OpenStack environment.
  • Participated in data modeling for the Hive database following the Oracle database schema.
  • Performed joins on Hive tables to denormalize data for query purposes.
  • Created appropriate partitions in Hive tables to enhance query speed.
  • Tools & Technologies: CDH5, Spark, Flume, Hive, Spring, Tableau, Highcharts, Jaspersoft.

Confidential, Virginia Beach, VA.

BIG DATA ANALYST / HADOOP DEVELOPER / CASSANDRA DEVELOPER.

Responsibilities:

  • Monitored the Hadoop cluster using Cloudera CDH4.
  • Collected business requirements from subject-matter experts such as data scientists and business partners.
  • Developed Java and PIG scripts to arrange incoming data into a suitable, structured form before piping it out for analysis. Designed appropriate partitioning/bucketing schemes to allow faster data access during analysis with HIVE.
  • Involved in NoSQL (DataStax Cassandra) database design, integration and implementation.
  • Created alter, insert and delete queries involving lists, sets and maps in DataStax Cassandra (see the CQL sketch after this list).
  • Worked with Spark on parallel computing to deepen knowledge of RDDs over DataStax Cassandra.
  • Worked with Scala to demonstrate the flexibility of Scala on Spark and Cassandra to management.
  • Implemented “Hadoop Streaming” to extend the usability of existing analysis procedures written in different languages.
  • Analyzed customer clickstream data and web logs. Loaded data into HBase using Pig scripts and analyzed it using Hive scripts.
  • Used Apache MAHOUT clustering techniques to identify customers from different locations who may need screening for a particular disease, and to find the relation between their account activity and the services they should be offered.
  • Used the OOZIE workflow scheduler to schedule different MapReduce jobs.
  • Involved in managing and reviewing Hadoop log files.
  • Transferred the analyzed data from HDFS to relational databases using Sqoop, enabling the BI team to visualize analytics.
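
A minimal sketch of the kind of CQL collection operations mentioned above, written against the DataStax Java driver 3.x API; the keyspace, table, columns and contact point are hypothetical.

```java
import java.util.UUID;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraCollectionsDemo {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.patients ("
                    + "id uuid PRIMARY KEY, visits list<text>, conditions set<text>, labs map<text, int>)");

            UUID id = UUID.randomUUID();
            // INSERT with list, set and map literals.
            session.execute("INSERT INTO demo.patients (id, visits, conditions, labs) "
                    + "VALUES (?, ['2013-01-05'], {'diabetes'}, {'a1c': 7})", id);
            // ALTER the collections in place: append to the list, add to the set.
            session.execute("UPDATE demo.patients SET visits = visits + ['2013-02-11'], "
                    + "conditions = conditions + {'hypertension'} WHERE id = ?", id);
            // DELETE a single map entry rather than the whole row.
            session.execute("DELETE labs['a1c'] FROM demo.patients WHERE id = ?", id);

            Row row = session.execute("SELECT * FROM demo.patients WHERE id = ?", id).one();
            System.out.println(row.getSet("conditions", String.class));
        }
    }
}
```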

Confidential, Harrisburg, PA

HADOOP DEVELOPER

Responsibilities:

  • Observed the setup and monitoring of a scalable distributed system based on HDFS to build a better understanding, and worked closely with the team to understand the business requirements and add new support features.
  • Gathered business requirements to determine feasibility and to convert them into technical tasks in the design document.
  • Installed and configured Hadoop MapReduce and HDFS, developed multiple MapReduce jobs in Java (see the mapper sketch after this list) and used different UDFs for data cleaning and processing.
  • Involved in loading data from the Linux file system into HDFS.
  • Used PIG Latin and PIG scripts to process data.
  • Experienced in importing and exporting data into HDFS and assisted in exporting analyzed data to RDBMS using SQOOP.
  • Extracted data from various SQL servers into HDFS using SQOOP. Developed custom MapReduce code, generated JAR files for user-defined functions and integrated them with HIVE to extend the accessibility of statistical procedures to the entire analysis team.
  • Implemented partitioning, dynamic partitioning and bucketing in HIVE using internal and external tables for more efficient data access.
  • Used HIVE queries to aggregate data and mine information, sorted by volume and grouped by vendor and product.
  • Performed statistical data-analysis routines to analyze data using Java APIs.
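
A minimal sketch of the kind of data-cleaning MapReduce mapper described above, written against the Hadoop "new" (org.apache.hadoop.mapreduce) API; the class name, tab-delimited record layout and vendor column are assumptions for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class VendorVolumeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text vendor = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assume tab-delimited records with the vendor name in the first column;
        // skip malformed lines instead of failing the job.
        String[] fields = value.toString().split("\t");
        if (fields.length < 2 || fields[0].isEmpty()) {
            return;
        }
        vendor.set(fields[0].trim());
        context.write(vendor, ONE);
    }
}
```

Paired with a standard sum reducer, this would yield per-vendor record volumes of the kind aggregated in the HIVE queries above.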

Confidential, Indianapolis, IN

JAVA DEVELOPER/HADOOP DEVELOPER

Responsibilities:

  • Responsible for design, deployment and testing of the multi-tier J2EE application.
  • Implemented the business-logic validations using the Spring Validation framework (see the validator sketch after this list).
  • Developed unit test cases, using JUnit to test the application.
  • Explored, used and configured Hadoop ecosystem features and architecture.
  • Involved in setting up and monitoring a scalable and distributed system based on Hadoop HDFS.
  • Configured Hadoop cluster in Local (standalone), Pseudo-Distributed & Fully Distributed Mode.
  • Worked closely with the business team to gather requirements and new support features.
  • Configured SQOOP and developed scripts to extract data from staging database into HDFS.
  • Wrote MapReduce code with different user-defined functions based on the requirements and integrated it with HIVE to extend the accessibility of statistical procedures to the entire analysis team.
  • Used FLUME to extract clickstream data from the web server.
  • Improved the website design using clickstream analysis, which provided actionable feedback to help improve product visibility and customer service.
  • Wrote programs using scripting languages such as Pig to manipulate data.
  • Reviewed HDFS usage and system design for future scalability and fault tolerance.
  • Prepared extensive shell scripts to extract the required information from the logs.
  • Performed white-box testing and monitored all logs in the development and production environments.
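
One plausible shape for the Spring Validation work mentioned above is a standalone Validator implementation; the Order form object, field names and error codes below are hypothetical.

```java
import org.springframework.validation.Errors;
import org.springframework.validation.ValidationUtils;
import org.springframework.validation.Validator;

public class OrderValidator implements Validator {

    // Hypothetical form object, shown inline to keep the sketch self-contained.
    public static class Order {
        private String customerId;
        private int quantity;
        public String getCustomerId() { return customerId; }
        public void setCustomerId(String customerId) { this.customerId = customerId; }
        public int getQuantity() { return quantity; }
        public void setQuantity(int quantity) { this.quantity = quantity; }
    }

    @Override
    public boolean supports(Class<?> clazz) {
        return Order.class.isAssignableFrom(clazz);
    }

    @Override
    public void validate(Object target, Errors errors) {
        // Reject missing/blank customer ids with a message-bundle error code.
        ValidationUtils.rejectIfEmptyOrWhitespace(errors, "customerId", "order.customerId.required");
        Order order = (Order) target;
        if (order.getQuantity() <= 0) {
            errors.rejectValue("quantity", "order.quantity.positive");
        }
    }
}
```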

Confidential

JAVA DEVELOPER

Responsibilities:

  • Analyzed the application's current detailed functionality and interfaces; assessed and built business knowledge of the various functions of the application that require functional and regression testing.
  • Developed the web interface using Spring MVC, JavaScript, HTML and CSS (see the controller sketch after this list).
  • Extensively used Spring controller component classes to develop the applications.
  • Involved in developing the business tier using stateless session beans and message-driven beans.
  • Used JDBC and Hibernate to connect to the Oracle database.
  • Data sources were configured in the app server and accessed from the DAOs through Hibernate.
  • Used the Business Delegate, Service Locator and DTO design patterns to design the web module of the application.
  • Developed SQL stored procedures and prepared statements for updating and accessing data in the database.
  • Involved in developing database specific data access objects (DAO).
  • Used CVS for source-code control and JUnit for unit testing.
  • Used Eclipse to develop entity and session beans.
  • The entire application was deployed on WebSphere Application Server.
  • Followed coding and documentation standards.
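
A minimal sketch of an annotation-driven Spring MVC controller of the sort described above; the URL, request parameter, model attribute and view name are hypothetical.

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

@Controller
public class AccountController {

    @RequestMapping(value = "/account/search", method = RequestMethod.GET)
    public String search(@RequestParam("accountId") String accountId, Model model) {
        // In the real application this would call a DAO/service layer via Hibernate.
        model.addAttribute("accountId", accountId);
        return "accountDetails"; // logical view name resolved to a JSP
    }
}
```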

Confidential

JAVA DEVELOPER

Responsibilities:

  • Responsible for analyzing requirements in coordination with the lead and team members.
  • Involved in the complete software development life cycle (SDLC) of the application, from requirements analysis to testing.
  • Used design tools to build use cases and class diagrams.
  • Utilized programming skills to develop the application.
  • Created stored procedures and triggers to access information.
  • Developed the module based on the Struts MVC architecture.
  • Developed the user interface using JavaScript, JSP and HTML for interactive cross-browser functionality and a complex UI.
  • Created business logic using servlets and session beans and deployed them on the WebLogic server.
  • Created complex SQL queries, PL/SQL stored procedures and functions for the back end (see the JDBC sketch after this list).
  • Involved in developing code utilizing object-oriented design principles, unit testing using JUnit, and integration testing.
  • Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions. Resolved high-priority defects per the schedule.
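
To give a flavour of the back-end stored-procedure access mentioned above, here is a minimal JDBC sketch; the procedure name, parameters, connection URL and credentials are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class OrderStatusUpdater {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/ORCL", "app_user", "secret");
             CallableStatement stmt = conn.prepareCall("{call update_order_status(?, ?, ?)}")) {

            stmt.setLong(1, 1001L);                       // order id
            stmt.setString(2, "SHIPPED");                 // new status
            stmt.registerOutParameter(3, Types.INTEGER);  // rows affected, returned by the procedure
            stmt.execute();

            System.out.println("Rows updated: " + stmt.getInt(3));
        }
    }
}
```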

Confidential

JUNIOR SOFTWARE DEVELOPER

Responsibilities:

  • End-to-end design of application components using the Java Collections framework, providing concurrent database access using multithreading (see the sketch after this list).
  • Developed a program to notify the operational team of downtime at any one of the 300 pharmacies on the network.
  • Performance-tuned the application to prevent memory leaks and to boost its performance and reliability.
  • Developed and maintained various unit tests for the applications.
  • Created an interface using JSP, servlets and the Struts MVC architecture to resolve stuck orders in different pharmacies.
  • Analyzed different database schemas (using SQL) to gather sales metrics/trends and develop business reports.
  • Worked closely with the data-analysis team to discuss business insights and recommend actionable tasks, including identifying high-demand medicines through correlations with spatial/temporal factors and minimizing unused stock.
  • Developed analysis to increase regional sales by 20%, achieved by routing product coupons and offers to targeted consumers.
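
As a rough illustration of the concurrent database access and downtime-notification work above, here is a minimal multithreading sketch; the JDBC URL, credentials, table and query are hypothetical, and each worker checks one pharmacy's status on its own connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PharmacyStatusChecker {

    private static final String URL = "jdbc:mysql://localhost:3306/pharmacy_net";

    public static void main(String[] args) throws Exception {
        List<Integer> pharmacyIds = new ArrayList<>();
        for (int i = 1; i <= 300; i++) {
            pharmacyIds.add(i);
        }

        ExecutorService pool = Executors.newFixedThreadPool(10);
        List<Future<String>> results = new ArrayList<>();
        for (Integer id : pharmacyIds) {
            results.add(pool.submit(statusCheck(id)));
        }
        for (Future<String> result : results) {
            System.out.println(result.get()); // e.g. "pharmacy 42: UP"
        }
        pool.shutdown();
    }

    // Each task opens its own connection, so checks run concurrently without shared state.
    private static Callable<String> statusCheck(int pharmacyId) {
        return () -> {
            try (Connection conn = DriverManager.getConnection(URL, "app_user", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT status FROM pharmacy_heartbeat WHERE pharmacy_id = ?")) {
                ps.setInt(1, pharmacyId);
                try (ResultSet rs = ps.executeQuery()) {
                    return "pharmacy " + pharmacyId + ": " + (rs.next() ? rs.getString(1) : "UNKNOWN");
                }
            }
        };
    }
}
```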
