- Experienced with data modeling, Hadoop MapReduce architecture, and distributed systems.
- Used Cloudera CDH4 and CDH5 Manager for ad hoc queries for development purposes.
- Designed and implemented client-specific data-migration plans to transfer data to HDFS from RDBMSs such as MySQL and Oracle using SQOOP, and used FLUME for very large volumes of log data. Implemented partitioning and bucketing schemes for quick data access in HIVE.
- Developed data-analysis implementations in PIG and HIVE. Executed workflows using OOZIE.
- Created PIG and HIVE UDFs to implement functionality not available in PIG. Also used the Piggybank user-defined function repository.
- Worked with MongoDB data modeling, versioning, and BPM. Used MongoDB GridFS to store large files.
- Developed Java-based MapReduce code, UDFs, and User Defined Aggregate Functions (UDAFs). Deployed JAR files to implement custom functions otherwise unavailable in PIG and HIVE.
- Identified, designed, and developed statistical data analysis routines to process data and extract business intelligence insights. Implemented Apache MAHOUT-based machine learning algorithms according to requirements.
- Created SPARK Streaming applications. Explored frameworks for the transition from Hadoop/MapReduce to SPARK in-memory computation using RDDs.
- Experienced with real-time data analysis using the Spark Java API.
- Configured and developed Solr search. Experienced with configuring the Solr schema and XML files, and with Solr indexing and querying. Hands-on work with Alfresco-Solr and Lucene search.
- Worked on data migration, data warehousing, data marts, and master data management using Informatica. Designed source-to-target ETL mappings and created session tasks and workflows using PowerCenter.
- Created and maintained databases and tables in MySQL, Oracle, and MariaDB. Able to write complex SQL queries for required use cases. Experienced with data modeling using Oracle SQL Developer Data Modeler and Toad Data Modeler.
- Created different types of charts for visualization using BI tools like Tableau and Highcharts. Experienced with creating executive dashboards using Tableau.
- Experienced with statistical modeling using R and its classes to create word clouds and perform sentiment analysis using a Naive Bayes model.
- Experienced with integrating R with Tableau to leverage the processing power of R and visualization power of Tableau.
- Experienced with processing and extracting data from PDFs using Apache Tika, GhostScript, XPDF, and OCR (Optical Character Recognition) tools like Tesseract.
- Experienced with Python scripting and Unix shell scripting for data ingestion and processing, and for automating data ingestion using cron jobs.
- Installed and configured DataStax Cassandra. Worked with the DataStax object-mapping API. In-depth knowledge of Cassandra architecture, queries, and read and write paths.
- Installed, configured, and monitored a small Hadoop cluster. Knowledgeable about the equivalent procedures for large clusters.
- Designed, developed, and tested core Java applications as part of the software development life cycle.
- Experienced in J2EE, including RESTful web services, JSP, servlets, EJB, JMS, and the Spring and Hibernate frameworks, for building client-server applications. Able to create test harness pages using JS, HTML, and CSS.
- Experienced working on Linux and UNIX (CentOS, Ubuntu, RHEL); worked with C++ and JS.
- Team player with good communication and presentation skills, great interpersonal skills, and an excellent work ethic.
- Able to work on different task orders across multiple projects simultaneously.
- Experienced working in Agile, Waterfall, and hybrid environments.
ADVANCED Big Data Technologies: Cassandra, SPARK, Apache MAHOUT, Cloudera CDH5, Impala, Apache Solr, MongoDB
Big Data Technologies: Apache Hadoop (HDFS/MapReduce), PIG, HIVE, HBASE, SQOOP, FLUME, OOZIE, HUE
Languages: Java, J2EE, SQL, Unix shell, C++
Application Servers: JBoss EAP, Apache Tomcat
Machine Learning / Data Analysis / Statistics: Naïve Bayes, Linear Regression
Operating Systems: Windows, UNIX and Linux
Frameworks: Spring, Hibernate
Version Control: VSS (Visual SourceSafe), CVS
Testing Technologies: JUnit, Pytest, JMeter, Postman
BI Tools: Tableau, Highcharts
Data Processing Tools: R, Tesseract, Tika, GhostScript, XPDF
Search Tools: Apache Solr, Lucene
Confidential, Alexandria, VA
BIG DATA HADOOP DEVELOPER / JAVA DEVELOPER / SOLR DEVELOPER / MONGODB DEVELOPER
- Ingested Office Action data from different sources into a PTO internal VM.
- Created a prototype in MongoDB to build the data model, and implemented versioning and a business process model. Used MongoDB GridFS to store large files.
- Used a Java RESTful API to create different service layers in the Biosequence project.
- Processed Office Action text data using R and SQL to capture the different sections under rejections in CSV format.
- Created a data mart on MariaDB in AWS and on a PTO internal VM to store OA data.
- Worked on source-to-target mapping using Informatica and data modeling using Oracle Data Modeler for OA data in the OA data mart.
- Integrated R with Tableau in AWS to use statistical algorithms such as linear/non-linear regression and Naive Bayes models to forecast the nature and quality of the patent process and rejections. Used R to find traits and anomalies in data.
- Created an executive dashboard with Tableau to monitor patent quality.
- Ingested PTAB data into the BDR (Big Data Reservoir) environment for pre-processing using Java and Unix shell scripting.
- Converted PTAB PDF files into text and JSON format using Apache Tika, GhostScript, XPDF, and the optical character recognition tool Tesseract, and sent them to AWS S3 and Solr.
- Worked on data extraction from different Patent & Trademark data sources for data mining and predictive analysis of patent quality.
- Analyzed Trademark call center data with R to create word clouds and sentiment analysis for a customer experience use case.
- Worked on Java-based web application development for the CMS project using JBoss EAP.
- Worked with DataStax Cassandra to store metadata and application data access objects.
- Worked with Apache Spark and Flume for real-time data analysis.
- Configured the Solr directory structure on top of the Alfresco workspace-SpacesStore and archive-SpacesStore.
- Configured Solr configuration files including repository.properties, schema.xml, solr.xml and solrconfig.xml.
- Worked with Solr and Lucene indexing and search queries.
Tools & Technologies: Java, Python, shell scripting, JBoss EAP, R, MariaDB, Oracle DB, Tableau, Apache Tika, GhostScript, XPDF, Tesseract, Solr, Alfresco, Cassandra.
Confidential, Alexandria, VA
BIG DATA SENIOR SME / DATA SCIENTIST
- Responsible for data ingestion from different servers to HDFS on AWS using Apache Flume and Cloudera CDH5.
- Used Java regular expressions and the Hive SerDe to transform unstructured Apache common logs into a structured format.
- Responsible for creating external tables in Hive, with the default Derby database storing the metadata.
- Loaded historical log data into Hive tables to analyze and explore business-relevant data using Hive Query Language.
- Used the Cloudera Hive ODBC connector to connect BI tools like Tableau and Jaspersoft to the database.
- Used Tableau and Jaspersoft to explore different patterns in the data.
- Created a Spark Streaming application to process dynamic data (clickstream analysis) ingested using a Flume agent, and stored the results in HDFS.
- Used the Spring framework to pull processed data from HDFS and push it to Highcharts.
- Created live spline and map charts with HTML, CSS, and JS using Highcharts APIs like Highstock and Highmaps.
- Set up, configured, and monitored Cloudera CDH5 for a three-node cluster in an OpenStack environment.
- Participated in data modeling for the Hive database following the Oracle database schema.
- Performed join operations on Hive tables to denormalize data for querying.
- Created appropriate partitions in Hive tables to improve query speed.
Tools & Technologies: CDH5, Spark, Flume, Hive, Spring, Tableau, Highcharts, Jaspersoft.
Confidential, Virginia Beach, VA
BIG DATA ANALYST / HADOOP DEVELOPER / CASSANDRA DEVELOPER
- Monitored the Hadoop cluster using Cloudera CDH4.
- Collected business requirements from subject matter experts such as data scientists and business partners.
- Developed Java and PIG scripts to arrange incoming data into a suitable structured form before piping it out for analysis. Designed appropriate partitioning/bucketing schemas to allow faster data access during analysis using HIVE.
- Involved in NoSQL (DataStax Cassandra) database design, integration and implementation.
- Created alter, insert and delete queries involving lists, sets and maps in DataStax Cassandra.
- Worked with Spark on parallel computing to deepen knowledge of RDDs with DataStax Cassandra.
- Worked with Scala to assess its flexibility on Spark and Cassandra and presented the findings to management.
- Implemented “Hadoop Streaming” to extend the usability of existing analysis procedures written in different languages.
- Analyzed the customer’s clickstream data and web logs. Loaded data into HBase using Pig scripts and analyzed it using Hive scripts.
- Used Apache MAHOUT clustering techniques to identify customers in different locations who may need screening for a particular disease, and to find the relation between their account activity and the services they need.
- Used the OOZIE workflow scheduler to schedule different MapReduce jobs.
- Involved in managing and reviewing Hadoop log files.
- Transferred the analyzed data from HDFS to relational databases using Sqoop, enabling the BI team to visualize analytics.
Tools & Technologies: Cassandra, Hadoop, Cloudera CDH4, Hive, HBase, MAHOUT, SPARK, Statistical/Clickstream analysis.
Confidential, Harrisburg, PA
- Observed the setup and monitoring of a scalable distributed system based on HDFS to gain a better understanding, and worked closely with the team to understand the business requirements and add new support features.
- Gathered business requirements to determine feasibility and convert them into technical tasks in the design document.
- Installed and configured Hadoop MapReduce and HDFS, developed multiple MapReduce jobs in Java, and used different UDFs for data cleaning and processing.
- Involved in loading data from the Linux file system to HDFS.
- Used PIG Latin and PIG scripts to process data.
- Experienced in importing data into and exporting data from HDFS; assisted in exporting analyzed data to RDBMS using SQOOP.
- Extracted data from various SQL servers into HDFS using SQOOP. Developed custom MapReduce code, generated JAR files for user-defined functions, and integrated them with HIVE to extend the accessibility of statistical procedures across the entire analysis team.
- Implemented partitioning, dynamic partitioning, and bucketing in HIVE using internal and external tables for more efficient data access.
- Used HIVE queries for aggregating the data and mining information sorted by volume and grouped by vendor and product.
- Performed statistical data analysis routines using Java APIs to analyze data.
Tools: Hadoop (HDFS/MapReduce), PIG, HIVE, SQOOP, SQL, Linux, Statistical analysis.
Confidential, Indianapolis, IN
JAVA DEVELOPER / HADOOP DEVELOPER
- Responsible for design, deployment and testing of the multi-tier application on J2EE.
- Implemented the business logic validations using Spring Validation framework.
- Developed unit test cases for the application using JUnit.
- Explored, used and configured Hadoop ecosystem features and architecture.
- Involved in setting up and monitoring a scalable and distributed system based on Hadoop HDFS.
- Configured Hadoop cluster in Local (standalone), Pseudo-Distributed & Fully Distributed Mode.
- Worked closely with the business team to gather their requirements and identify new support features.
- Configured SQOOP and developed scripts to extract data from staging database into HDFS.
- Wrote MapReduce code with different user-defined functions based on the requirements and integrated it with HIVE to extend the accessibility of statistical procedures across the entire analysis team.
- Used FLUME to extract clickstream data from the web server.
- Improved the website design using clickstream analysis, which provided actionable feedback to help improve product visibility and customer service.
- Wrote programs using scripting languages like PIG to manipulate data.
- Reviewed HDFS usage and system design for future scalability and fault tolerance.
- Prepared extensive shell scripts to extract the required information from the logs.
- Performed white-box testing and monitored all logs in development and production environments.
Tools: J2EE, Spring, JUnit, Hadoop (HDFS/MapReduce), PIG, HIVE, SQOOP, FLUME, SQL, Statistical clickstream analysis.
- Analyzed the application’s current detailed functionality and interfaces; assessed and built business knowledge of the application functions that require functional and regression testing.
- Developed the web interface using Spring MVC, JavaScript, HTML, and CSS.
- Extensively used Spring controller component classes for developing the applications.
- Involved in developing the business tier using stateless session beans and message-driven beans.
- Used JDBC and Hibernate to connect to the Oracle database.
- Configured data sources in the app server and accessed them from the DAOs through Hibernate.
- Used the Business Delegate, Service Locator, and DTO design patterns to design the web module of the application.
- Developed SQL stored procedures and prepared statements for updating and accessing data from database.
- Involved in developing database-specific data access objects (DAOs).
- Used CVS for source code control and JUnit for unit testing.
- Used Eclipse to develop entity and session beans.
- Deployed the entire application on WebSphere Application Server.
- Followed coding and documentation standards.
Tools: Java, J2EE, JDK, JavaScript, XML, Spring MVC, Servlets, JDBC, EJB, Hibernate, Web services, JUnit, CVS, IBM WebSphere, Eclipse.
- Responsible for analyzing requirements by coordinating with the lead and team members.
- Involved in the complete software development life cycle (SDLC) of the application, from requirement analysis to testing.
- Used knowledge of design tools for building use cases and class diagrams.
- Utilized programming skills for developing the application.
- Created stored procedures and triggers to access information.
- Developed the module based on the Struts MVC architecture.
- Developed the user interface using JavaScript, JSP, and HTML for interactive cross-browser functionality and complex UI.
- Created business logic using servlets and session beans and deployed them on WebLogic server.
- Created complex SQL queries, PL/SQL, stored procedures, and functions for the back end.
- Involved in developing code using object-oriented design principles, unit testing with JUnit, and integration testing.
- Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented solutions. Resolved higher-priority defects on schedule.
Tools: Java, JSP, Struts, Servlets, WebLogic, Oracle, JUnit, SQL.