- Over 6 years of IT industry experience as a Hadoop and Java Developer; worked across all phases of the SDLC, including Requirement Gathering, Analysis, Design, Development, Testing, UAT and Maintenance, with timely delivery against aggressive deadlines.
- Over 2.5 years of experience with Hadoop (HDFS, MapReduce), Spark, Scala, and Hadoop ecosystem components (Pig, Hive, Sqoop & ZooKeeper).
- Hands-on experience creating MapReduce jobs and manipulating data and tasks in HDFS using Java.
- Hands-on experience writing Pig Latin scripts and using Pig commands to analyze data sets.
- Hands-on experience installing, configuring and maintaining ecosystem components including Hadoop, Sqoop, Pig, Hive & Spark.
- Experience in database development using SQL and PL/SQL, working on databases such as Oracle 10g/11g/12c and SQL Server. Made effective use of table functions, indexes, table partitioning, collections, analytic functions and performance tuning.
- Experience working on NoSQL databases including HBase and Cassandra.
- Experience using Sqoop to import data into HDFS from RDBMS sources (Oracle, MySQL, SQL Server).
- Wrote Bash shell scripts on UNIX to run Sqoop and Hive sessions through the resource manager scheduler; scheduled jobs and updated the scripts as requirements changed.
- Expertise in using J2EE application servers such as JBoss and web servers like Apache Tomcat.
- Experienced with Java IDEs including Eclipse and NetBeans.
- Experienced with database GUI/IDE tools including TOAD, SQL Developer and ERwin.
- Involved in Data Extraction, Transformation and Loading (ETL) from source to target using Informatica PowerCenter.
- Worked with Git for local and centralized version control, recording changes to sets of files for future recall.
- Effective team player with excellent communication skills and the insight to determine priorities, schedule work and meet critical deadlines.
Big Data Ecosystem: Hadoop 2.x (MapReduce, HDFS), Spark 1.6.1/2.1.1, Hive 1.2.1/2.1.0, Pig 0.13.0/0.14.0, Sqoop 1.4.6, Kafka 0.9.0.0, Flume 1.5.2, HBase 1.2.x
Databases: Oracle 10g/11g, MySQL 5.5; NoSQL: HBase 1.1.2, Cassandra 2.1.0
Java Technologies: Java 1.7/1.8, AJAX, Spring, Hibernate
Methodologies: Agile, UML, Waterfall, Design Patterns
Programming Languages: Scala 2.1.1, Java 1.7/1.8, Oracle PL/SQL, Python 2.7+/3.5+, Bash Shell Scripting
Operating Systems: Windows XP/7/8/10, Linux (CentOS, Ubuntu), UNIX
Confidential, Warren, NJ
- Extracted data from Kafka, converted it into DStreams in Spark Streaming, and performed transformations to meet different feature requirements.
- Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka, building the common learner data model in near real time and persisting it into Hive.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and memory tuning.
- Extracted historical data from offline sources to enrich the view information of real time streaming data.
- Developed Spark scripts using Scala shell commands as per requirements.
- Created and updated Hive tables for offline data storage.
- Loaded offline data from the Hive database and joined it with transformed RDDs to generate the required datasets.
- Replaced and complemented existing Spark batch jobs with Spark Streaming jobs to enable near-real-time data analysis.
- Optimized existing algorithms using Spark Context, Spark SQL, DataFrames and pair RDDs.
- Developed UDFs in Java as needed for use in Hive queries.
- Supported and monitored MapReduce programs running on the cluster.
- Monitored logs and responded accordingly to any warning or failure conditions.
Environment: Apache Hadoop 2.6.0, HDFS, Hive 1.2.1, Java 1.8, Spark 1.6.1, Kafka 0.9.0.0, Sqoop 1.4.6, Linux Ubuntu/CentOS
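The transform-enrich-persist pattern behind the streaming bullets above can be sketched in pure Python. This is an illustrative sketch only: record formats, field names and the in-memory "sink" are hypothetical stand-ins for the actual Kafka topics, Hive lookup tables and Spark Streaming DStreams used in the project.

```python
# Pure-Python sketch of a micro-batch pipeline: each batch of raw
# records is parsed, joined with offline reference data, and persisted.
# All names are hypothetical; the production jobs used Spark Streaming.

# Offline reference data, standing in for a Hive lookup table.
offline_profiles = {"u1": "gold", "u2": "silver"}

def transform(record):
    """Parse one raw 'user_id,event' record into a dict."""
    user_id, event = record.split(",")
    return {"user": user_id, "event": event}

def enrich(row, profiles):
    """Join a transformed row with offline profile data."""
    return {**row, "tier": profiles.get(row["user"], "unknown")}

def process_batch(batch, profiles, sink):
    """Transform and enrich one micro-batch, then persist to the sink."""
    for record in batch:
        sink.append(enrich(transform(record), profiles))

hive_sink = []  # stands in for the Hive table the stream persists into
process_batch(["u1,click", "u2,view"], offline_profiles, hive_sink)
```

In the real job the per-batch function would be handed to the streaming framework (e.g. `foreachRDD`) rather than called directly.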
Confidential, New York, NY
Big Data / Hadoop Developer
- Participated in setting up the 50-node cluster and configured the entire Hadoop platform.
- Migrated the needed data from Oracle into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Worked mainly on Hive queries to categorize data from different claims; integrated the Hive warehouse with HBase.
- Wrote customized Hive UDFs in Java.
- Designed and created Hive external tables with partitioning, dynamic partitioning and buckets.
- Created HiveQL scripts to create, load and query tables in Hive.
- Generated final reporting data using Tableau for testing, connecting to the corresponding Hive tables via the Hive ODBC connector.
- Supported MapReduce programs running on the cluster.
- Used Pig to process raw metadata from S3 before storing data into the final Hive table.
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Presented data and dataflow using Tableau for reusability.
Environment: Hadoop 2.6.0, HDFS, Hive 1.2.1, Map Reduce, Java 1.8, Pig, Sqoop, MySQL 5.5, Tableau
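The categorization queries above compile down to a map, shuffle, reduce flow. A minimal pure-Python sketch of that flow, with hypothetical claim categories (the production version ran as Hive queries executing as MapReduce jobs):

```python
# Sketch of map -> shuffle -> reduce for counting claims per category.
# Category names and record shapes are hypothetical.
from collections import defaultdict

def mapper(claim):
    # Emit a (category, 1) pair for each claim record.
    yield claim["category"], 1

def reducer(category, counts):
    # Sum the counts gathered for one category key.
    return category, sum(counts)

def run(claims):
    shuffled = defaultdict(list)        # the "shuffle" phase: group by key
    for claim in claims:
        for key, value in mapper(claim):
            shuffled[key].append(value)
    return dict(reducer(k, v) for k, v in shuffled.items())

claims = [{"category": "auto"}, {"category": "home"}, {"category": "auto"}]
totals = run(claims)   # {'auto': 2, 'home': 1}
```

The equivalent HiveQL would be a `GROUP BY` with `COUNT(*)`; the framework handles the shuffle that the sketch does by hand.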
Confidential, Kenilworth, NJ
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Responsible for loading data files from external sources such as Oracle and MySQL into a staging area in Hive.
- Created and maintained the Hive warehouse for Hive analysis.
- Developed Java MapReduce jobs for aggregation and per-user interest-matrix calculation.
- Involved in loading data from the Linux file system into HDFS; used crontab to schedule periodic jobs through the resource manager.
- Experienced in managing and reviewing Hadoop log files, and in running Hadoop streaming jobs to process terabytes of XML-format data.
- Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
- Used Apache Sqoop to import user data into HDFS on a weekly basis.
- Generated test cases for the new MapReduce jobs.
- Prepared the data for consumption by formatting it for upload to the UDB system.
Environment: Hadoop 2.6.0, HDFS, Hive 1.2.1, Map Reduce, Java 1.8, Sqoop 1.4.6, Oracle 11g, MySQL 5.5
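The per-user interest-matrix aggregation mentioned above can be sketched in a few lines of pure Python. Event and interest names are hypothetical; the production version was a Java MapReduce job over much larger inputs.

```python
# Sketch of per-user interest aggregation: count how often each user
# touched each interest. Field values are hypothetical examples.
from collections import Counter, defaultdict

def interest_matrix(events):
    """Build {user: Counter({interest: count})} from (user, interest) pairs."""
    matrix = defaultdict(Counter)
    for user, interest in events:
        matrix[user][interest] += 1
    return matrix

events = [("u1", "sports"), ("u1", "sports"), ("u1", "news"), ("u2", "news")]
m = interest_matrix(events)   # m["u1"]["sports"] == 2
```

In MapReduce terms, the map phase emits `((user, interest), 1)` pairs and the reduce phase sums them; the nested-Counter structure here plays the role of the reducer output.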
- Responsible for requirement gathering and analysis through interaction with end users.
- Developed a Java/Oracle database web application to process quote-service orders for customers; programmed web-service code enabling the Java application to execute transactions on other web applications.
- Coded JSP pages to process quote orders and generate on-page reports used by the sales team and customers.
- Involved in creating SQL and PL/SQL queries and stored procedures; developed JavaBeans, JavaServer Pages (JSP) and PL/SQL procedures and functions to perform business transactions.
- Developed a Web service to communicate with the database using SOAP. Developed DAO (data access objects) using Spring Framework 3.
- Wrote Windows batch scripts and shell scripts to automate the FTP process for deploying the web application.
- Debugged production issues, working with the support team, Quality Assurance team and other development teams.
- Actively involved in tuning backend SQL queries and DB scripts.
- Handled user-reported issues on production and quality/test application servers, debugging issues with developers.
- Wrote commands and shell scripts on UNIX.
- Worked daily with the team lead to develop the application and fix problems, reporting to the manager weekly.
Environment: Spring 3.2, JSP 2.0, jQuery 1.7, Servlet 3.0, JDBC, Oracle 11g/SQL, JUnit 3.8, CVS 1.2, Eclipse 4.2, DHTML
- Actively involved in requirement gathering for enhancements to the existing project.
- Involved in developing design document and impact assessment documents.
- Involved in designing some of the application processes that were developed by other developers.
- Involved in coordinating with testing teams to resolve defects and provide 24/7 support for UAT.
- Improved existing procedures and implemented new stored procedures using PL/SQL.
- Developed business objects, request handlers and JSPs for the Wireless Manager site using Java (Servlets and Beans) and XML in JDeveloper.
- Implemented Hibernate transaction management for transactions.
- Developed request handlers, beans, JSPs and data objects in Java.
- Tuned queries and created indexes for improved performance.
- Created test plans using Quality Center (TestDirector).
Environment: Java 1.7, Servlets 3.0, JSP, XML, AJAX, Hibernate, Oracle 10g