- Over 7 years of IT experience in analysis, design, development, implementation, maintenance, and support, including Hadoop ecosystem analytics and the design and development of Java-based enterprise applications.
- Around 5 years of experience working with HDFS, MapReduce, Hive, Pig, Spark Streaming, HBase, Zookeeper, Flume, Kafka, Storm, Sqoop, and Oozie.
- Excellent knowledge of Hadoop architecture and its related components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience working with banking-domain applications dealing with capital markets and investment banking.
- Installed the Hortonworks Distribution for Hadoop (HDP 2.0.6) with Apache Ambari on 6 nodes, decommissioning the remaining nodes.
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS.
- Monitored cluster health on a daily basis and optimized system performance by adjusting tuning parameters.
- Extensive knowledge and experience installing, configuring, supporting, and managing Cloudera distributions (CDH3/4/5), with good knowledge of Hortonworks distributions.
- Extensive experience in testing, debugging, and deploying MapReduce jobs on Hadoop platforms.
- Experience gathering business requirements in order to develop applications using Spark on YARN with Scala.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Collected and aggregated large amounts of log data using Apache Flume and stored the data in HDFS for further analysis.
- Experience using a big-data Hadoop data lake to archive large volumes of data and to import and export data with Sqoop and Spark, thereby saving Netezza hardware costs.
- Good knowledge on Zookeeper to coordinate clusters.
- Strong experience in System Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring and Fine-tuning on Linux (RHEL) systems.
- Wrote Sqoop jobs to import and export data to and from Hadoop.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, and used Kafka.
- Worked on Python OpenStack APIs; used Python scripts to update content in the database and manipulate files.
- Worked on Informatica, Informatica BDE, and ETL Validator while collaborating with ETL teams.
- Experience supporting data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud; performed export and import of data to and from S3.
- Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Experience in developing, designing and coding web applications using Java SE (J2SE) and Java EE (J2EE) technologies.
- Strong UNIX Shell Scripting skills.
- Hands-on experience with web services using XML, HTML, JSON, jQuery, and Ajax.
- In-depth understanding of frameworks such as Spring, Hibernate, and MVC.
- Proficient in SQL and PL/SQL using Oracle, DB2, Sybase and SQL Server.
- Built RESTful APIs in front of different types of NoSQL storage engines, allowing other groups to quickly meet their big-data needs while remaining insulated from rapid technology changes in the NoSQL field.
- Participated in daily Scrum meetings and used Scrum agile methodologies.
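The MapReduce programming paradigm referenced throughout the summary can be illustrated with a minimal, framework-free sketch in pure Python; the word-count logic below is a hypothetical example, not code from any of the projects described:

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit (word, 1) pairs, like a Mapper's map() calls."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle/sort step: group values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce step: aggregate each key's values, like a Reducer's reduce() calls."""
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle(map_phase(["Hadoop HDFS", "hadoop YARN"])))
# counts == {"hadoop": 2, "hdfs": 1, "yarn": 1}
```

In a real Hadoop job the three steps run as distributed tasks over HDFS blocks, but the structure of the computation is the same.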
Hadoop/Big Data: HDFS, MapReduce, HBase, Mahout, Pig, Hive, Sqoop, MongoDB, Cassandra, Flume, Oozie, Zookeeper, YARN, Spark, Kafka, Teradata, Scala, ETL, Informatica.
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans, Maven, Gradle, JUnit, TestNG.
IDEs: Eclipse, NetBeans, IntelliJ IDEA.
Frameworks: MVC, Struts, Hibernate and Spring.
Programming languages: C, C++, Java, Scala, Python, Ant scripts, Linux shell scripts
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server
Web Servers: WebLogic, WebSphere, Apache Tomcat
Version Control: SVN, Git.
Confidential, Baltimore, MD
- Involved with the application teams to install Hadoop updates, patches and version upgrades as required.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Responsible for data modeling in Cassandra, deciding the row key and the different column families.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Developed Sqoop jobs to import and export data to and from Hadoop.
- Developed Scala and SQL code to extract data from various databases.
- Created Hive tables and analyzed the loaded data using Hive queries.
- Developed Hive queries and Pig scripts to analyze large datasets.
- Developed automation scripts to test storage appliances in Python.
- Handled various performance issues on MongoDB/Cassandra.
- Cleansed data generated from weblogs with automated Python scripts.
- Used AWS services like EC2 and S3 for small data sets.
- Worked on Tableau for generating reports on HDFS data.
- Integrated Kafka with Flume to send data to Spark Streaming context, HDFS.
- Implemented proofs of concept (PoCs) using Kafka, Storm, and HBase for processing streaming data.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, and used Kafka.
- Wrote HiveQL scripts to create, load, and query Hive tables.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Monitored system health and logs and responded to any warning or failure conditions.
Environment: Hadoop Ecosystem, HDFS, Map Reduce, Pig, Python, Mahout, Hive, Informatica, Tableau, Chukwa, Eclipse, AWS, Spark, Scala, Shell Scripting, RDBMS, Cassandra.
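The Python weblog cleansing mentioned above usually amounts to pattern-matching raw lines and dropping malformed ones before loading into HDFS. A small sketch, assuming an Apache-style access-log format (the pattern and field names are assumptions, not the actual pipeline):

```python
import re

# Hypothetical Apache-style access-log layout; the real format may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})'
)

def cleanse(lines):
    """Keep only well-formed log lines, returned as structured records."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records

sample = [
    '10.0.0.1 - - [01/Jan/2016:00:00:01 +0000] "GET /index.html HTTP/1.1" 200',
    'garbage line that should be filtered out',
]
clean = cleanse(sample)
# clean holds one record; clean[0]["status"] == "200"
```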
Confidential, Glendale, CA
Position: Hadoop Developer
- Worked on the design and development of a 3-node Hadoop cluster for POC and sample data analysis.
- Involved in requirements gathering from business users and designing and implementing data pipelines and ETL workflows for disparate data sets.
- Implemented Cloudera on a 30-node cluster for P&G consumption forecasting.
- Successfully added a 10-node Hadoop cluster for data warehousing, sampling reports, and historical data storage in HBase.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats to text files.
- Developed Oozie workflows for data ingestion on to the data lake.
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Developed Spark code using Scala and Spark-SQL for batch processing of data.
- Handled the imported data to perform transformations, cleaning, and filtering using Hive and MapReduce.
- Used Pig UDFs for preprocessing the data for further analysis.
- Created Hive tables, partitions and loaded the data to analyze using Hive queries.
- Developed workflow on Oozie for automating the job flows.
- Worked with Program Team and Different Delivery Teams to achieve common goals.
Environment: Hadoop Ecosystem, HDFS, Map Reduce, Pig, Hive, Spark, Scala, Oozie, Shell Scripting, RDBMS.
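Migrating MapReduce programs into Spark transformations, as described above, mostly means chaining map/filter/reduceByKey-style operations. A rough pure-Python stand-in for that shape (the dataset and field layout are invented for illustration; the real code would use Spark RDDs in Scala):

```python
# Invented sample data: date,category,amount rows, one of them malformed.
raw = ["2016-01-01,books,25.0", "2016-01-01,toys,bad_row", "2016-01-02,books,10.0"]

def parse(line):
    """Map step: turn a CSV line into a (category, amount) pair, or None if malformed."""
    date, category, amount = line.split(",")
    try:
        return category, float(amount)
    except ValueError:
        return None

# Analogous to rdd.map(parse).filter(lambda p: p is not None)
pairs = [p for p in map(parse, raw) if p is not None]

def reduce_by_key(pairs):
    """Analogous to rdd.reduceByKey(_ + _): sum amounts per category."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0.0) + value
    return totals

totals = reduce_by_key(pairs)
# totals == {"books": 35.0}; the malformed "toys" row was filtered out
```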
Confidential, New York
- Developed Java MapReduce jobs for aggregation and interest-matrix calculation for users.
- Experienced in managing and reviewing application log files.
- Ingested application logs into HDFS and processed them using MapReduce jobs.
- Created and maintained the Hive warehouse for Hive analysis.
- Generated test cases for new MapReduce jobs.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
- Developed dynamically partitioned Hive tables to store data by date and workflow ID.
- Ran clustering and user-recommendation agents on user weblogs and profiles to generate the interest matrix.
- Installed and configured Hive and wrote Hive UDFs in Java and Python.
- Prepared the data for consumption by formatting it for upload to the UDB system.
- Led and programmed the recommendation logic for various clustering and classification algorithms using Java.
- Involved in migrating Hadoop jobs to higher environments such as SIT, UAT, and Prod.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Scala, Pig, Sqoop, Oozie, Zookeeper, Teradata, PL/SQL, MySQL, Windows, Hortonworks, HBase
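Hive UDFs written in Python, as mentioned above, are typically plugged in through Hive's TRANSFORM clause, which streams tab-separated rows to a script over stdin/stdout. A minimal sketch (the column layout and the URL-normalization rule are hypothetical):

```python
def transform(line):
    """Normalize one tab-separated (user_id, url) row; the columns are a hypothetical layout."""
    user_id, url = line.rstrip("\n").split("\t")
    return "\t".join([user_id, url.lower().strip("/")])

# Hive would stream rows to this script roughly like:
#   ADD FILE normalize_urls.py;
#   SELECT TRANSFORM(user_id, url) USING 'python normalize_urls.py' AS (user_id, url)
#   FROM weblogs;
sample_rows = ["42\t/Home/\n", "7\t/Products/Books/\n"]
normalized = [transform(row) for row in sample_rows]
# normalized == ["42\thome", "7\tproducts/books"]
```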
Jr. Java Developer
- Reading and understanding application requirements
- Involved in design and development of the web pages in web application
- Involved in using Spring JDBC templates to call the stored procedures
- Used AJAX for asynchronous communication with server
- Involved in writing SQL queries in stored procedures in Oracle database.
- Involved in developing web pages using JSP, HTML, and CSS.
- Deployed the application on the JBoss Application Server.
- Developed web application using JSF Framework
- Used the Spring and Hibernate frameworks in the project.
- Used RESTful web services with JAX-WS.
- Used Maven to manage the project build and its dependencies.
- Developed, implemented, and performed unit testing using JUnit.
- Performed manual testing of each module of the project and wrote test cases.
- Involved in UNIX training for writing the shell scripts.
- Involved as responsible team member for production support and bug fixes.
- Implemented agile methodology for this project.