- 7+ years of professional experience in the IT industry developing, implementing, configuring, and testing Hadoop ecosystem components and maintaining web-based applications using Java and J2EE.
- 3+ years of hands-on experience with the Hadoop framework and its ecosystem.
- Experience installing, configuring, and managing Cloudera’s Hadoop platform, including CDH3 and CDH4 clusters.
- Excellent knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, DataNode, and Secondary NameNode.
- Knowledge of installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, Oozie, Hive, Sqoop, Pig, Flume, Kafka, Storm, Spark, and ZooKeeper.
- Experience implementing real-time stream processing.
- Experience using Flume and Apache Kafka to load log data from multiple sources directly into HDFS in real time.
- Experience implementing Apache Storm topologies (Rich and Trident), handling streaming events with Storm, and writing spouts and bolts using the Java API.
- Experience in NoSQL databases such as HBase and Cassandra.
- Expertise in writing Hadoop jobs to process and analyze data using MapReduce, Hive, and Pig. Experienced in extending Hive and Pig core functionality by writing custom UDFs in Java.
- Hands-on experience with the YARN (MapReduce 2.0) architecture and its components.
- Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, JSP, and JDBC.
- Hands-on experience in application development using Java, UNIX shell scripting, and RDBMS.
- Very good hands-on technical knowledge of ETL tools, DataStage, SQL, and PL/SQL.
- Extensive experience with Teradata, including converting projects from Teradata to Hadoop.
- Developed JIL scripts and scheduled jobs using Autosys; experienced with Git and SVN.
- Excellent interpersonal and communication skills; results-oriented, with problem-solving and leadership skills.
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, HBase, Cassandra, ZooKeeper, YARN, Tez, Flume, Kafka, Storm, Spark
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, and JavaBeans
Frameworks: MVC, Struts, Hibernate, Spring
Databases: Teradata, Oracle 11g/10g, MySQL, DB2, SQL, NoSQL (HBase, Cassandra)
Programming Languages: C, C++, Java, jQuery, Python, UNIX Shell Scripting
IDEs: Eclipse, NetBeans
Web Servers: WebLogic, WebSphere, Apache Tomcat 6
Build Management Tools: Maven, Apache Ant, SOAP
Predictive Modelling Tools: Tableau, SAS Editor, SAS Enterprise Guide, SAS Miner, IBM Cognos
Confidential, SF, CA
Sr. Hadoop Developer
- Played a key role in the design and development of a data quality framework that generates metrics on multiple datasets and evaluates and reports the results.
- Used Kafka and Storm to process application log data in real time.
- Set up the Storm streaming and Kafka clusters.
- Integrated Kafka with Storm by reading data published by Kafka producers and processing it through a series of spouts and bolts in a Storm topology.
- Deployed various topologies to the Storm cluster based on business use cases.
- Integrated Storm with HBase to store the processed log data, which fed a monitoring dashboard showing server health.
- Worked with Flume to import log data from the reaper logs and syslogs into the Hadoop cluster.
- Developed solutions to load data into HDFS, analyzed batch data using MapReduce, Pig, and Hive, and delivered summary results from Hadoop to downstream systems.
- Created Hive tables and wrote Hive UDFs to supply functionality missing from Hive, as required for business analytics.
- Worked on performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side and reduce-side joins.
- Used complex data types such as bags, tuples, and maps in Pig to handle data.
- Wrote custom MapReduce programs to analyze data and used Pig Latin to clean out unwanted data.
- Used Sqoop to import data from Oracle and Teradata into HDFS using incremental loads. Verified and tested data in HDFS and Hive tables while importing data from RDBMS tables into Hive with Sqoop.
- Used different file formats such as text files, SequenceFiles, Avro, ORC, and RCFile.
- Used various compression codecs such as gzip, bzip2, Snappy, and LZO.
- Configured ZooKeeper to provide cluster coordination services.
- Used the UC4 workflow engine to run multiple Hive and Pig jobs that are triggered independently based on time and data availability.
- Monitored Autosys file watcher jobs, tested data for each transaction, and used UC4 to verify whether each job ran properly.
Environment: Storm, Kafka, HDFS, Pig, Sqoop, MapReduce, Hive, NoSQL (HBase), WebLogic, Apache Hadoop, ZooKeeper, Shell Scripting, Ubuntu, Flume, Tableau, Agile, Impala.
Confidential, Tampa, FL
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleansing and processing.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using Hive integration.
- Worked with HBase and Hive scripts to extract, transform, and load data into HBase and Hive.
- Worked with different file formats such as TEXTFILE, AVROFILE, and ORC for Hive querying and processing.
- Used various compression codecs such as gzip, bzip2, Snappy, and LZO.
- Worked on cluster installation, DataNode commissioning and decommissioning, NameNode recovery, capacity planning, and slot configuration.
- Implemented Flume to collect data from various sources and load it into HDFS for further processing.
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Tuned the cluster for optimal performance when processing large datasets.
- Built reusable Hive UDFs to sort struct fields and return complex data types.
- Responsible for loading data from UNIX file system to HDFS.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using an MR testing library.
- Developed a Control-M workflow to automate loading data into HDFS and preprocessing it with Pig.
- Used Maven extensively to build JAR files for MapReduce programs and deployed them to the cluster.
- Used the Oozie workflow engine to chain multiple MapReduce and Hive jobs.
- Modelled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
- Partitioned each day’s data into separate partitions for easy access and efficiency.
- Used Sqoop to import data from various databases and file systems into HDFS and to export it back.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
- Integrated a BI tool with Impala.
- Provided cluster coordination services through ZooKeeper.
- Assisted the admin team in installing and configuring additional nodes in the Hadoop cluster.
- Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports.
Environment: Hadoop, MapReduce, Hive, MySQL, HBase, Impala, Pig, Sqoop, Oozie, Flume, Cloudera, ZooKeeper, Eclipse (Kepler), Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, UNIX, Tableau, Control-M
Confidential, Concord, NC
- Interacted daily with the onshore counterparts to gather requirements.
- Worked on a few Teradata development projects and gained knowledge of Teradata utilities such as MLoad, TPump, and BTEQ.
- Performed input data analysis and generated space estimation reports for staging and target tables in testing and production environments.
- Worked on a remediation project to optimize all Teradata SQL queries in the e-commerce line of business, applying several query tuning and optimization techniques.
- Converted projects from Teradata to Hadoop because Teradata proved costly for handling the bank's large data volumes, learning Hadoop from scratch in the process.
- Installed and configured Hadoop MapReduce and HDFS, began loading data into HDFS instead of Teradata, and performed data cleansing and processing operations.
- Gained in-depth knowledge of HDFS data storage and MapReduce data processing techniques.
- Imported and exported data into and out of HDFS and Hive using Sqoop.
- Designed and scheduled workflows using Oozie.
- Created UNIX shell scripts that in turn call several Hive and Oozie scripts.
- Built several managed and external Hive tables and performed joins on those tables to produce the required results.
- Modelled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning. Partitioned each day’s data into separate partitions for easy access and efficiency.
- Scheduled several jobs comprising shell, Oozie, Hive, and Sqoop scripts using Autosys. Coded JIL scripts to define job dependencies for scheduling.
- Performed data analytics on credit card transaction data and coded automatic report-mailing scripts in UNIX.
- Worked in testing and production environments and learned to move components from one environment to another using Subversion (SVN).
- Performed encryption and decryption of key fields, such as bank customers' account numbers, in several input and reporting files using COBOL.
- Monitored Hadoop jobs on the performance and production clusters and provided production support for a few successful runs.
- Built complete ETL logic, generated transformations and workflows, and automated the scheduled runs.
- Built an automatic query performance metrics generation tool in Teradata using SQL procedures.
- Generated test scripts and test plans, and performed data analysis and defect reporting using HP Quality Center.
Environment: Hadoop, Teradata, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Java, COBOL, Autosys
Confidential, Cincinnati, OH
- Used the Hibernate ORM tool as the persistence layer, using the database and configuration data to provide persistence services (and persistent objects) to the application.
- Implemented Oracle Advanced Queuing using JMS and message-driven beans.
- Developed the DAO layer using Spring MVC and Hibernate XML configuration files to manage CRUD operations (insert, update, and delete).
- Implemented dependency injection using the Spring Framework.
- Developed and implemented the DAO and service classes.
- Developed reusable services using BPEL to transfer data.
- Participated in the analysis, interface design, and development of JSPs.
- Configured Log4j to enable and disable logging in the application.
- Wrote single-page applications (SPAs) using RESTful web services, Ajax, and AngularJS.
- Developed a rich user interface using HTML, JSP, Ajax, JSTL, JavaScript, jQuery, and CSS.
- Implemented PL/SQL queries and procedures to perform database operations.
- Wrote UNIX shell scripts and used the UNIX environment to deploy the EAR and read the logs.
- Implemented Log4j for application logging.
- Involved in code deployment activities for different environments.
- Followed an agile development methodology.
Confidential, New York, NY
Roles and Responsibilities:
- Modeled the conceptual design with use case, UML class, and activity diagrams in Rational Rose.
- Wrote requirement-specific SQL and PL/SQL scripts, including stored procedures, functions, packages, and triggers.
- Implemented database access through JDBC on the server side with Oracle.
- Used Spring Aspect-Oriented Programming (AOP) to address cross-cutting concerns.
- Implemented request/response handling using Spring controllers, inversion of control, and dependency injection with Spring MVC.
- Used SOAP web services described with WSDL to communicate over the internet.
- Helped implement the JMS connection pool, including publish and subscribe, using Spring JMS.
- Used CVS for version control and Log4j for logging.
- Used JProbe and JConsole to profile the application for memory leaks and resource utilization.
- Developed test classes in JUnit for unit testing.
- Deployed the application on WebLogic Application Server.
- Made enhancements to the application, which gave me exposure to the entire SDLC.
- Provided daily updates to the on-site team over calls and made enhancements.
- Assisted the Quality Assurance team in testing the applications.
- Handled code coverage and test case presentation.
Environment: Java, JDK 1.4, Eclipse Juno, J2EE, JDBC, Servlets, JSP, JSTL, HTML, AJAX, Spring, JavaScript, CSS, XSLT, XML, JUnit, Web Services, SOAP, WSDL, REST, Maven, JSON, WebLogic, CVS, Rational Application Developer (RAD), Hibernate, Rational Rose, JMS.