- Over 7 years of experience in Hadoop and Java technologies involving Analysis, Design, Testing, Implementation and Training with HDFS, MapReduce, Apache Pig, Hive, HBase, Sqoop, Oracle, JSP, JDBC and Spring.
- Very good experience in the complete project life cycle: design, development, testing, and implementation of Client-Server and Web applications.
- Developed scripts and numerous batch jobs scheduled within the Hadoop ecosystem.
- Experience in analyzing data using Hive Query Language, Pig Latin, and custom Map Reduce programs in Java.
- Worked on Importing and exporting data from different databases like Oracle, Teradata, and MySQL into HDFS and Hive using Sqoop.
- Involved in writing database queries, creating Stored Procedures, Views, Indexes, Triggers, and Functions, and performing code optimization and performance tuning.
- Basic knowledge of Apache Spark for fast, large-scale in-memory data processing.
- Good working experience with Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experienced with job workflow scheduling and monitoring using Oozie, and with cluster coordination using ZooKeeper.
- Good working experience with Hadoop cluster architecture and cluster monitoring. In-depth understanding of data structures and algorithms.
- Expert in using Sqoop to fetch data from different systems for analysis in HDFS and to export results back to the source systems for further processing.
- Good experience with MapReduce performance-optimization techniques for effective utilization of cluster resources.
- Experience creating ETL data pipelines using MapReduce, Hive, Pig, Sqoop, and Hive/Pig UDFs written in Java.
- Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Databases like Oracle, SQL Server, DB2, and MySQL.
- Proficient in deploying applications on J2EE application servers like WebSphere, WebLogic, GlassFish, and JBoss, and on the Apache Tomcat web server.
- Worked extensively on Web services and the Service-Oriented Architecture (SOA), Web Services Description Language (WSDL), Simple Object Access Protocol (SOAP), and UDDI.
- Experience in utilizing Java tools in business, web and client server environments including Java platform, JSP, Servlet, Java beans, JSTL, JSP custom tags, JSF and JDBC.
- Experience in designing class diagrams, block Diagrams and Sequence Diagrams by using Microsoft Visio.
- Used Kafka for message brokering, streaming, and log aggregation, consolidating physical logs into centralized locations.
- Highly motivated self-starter with excellent communication, presentation, and problem-solving skills, committed to learning new technologies.
- Committed to professionalism; highly organized, with the ability to work under strict deadline schedules and with attention to detail.
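The MapReduce programming paradigm cited throughout this resume can be sketched in plain Java. This is a simplified, single-process illustration using only the standard library; a real Hadoop job would distribute the same map/shuffle/reduce flow across the cluster via `Mapper` and `Reducer` classes, and the sample input lines here are made up:

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {
    // Map phase: emit one (word, 1) pair per word; shuffle + reduce phase:
    // group by word and sum the 1s — the same flow a Hadoop job
    // parallelizes across mappers and reducers.
    public static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))  // map
            .collect(Collectors.groupingBy(w -> w,               // shuffle
                     Collectors.summingInt(w -> 1)));            // reduce
    }

    public static void main(String[] args) {
        // Hypothetical input lines standing in for HDFS file splits.
        System.out.println(count(List.of("hive pig hive", "pig sqoop hive")));
    }
}
```

The word-count example is the conventional minimal demonstration of the paradigm: the framework's value lies in running the map and reduce steps on many nodes at once.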
Hadoop Technologies: HDFS, MapReduce, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, Apache Cassandra, Flume, Spark, AWS EC2
Java Technologies: J2EE, JSP, JSTL, EJB, JDBC, JMS, JNDI, JAXB, JAX-WS, JAX-RPC, SOAP, WSDL
Languages: C, Java, SQL, PL/SQL, Scala, Shell Scripts
Operating Systems: Linux, UNIX, Windows
Databases: Oracle, DB2, MySQL, SQL Server, MS Access; NoSQL: HBase
Application Servers: WebLogic, WebSphere, Apache Tomcat, JBOSS
IDEs: Eclipse, NetBeans, JDeveloper, IntelliJ IDEA
Version Control: CVS, SVN, Git
Reporting Tools: Jaspersoft, Qlik Sense, Tableau
Testing Tools: JUnit
Confidential, Franklin TN
Big Data/Hadoop Developer
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Worked on the Log Analytics platform, which collects usage data from CRM users and determines feature popularity, adoption of newly added features, and unused features to retire.
- Installed and configured Hadoop, MapReduce, and HDFS. Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis; developed scripts and batch jobs to schedule various Hadoop programs.
- Installed and configured Pig, and wrote Pig Latin scripts that compile down to MapReduce jobs.
- Involved in extracting customers' Big Data from various sources into Hadoop HDFS, including data from Excel, ERP systems, and databases, as well as log data from servers.
- Responsible for creating Hive tables, loading them with data, and writing Hive queries which ran internally in MapReduce.
- Hands-on experience exporting the analyzed data into relational databases using Sqoop for visualization and report generation by the BI team.
- Worked with Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
- Involved in creating a plugin that allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
- Extensively used Pig for data cleansing.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage & review Hadoop log files and Data backups.
- Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues.
- Involved in loading data from UNIX file system to HDFS.
- Setup and benchmarked Hadoop/HBase clusters for internal use.
- Involved in designing Restful services using Java based API’s like JERSEY.
- Used Pig for transformations, event joins, and pre-aggregations (via Pig Latin and the Java API) before loading JSON-format files onto HDFS.
- Involved in resolving performance issues in Pig and Hive, using an understanding of MapReduce physical plan execution and debugging commands to run code in an optimized way.
- Good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Responsible for Data Ingestion using Flume and Kafka.
- Responsible for loading unstructured and semi-structured data from different sources into the Hadoop cluster using Flume, and for managing the ingested data.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Supported, configured, installed, and designed Hadoop deployments; took part in a POC effort to build new Hadoop clusters.
- Used Kafka to move logs from physical repositories to centralized locations.
Environment: Apache Hadoop, Spark, HIVE, PIG, HDFS, Zookeeper, Kafka, Java, UNIX, MYSQL, Eclipse, Oozie, Sqoop, Storm, REST/SOAP API, AWS
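The Hive bucketing concept mentioned in this section (splitting a table into a fixed number of buckets by key hash, via `CLUSTERED BY`) can be illustrated with a small plain-Java analogue. This is a hypothetical sketch of the idea, not Hive's actual implementation; the customer IDs and bucket count are made up:

```java
import java.util.*;

public class BucketingSketch {
    // Assign a record key to one of numBuckets buckets, mirroring how
    // Hive routes rows for CLUSTERED BY columns: bucket = hash(key) mod N.
    // Masking with Integer.MAX_VALUE keeps the result non-negative.
    public static int bucketFor(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        Map<Integer, List<String>> buckets = new HashMap<>();
        for (String id : List.of("cust-1", "cust-2", "cust-3", "cust-4")) {
            buckets.computeIfAbsent(bucketFor(id, numBuckets),
                                    b -> new ArrayList<>()).add(id);
        }
        System.out.println(buckets);
    }
}
```

Because the same key always hashes to the same bucket, joins and sampling on the bucketed column can skip most of the data, which is the performance benefit the bullet above refers to.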
Confidential, Seattle, WA
Big Data/Hadoop Developer
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Developed Java MapReduce programs using core concepts such as OOP, multithreading, collections, and I/O.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
- Helped the team to increase cluster size from 35 nodes to 118 nodes. The configuration for additional data nodes was managed using Puppet.
- Responsible to manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Used Sqoop to import data from relational databases into Cassandra tables, and imported data from various sources into the Cassandra cluster using Java APIs.
- Supported HBase Architecture Design with the Hadoop Architect team to develop a Database Design in HDFS.
- Created Cassandra tables to load large sets of structured, semi-structured, and unstructured data coming from Linux systems, NoSQL stores, and a variety of portfolios.
- Involved in creating data-models for customer data using Cassandra Query Language.
- Developed Pig Latin scripts to extract data from web server output files to load into HDFS.
- Developed Pig UDFs to pre-process the data for analysis.
- Hands-on experience writing MapReduce code to convert semi-structured data into structured form and insert it into HBase from HDFS.
- Implemented a script to transmit information from web servers to Hadoop using Flume.
- Used Zookeeper to manage coordination among the clusters.
- Used Apache Kafka and Apache Storm to gather log data and feed it into HDFS.
- Created Oozie workflows to automate loading data into Amazon S3 and pre-processing it with Pig; utilized Oozie for data scrubbing and processing.
- Developed scripts and deployed them to pre-process the data before moving to HDFS.
- Performed extensive analysis on data with Hive and Pig.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster, and integrated Hive with existing applications.
- Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
- Analyzed data by performing Hive queries and running Pig scripts to know user behavior.
Environment: Apache Hadoop, HIVE, PIG, HDFS, Zookeeper, Kafka, Java, UNIX, MYSQL, Eclipse, Oozie, Sqoop, Storm
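Turning semi-structured data into structured records, as in the HBase-loading bullet above, usually comes down to parsing each raw line into named fields. A minimal plain-Java sketch (the log-line layout and field names here are hypothetical; in a real job this logic would live inside a Mapper):

```java
import java.util.*;

public class LogRecordParser {
    // Parse a semi-structured log line like "2017-03-01 WARN disk full"
    // into a structured (date, level, message) record.
    public static Map<String, String> parse(String line) {
        String[] parts = line.split("\\s+", 3);  // split into at most 3 fields
        Map<String, String> record = new LinkedHashMap<>();
        record.put("date", parts[0]);
        record.put("level", parts[1]);
        record.put("message", parts.length > 2 ? parts[2] : "");
        return record;
    }

    public static void main(String[] args) {
        System.out.println(parse("2017-03-01 WARN disk full"));
    }
}
```

Once every line is a keyed record like this, writing it out as an HBase row (row key plus column/value pairs) is a straightforward mapping step.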
Confidential, Los Angeles, CA
- Performed Hadoop cluster administration: adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Worked on a Hadoop cluster that ranged from 5-10 nodes during pre-production and was at times extended up to 25 nodes in production.
- Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
- Involved in the configuration of System architecture by implementing Hadoop file system in master and slave systems in Red Hat Linux Environment.
- Developed Map Reduce programs to cleanse data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail ETL applications to Hadoop.
- Wrote SQL queries to process the data using Spark SQL.
- Extracted data from different databases and copied it into HDFS using Sqoop.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Used Maven with SOAP Web services (JAX-WS) using XML, WSDL and Apache CXF.
- Used Spring Integration (SI) to expose some services of our application for other applications in the company to use.
- Used SOAP UI to test the SOAP Web services.
- Created, altered, and managed databases, tables, views, indexes, and constraints with business rules using T-SQL.
- Created complex Stored Procedures, Triggers and User Defined Functions to support the front-end application.
- Participated in troubleshooting production issues and coordinated with team members on defect resolution under tight timelines.
- Involved in end-to-end implementation in the production environment, validating the implemented modules.
Environment: Apache Hadoop, HIVE, PIG, HDFS, Java, UNIX, MYSQL, Eclipse, Sqoop, REST/SOAP API
Confidential, Chicago, IL
- Developed MapReduce jobs for log analysis and analytics and to generate reports on the number of activities created on a particular day.
- Involved in implementing complex Map Reduce programs to perform joins on the Map side using distributed cache in Java.
- Implemented analytical algorithms using Map Reduce programs to apply on HDFS data.
- Created Hive tables as internal or external per requirements, defined with appropriate static and dynamic partitions for efficiency.
- Developed the application with help of Struts Framework that uses Model View Controller (MVC) architecture with JSP as the view.
- Generated Business Logic using SOAP and deployed them on WebLogic server.
- Responsible for determining bottlenecks and fixing the bottlenecks with performance tuning.
- Configured JMS MQ in IBM WebSphere for various environments and ensured ESL connectivity between Dealer Direct, Trade Capture, and Settlement via the ESL message queue.
- Created common Java components shared between the applications to convert data into the appropriate state for each application.
- Coordinated with the Engineering team to provide engineering design documents for building the environments per Single Security project requirements.
- Used the IBM RAD tool to debug Java issues when deploying or integrating with other Java applications.
- Developed Pig UDFs to analyze customer behavior, and Pig Latin scripts for processing the data in Hadoop.
- Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
- Wrote Java code for file reading and writing, with extensive use of data structures such as ArrayList and HashMap.
- Wrote test cases which adhere to a Test-Driven Development (TDD) pattern.
- Built ANT scripts that compile the code, pre-compile the JSPs, build an EAR file, and deploy the application on the application server.
- Used CVS as a version control system, an important component of Source Configuration Management (SCM).
Environment: Apache Hadoop, HIVE, PIG, HDFS, Java, UNIX, MYSQL, Eclipse, Oozie, Sqoop, SOAP
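The map-side join mentioned in this section works by loading the small table into memory on every mapper (via the distributed cache) and probing it while streaming the large table, so no reduce phase is needed. A single-process sketch with hypothetical user/event data:

```java
import java.util.*;

public class MapSideJoinSketch {
    // Join each (userId, event) row of the large input against a small
    // in-memory lookup table, as a mapper would after reading the
    // distributed-cache file into a HashMap.
    public static List<String> join(Map<String, String> smallTable,
                                    List<String[]> largeInput) {
        List<String> out = new ArrayList<>();
        for (String[] row : largeInput) {
            String name = smallTable.get(row[0]);
            if (name != null) {              // inner join: drop unmatched keys
                out.add(name + "," + row[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> users = Map.of("u1", "alice", "u2", "bob");
        List<String[]> events = List.of(new String[]{"u1", "login"},
                                        new String[]{"u3", "click"});
        System.out.println(join(users, events)); // prints [alice,login]
    }
}
```

The trade-off is memory: this only works when the small side fits in each mapper's heap, which is why it avoids the shuffle cost of a reduce-side join.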
Confidential, South Windsor, CT
- Involved in requirements gathering and analysis from the existing system. Captured requirements using Use Cases and Sequence Diagrams.
- Used the Spring Framework as the middle-tier application framework, with a persistence strategy based on Spring's Hibernate support for database integration.
- Implemented MVC, DAO, Session façade, Service locator J2EE design patterns as a part of application development.
- Used Custom Tag Library (JSTL) to build the user Interface of the application.
- Implemented the MVC pattern with the Struts framework, using Tiles for the presentation layer.
- Implemented various design patterns: Singleton, Data Access Object (DAO), Command Design Pattern, Factory Method Design Pattern.
- Used Web Services - WSDL and SOAP for getting credit card information from third party and used SAX and DOM XML parsers for data retrieval.
- Involved in requirement gathering, HLD and LLD and prepared activity diagrams, sequence diagrams, class diagrams & use case diagrams for various use cases using Rational Rose.
- Extensively involved in developing backend and data-access logic using Oracle DB and JDBC.
- Developed stored procedures, triggers and functions with PL/SQL for Oracle database.
- Used Ant for building & worked with Production Control team for implementation & deployment.
- Used Log4J for logging and for analyzing system performance and flow; involved in code refactoring and bug fixing.
- Tested the service and data-access tiers using JUnit following a TDD methodology.
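The DAO pattern named in this section isolates persistence behind an interface so the service tier can be exercised (for example, under JUnit/TDD as above) without a live database. A minimal sketch with an in-memory implementation; the class and method names are illustrative only:

```java
import java.util.*;

public class DaoSketch {
    // The DAO interface the service tier depends on.
    interface AccountDao {
        void save(String id, double balance);
        Optional<Double> findBalance(String id);
    }

    // In-memory implementation for tests; a production DAO would use
    // JDBC or Hibernate behind the same interface.
    static class InMemoryAccountDao implements AccountDao {
        private final Map<String, Double> store = new HashMap<>();
        public void save(String id, double balance) { store.put(id, balance); }
        public Optional<Double> findBalance(String id) {
            return Optional.ofNullable(store.get(id));
        }
    }

    public static void main(String[] args) {
        AccountDao dao = new InMemoryAccountDao();
        dao.save("acct-1", 250.0);
        System.out.println(dao.findBalance("acct-1").orElse(0.0)); // prints 250.0
    }
}
```

Swapping the in-memory DAO for a JDBC-backed one requires no change to callers, which is the point of the pattern.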