- 9+ years of IT experience in software analysis, design, development, and implementation of Big Data, Hadoop, and Java/J2EE technologies.
- Good experience with the ETL tool Informatica and with managing and maintaining Hadoop clusters using Apache Ambari.
- Worked on a migration project from Oracle DB to a Hadoop environment.
- Installed and configured Hive, HDFS, and NiFi; implemented a CDH cluster. Assisted with performance tuning and monitoring.
- Expertise in developing web applications using Core Java, Servlets, JSP, EJB, JDBC, XML, XSD, XSLT, RMI, JNDI, JavaMail, XML parsers (DOM and SAX), JAXP, JAXB, JavaBeans, etc.
- Good understanding of RDBMS concepts, including database design and writing queries against Oracle, SQL Server, DB2, and MySQL.
- Experience in unit testing with JUnit, following TDD and BDD practices.
- Experience in modeling applications with UML, Rational Rose and Rational Unified Process (RUP).
- Experience in using CVS and Rational ClearCase for version control.
- Good working knowledge of Ant and Maven for project build/test/deployment, Log4j for logging, and JUnit for unit and integration testing.
- Expertise in loading data from sources such as Teradata and DB2 into HDFS using Sqoop and then into partitioned Hive tables.
- In-depth knowledge of Spark concepts and experience with Spark in data transformation and processing.
- Extensive experience in developing Pig Latin Scripts for transformations and using Hive Query Language for data analytics.
- Expertise in writing applications with the Apache Spark Streaming API on Big Data distributions in an active cluster environment.
- Hands-on experience with NoSQL databases, including HBase and Cassandra, and their integration with Hadoop clusters.
- Experienced in developing user interfaces using JSP, HTML, DHTML, CSS, JavaScript, AJAX, jQuery, and AngularJS.
- Implemented database-driven applications in Java using JDBC, XML APIs, and the Hibernate framework.
- Expertise in using J2EE application servers such as WebLogic 10.3 and WebSphere 8.2, and web servers such as Tomcat 6.x/7.
- Strong knowledge of Spark Core, Spark SQL, MLlib, GraphX, and Spark Streaming.
- Experience in working with MapReduce programs, Pig scripts and Hive commands to deliver the best results.
- Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster.
- Experienced with IBM WebSphere Application Server, Oracle WebLogic application servers, and Apache Tomcat.
- In-depth experience and good knowledge of Hadoop ecosystem tools such as MapReduce, HDFS, Pig, Hive, Kafka, YARN, Sqoop, Storm, Spark, Oozie, Elasticsearch, and ZooKeeper.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing of Teradata Big Data Analytics.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
Hadoop Ecosystem: Hadoop 3.0, MapReduce, Sqoop, Hive 2.3, Oozie, Pig 0.17, HDFS 1.2.4, Zookeeper, Flume 1.8, Impala 2.1, Spark 2.2, Storm, Hadoop distributions (Cloudera, Hortonworks and Pivotal).
NoSQL Databases: HBase 1.2, MongoDB 3.6 & Cassandra 3.11
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, Unix Shell Scripting, Scala 2.12
Cloud Platform: AWS EC2, AWS Configured and S3, Microsoft Azure.
Methodologies: Agile, RAD, JAD, RUP, Waterfall & Scrum
Database: Oracle 12c/11g, MySQL, SQL Server 2016/2014
Web/Application Servers: WebLogic, Tomcat, JBoss
Tools and IDEs: Eclipse, NetBeans, Maven, DbVisualizer, SQL Server Management Studio
Version Control Tools: SVN, Git, GitHub, TFS, CVS and IBM Rational ClearCase
Confidential - Wayne, PA
Sr. Big Data Developer
- Responsible for managing data coming from different sources, with storage and processing in Hue covering all Hadoop ecosystem components.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Kafka.
- Worked with clients to better understand their reporting and dashboarding needs and presented solutions using a structured Agile project methodology.
- Used various HBase commands to generate datasets as required and controlled access to the data using grant and revoke.
- Created Hive tables as internal or external tables, as requirements dictated, for efficiency.
- Developed MapReduce programs for the files generated by Hive query processing to generate key-value pairs and upload the data to the NoSQL database HBase.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Created tables in HBase to store variable data formats of PII data coming from different portfolios.
- Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
- Implemented partitioning, dynamic partitions, and buckets in Hive to improve performance and organize data logically.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
- Worked with Apache NiFi flows to convert raw data into ORC.
- Developed RDDs/DataFrames in Spark using Scala and Python and applied several transformations to load data from the Hadoop Data Lake into Cassandra.
- Exported the analyzed data to the NoSQL Database using HBase for visualization and to generate reports for the Business Intelligence team using SAS.
- Pulled data from an Amazon S3 bucket into the Data Lake, built Hive tables on top of it, and created DataFrames in Spark for further analysis.
- Used cloud computing on the multi-node cluster, deployed the Hadoop application on S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Explored MLlib algorithms in Spark to understand which machine learning functionality could be applied to the use case.
- Involved in unit testing, interface testing, system testing, and user acceptance testing of the workflow tool.
- Used JIRA for bug tracking and Git for version control.
- Involved in the high-level design of the Hadoop architecture for the existing data structure and business process.
- Worked extensively on creating end-to-end data pipeline orchestration using NiFi.
- Developed scalable data pipelines to process data from multiple sources in real time using Kafka, NiFi, and Spark Streaming.
- Took part in configuring and deploying a Hadoop cluster in the AWS cloud.
- Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases, and Sqoop.
- Involved in loading disparate datasets into the Hadoop Data Lake, making them available to the data science team for predictive analysis.
- Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
- Worked with Elastic MapReduce (EMR) and set up environments on AWS EC2 instances.
- In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in loading data from the UNIX file system to HDFS using Flume and the HDFS API.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
- Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
Environment: Apache Hadoop 3.0, AWS, MLlib, MySQL, Kafka, HDFS 1.2, Hive 2.3, Pig 0.17, MapReduce, Flume 1.8, Cloudera, Oozie, UNIX, Oracle 12c, Tableau 7, Git.
Confidential - Troy, NY
Sr. Big Data/Hadoop Developer
- Extensively involved in the design phase and delivered design documents for the Hadoop ecosystem with HDFS, Hive, Pig, Sqoop, and Spark with Scala.
- Involved in the requirement-gathering phase of the SDLC and, with guidance from the team lead, helped the team by breaking the project into modules.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Developed Pig scripts for data analysis and extended their functionality with custom UDFs written in Java or Python.
- Involved in gathering requirements from the client and estimating the timeline for developing complex queries using Hive and Impala for a logistics application.
- Developed shell and Python scripts to automate and provide control flow for Pig scripts.
- Worked on designing NoSQL Schemas on HBase.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Used the DataFrame API in Scala to convert distributed collections of data into DataFrames organized into named columns.
- Involved in creating a Data Vault by extracting the customer's Big Data from various data sources into Hadoop HDFS. This included data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, and Netezza, as well as log data from servers.
- Designed a data flow to pull data from a REST API using Apache NiFi with SSL context configuration enabled.
- Involved in integrating HBase with Spark to import data into HBase and also performed some CRUD operations on HBase.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Ran MapReduce programs on the Amazon Elastic MapReduce framework, using Amazon S3 for input and output.
- Used the Teradata FastLoad/MultiLoad utilities to load data into tables.
- Involved in creating Hive tables, loading data into them, and writing Hive queries to analyze the data.
- Used HCatalog to access Hive table metadata from MapReduce or Pig code.
- Automated workflows using shell scripts and Control-M jobs to pull data from various databases into the Hadoop Data Lake.
- Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
- Performed data validation against source system data for analyzing the existing database source files and tables to ingest data into Hadoop Data Vault.
- Used AWS to produce a comprehensive architecture strategy for environment mapping.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Involved in data ingestion into HDFS using Sqoop and Flume from a variety of sources.
- Involved in developing stored procedures for fetching data from Greenplum and created workflows using Apache NiFi.
- Used Java MapReduce to compute metrics that define user experience, revenue, etc.
- Transferred data between HDFS and Oracle in both directions using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Wrote extensive MapReduce jobs in Java to train the cluster and developed Java MapReduce programs to analyze sample log files stored in the cluster.
- Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
Environment: Hadoop 3.0, AWS, HDFS, Pig, Hive 2.3, MapReduce, AWS S3, Scala 2.1, Sqoop, SparkSQL, Spark Streaming, Spark, Linux, Teradata 14, Oracle 11g, Java, Python.
Confidential - Hartford, CT
Sr. Java/Hadoop Developer
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Extracted files from Cassandra through Sqoop and placed them in HDFS and processed them.
- Performed data modeling to connect data stored in Cassandra DB to the data processing layers and wrote queries in CQL.
- Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC) using Agile software development methodology.
- Used Rational Rose for developing Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase.
- Analyzed Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
- Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
- Implemented Model View Controller (MVC) architecture using Spring Framework.
- Worked on JavaBeans and other business components for the application and implemented new functionalities for the ERIC application.
- Developed various SQL queries and PL/SQL procedures in Oracle DB for the application.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed multiple scripts for analyzing data using Hive and Pig and integrating with HBase.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Automated all jobs, from pulling data from databases to loading data into SQL Server, using shell scripts.
- Developed integration services using SOA, Mule ESB, Web Services, SOAP, and WSDL.
- Actively involved in designing and implementing the Singleton, MVC, Front Controller, and DAO design patterns.
- Used Log4j to log messages to the database.
- Performed unit testing using the JUnit framework.
- Created complex SQL queries, PL/SQL stored procedures, and functions for the back end.
- Used Hibernate to access the database, mapped POJO classes to the database tables, and persisted the data into the database.
- Used Spring Dependency Injection to set up dependencies between the objects.
- Developed Spring-Hibernate and Struts integration modules.
- Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from the edge node to HDFS using shell scripting.
- Implemented scripts for loading data from UNIX file system to HDFS.
- Integrated Struts application with Spring Framework by configuring Deployment descriptor file and application context file in Spring Framework.
Confidential - Albany, NY
Sr. Java/J2EE Developer
- Analyzed business requirements and implemented the process using Agile (Scrum) methodology.
- Followed the test-driven development practice of Agile methodology to produce high-quality software.
- Developed the J2EE application based on the Service Oriented Architecture
- Developed Java and J2EE applications using Rapid Application Development (RAD), Eclipse.
- Used Hibernate to access customer information in the Oracle database for this application.
- Used Maven scripts to build WAR and EAR files and worked on defects and bug fixes per weekly sprint planning.
- Worked on developing the REST web services and integrating with them from the front-end.
- Designed and developed the communication tier to exchange data through JMS & XML over HTTP.
- Used object-oriented design techniques such as UML to design Use Case, Sequence, Activity, Class, and Object diagrams.
- Used Hibernate as the ORM tool to persist data to the MySQL database.
- Developed the application using Spring MVC, JSTL (tag libraries), and AJAX on the presentation layer; the business layer is built using Spring and the persistence layer uses Hibernate.
- Developed Web services for consuming Stock details and Transaction rates using JAX-WS and Web services Template.
- Developed PL/SQL stored procedures and extensively used HQL.
- Used Spring to develop lightweight business components and the core Spring framework for dependency injection.
- Developed the project using Waterfall methodologies and Test Driven Development.
- Conducted code reviews with clients using the SmartBear tool.
- Configured the different layers (presentation, server, persistence) of the application using Spring IoC and maintained the Spring application framework's IoC container.
- Implemented Java classes to read data from XLS and CSV files and store the data in backend tables using Web Frame APIs.
- Configured faces-config.xml and navigation.xml to set all page navigations and created EJB Message Driven Beans to use an asynchronous service to perform profile additions.
- Used various Core Java concepts such as exception handling and the Collections API to implement various features and enhancements.
- Involved in coding, maintaining, and administering Servlets and JSP components to be deployed on a WebLogic Application server.
- Used CVS as version control system for the source code and project documents.
- Designed and developed a batch process for VAT.
- Followed Test Driven Development (TDD) and Scrum concepts of the Agile methodology to produce high-quality software.
- Actively participated in developing user interfaces and deploying them on the WebLogic application server.
- Implemented MVC architecture to develop web application using Struts framework.
- Involved in designing the database schema and writing complex SQL queries.
- Participated in the design and development of database schema and Entity-Relationship diagrams of the backend Oracle database tables for the application.
- Involved in the development of backend logic and data access logic using Oracle DB and JDBC.
- Extensively developed stored procedures, triggers, functions, and packages in Oracle SQL and PL/SQL.
- Analyzed and fine-tuned RDBMS/SQL queries to improve the application's database performance.
- Created XML-based configuration and property files for the application and developed parsers using JAXP, SAX, and DOM.
- Developed EJBs to process business logic and provide data persistence in the application.
- Responsible for developing Use Case, Class diagrams and Sequence diagrams for the modules using UML and Rational Rose.
- Used Eclipse with Apache Tomcat for development.
- Used SOAP (Simple Object Access Protocol) for web services to exchange XML data between applications.
- Implemented the Singleton, Factory, and DAO design patterns based on application requirements.
- Designed and developed the communication tier to exchange data to Xpress Services through JMS & XML over HTTP.
- Developed unit test cases using JUnit and mock objects.
- Modified and migrated existing applications for fine-tuning and performance improvements.
- Developed the web interface using MVC design pattern with Struts framework.