- 7+ years of experience as a Lead Big Data/Hadoop Developer designing and developing applications with big data, Hadoop, and Java/J2EE open-source technologies.
- 3+ years of working experience with HDFS, MapReduce, and Hadoop ecosystem components: Pig, Hive, Flume, Kafka, YARN, Oozie, HBase, ZooKeeper, and Sqoop.
- Well versed in developing and implementing MapReduce jobs using Hive and Pig to process big data.
- Experience collecting real-time streaming data and building pipelines for raw data from different sources using Kafka, storing the data in HDFS and NoSQL stores using Spark.
- Procedural knowledge in cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs.
- Good knowledge of the Software Development Life Cycle (SDLC) and of Agile and Waterfall methodologies.
- Experience in writing custom UDFs for extending Hive and Pig core functionalities.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems such as Oracle, MS SQL Server, and MySQL.
- Worked with different file formats such as Avro, JSON, XML, and Parquet.
- Implemented ETL methodologies using SSIS and Talend; working knowledge of Informatica PowerCenter.
- Extensive experience as a DWH Technical Analyst, providing Business Intelligence and Data Warehouse solutions using Microsoft Data Tools, IBM Cognos FM, and BI reports (Cognos Report Studio, Tableau, Power BI).
- Experienced in data modeling, with knowledge of Ralph Kimball’s dimensional data modeling using star and snowflake schemas.
- Worked on IBM Cognos Framework Manager, Query Studio, Report Studio, Workspace Advanced, and Cognos Administration.
- Experienced with various RDBMSs such as MySQL, Oracle, and MS SQL Server.
- Extensive experience with SQL in designing database objects such as tables, functions, procedures, triggers, views, and CTEs.
- Experienced in Erwin data modeling iterations (schematic/logical/physical) and in forward- and reverse-engineering processes.
- Experienced in project management, change management, and vendor management, and in Agile Scrum Master, Business Analyst, and Data Analyst roles; knowledge of SDLC and STLC.
- Experience moving data with Sqoop between the Hadoop Distributed File System and relational database systems in both directions.
- Experienced with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Imported data from HDFS into Spark DataFrames for in-memory computation to produce optimized output and better visualizations.
- Expertise in writing Spark RDD transformations and actions, DataFrames, and case classes for the required input data; performed data transformations using Spark Core and converted RDDs to DataFrames.
- Implemented a POC using Impala for data processing on top of Hive to take advantage of its C++ execution engine.
- Experience with the NoSQL databases HBase and Cassandra and their integration with Hadoop clusters.
- Implemented a cluster for the NoSQL tool HBase as part of a POC to address HBase limitations.
- Experienced in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift, and with Microsoft Azure.
- Experienced with relational database management systems such as Teradata, PostgreSQL, DB2, Oracle, and SQL Server.
Big Data Technologies: Hadoop, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Storm, Oozie, Flume.
Frameworks: Spring, Struts, Hibernate 4.3, EJB 2.0 / 3.0, Web Services, SOAP, RESTful, JMS.
Java Technologies: Java, J2EE, JDBC, Servlets, JSP, JavaBeans, EJB, JPA, JMS, Web Services.
Spark components: RDD, Spark SQL (DataFrames and Datasets), Spark Streaming.
Cloud Infrastructure: AWS CloudFormation, S3, EC2-Classic and EC2-VPC, and MS Azure.
Programming Languages: SQL, C, C++, Java, Core Java and Python.
Databases: Oracle 12c/11g, Teradata 15/14, MySQL, SQL Server 2016/2014, DB2.
XML Technologies: DTD, XSD, XML, XSL, XSLT, XQuery, SAX, DOM, JAXP.
Version Control: CVS, SVN and ClearCase.
Methodologies: Agile, RAD, JAD, RUP, Waterfall & Scrum
Operating Systems: Windows, UNIX/Linux
Application/Web Servers: WebSphere, WebLogic, Apache Airflow, Tomcat, JBoss.
Build Tools: Eclipse, ANT 1.7, Maven, NetBeans, IBM Rational Application Developer
Confidential, Columbus, OH
Sr. Hadoop/Lead Big Data Developer
- Involved in analysis, design and development phases of the project. Adopted Agile methodology throughout all the phases of the application.
- Implemented reporting and notification services using the AWS API and used AWS (Amazon Web Services) compute servers extensively.
- Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Used the HBase Java API in a Java application.
- Automated all jobs, from extracting data from sources such as MySQL to pushing the result sets to the Hadoop Distributed File System, using the Oozie workflow scheduler.
- Designed AWS architecture and cloud migration using AWS EMR, DynamoDB, and Redshift, with event processing using Lambda functions.
- Implemented MapReduce jobs using the Java API, and in Python using Spark.
- Participated in the setup and deployment of a Hadoop cluster.
- Hands-on design and development of an application using Hive UDFs.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse, and business intelligence solutions.
- Supported data analysts in running Pig and Hive queries.
- Worked with HiveQL and Pig Latin.
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
- Configured an HA cluster for both manual and automatic failover.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
- Specified the cluster size, resource pool allocation, and Hadoop distribution by writing specification files in JSON format.
- Wrote Solr queries for various search documents.
- Defined the data flows within the Hadoop ecosystem and directed the team in implementing them.
- Carried out the Software Development Life Cycle (SDLC) of the application in web and client-server environments using J2EE.
- Gathered and analyzed the requirements and designed class and sequence diagrams using UML.
- Used Storm to consume events coming through Kafka, generate sessions, and publish them back to Kafka.
- Used Spark to create APIs in Java and Scala and streamed data in real time using Spark with Kafka.
- Worked on analyzing the Hadoop cluster using big data analytics tools including Kafka, Sqoop, Storm, Spark, Pig, Hive, and MapReduce.
- Involved in designing and architecting big data solutions using the Hadoop ecosystem.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Collaborated in identifying current problems, constraints, and root causes in data sets to arrive at descriptive and predictive solutions with the support of Hadoop HDFS, MapReduce, Pig, Hive, and HBase, and developed reports in Tableau.
- Imported unstructured data into HDFS using Flume.
- Performed Hadoop installation and configuration of multiple nodes on AWS EC2 using the Hortonworks platform.
Environment: Hadoop 3.0, MapReduce, Java SE 9, Spark, Storm, Kafka, Flume 1.8, HDFS, AWS, Hortonworks, UNIX, Hive 2.3, Pig 0.17, Tableau, HBase, Sqoop, Oracle 12c, MySQL, SQL, Cassandra 3.11, UDFs, and ZooKeeper 3.4
Confidential, Durham, NC
Lead Big Data Developer
- Followed Agile development methodology as an active member in Scrum meetings.
- Involved in installation and configuration of Hadoop Ecosystem components with Hadoop Admin.
- Improved application performance using Azure Search and SQL query optimization.
- Involved in Agile methodologies of project development.
- Worked with Kafka topics by writing Spark programs in Java and Scala.
- Wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra.
- Worked with different file formats such as sequence files, XML files, and flat files using MapReduce programs.
- Imported data from mainframes into the Hadoop ecosystem using Flume.
- Implemented multiple MapReduce programs in Java for data analysis.
- Involved in loading data from Unix file system to HDFS and responsible for writing generic scripts in Unix.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the results in Parquet format in HDFS.
- Imported and exported data for Hadoop ecosystem components such as Hive, Pig, and HBase using Sqoop.
- Created Azure Web Application projects and updated and deployed web apps and WebJobs using Visual Studio, GitHub, and Azure Resource Manager.
- Involved in importing and exporting data between HDFS and Relational Database Systems like Oracle, MySQL and SQL Server using Sqoop.
- Involved in ingesting data into Cassandra and consuming the ingested data from Cassandra to HDFS.
- Created Hive, Pig, SQL, and HBase tables to load large sets of structured, semi-structured, and unstructured data coming from Unix, NoSQL, and a variety of portfolios.
- Implemented Partitioning, Dynamic Partitions and bucketing in Hive for efficient data access.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume.
- Implemented Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Optimized Hive join queries to get better results from ad hoc Hive queries.
- Configured Shared Access Signature (SAS) tokens and Storage Access Policies in Azure Cloud Infrastructure.
- Involved in executing Oozie workflows to automate parallelization of Hadoop MapReduce jobs.
- Wrote Hive UDFs for complex operations where Hive queries don’t fit.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Used Hue browser for interacting with Hadoop components.
- Experience with cluster coordination services through ZooKeeper.
- Involved in managing and reviewing Hadoop log files.
- Involved in executing Pig scripts and Hive queries for optimal data-driven solutions.
- Worked with Spark Streaming to analyze continuous data flows with low latency.
- Worked with Spark MLlib to test sample data for recommendation systems and fraud detection.
- Created technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Spark to create APIs in Scala for big data analysis.
- Exported result sets from Hive to SQL using Sqoop.
- Used Hive data warehouse modeling to interface BI tools such as Tableau with Hadoop and to enhance existing applications.
Environment: Hadoop 3.0, MapReduce, Flume 1.8, Java SE 11, HDFS, UNIX, Hive 2.3, Pig 0.17, Oracle 12c, MySQL, SQL, Sqoop, Cassandra 3.11, HBase, Azure, UDFs and ZooKeeper 3.4
Confidential, Dallas, TX
- Involved in various phases of development; analyzed and developed the system following Agile Scrum methodology.
- Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Loaded the aggregated data from the Hadoop environment into Oracle using Sqoop for reporting on the dashboard.
- Involved in installing, configuring, and maintaining the Hadoop cluster, including YARN configuration, using Cloudera and Hortonworks.
- Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
- Created and managed database schemas, common frameworks, XML schemas, and APIs.
- Developed an MVC-pattern-based user interface using JSP, XML, HTML4, CSS2, and Struts.
- Used Java/J2EE Design patterns like Business Delegate and Data Transfer Object (DTO).
- Developed window layouts and screen flows using Struts Tiles.
- Developed structured, efficient, and error-free code for big data requirements: storing, processing, and analyzing huge data sets to extract valuable insights.
- Implemented an application-specific exception handling and logging framework using Log4j.
- Used JDBC to connect to the database and wrote SQL queries and stored procedures to fetch, insert, and update rows in database tables.
- Applied machine learning principles to study market behavior for a trading platform.
- Used Maven as the build tool and TortoiseSVN for source version control.
- Developed data mapping to create a communication bridge between various application interfaces using XML and XSL.
- Developed JSPs for client data presentation and client-side data validation within the forms.
- Involved in various phases of the Software Development Life Cycle (SDLC) such as design, development, and unit testing.
- Extensive work writing SQL queries, stored procedures, and triggers using TOAD.
- Developed service and persistence layers using core Java concepts. Used JDBC to provide the connectivity layer to the Oracle database for data transactions.
- Applied core Java concepts such as interfaces and the Collections Framework, using ArrayList, Map, and Set from the Collections API.
- Streamed data in real time using Spark with Kafka.
- Developed Entity Beans as Bean Managed Persistence Entity Beans and used JDBC to connect to backend database DB2.
- Used SoapUI for testing the web services.
- Performed software development/enhancement using IBM Rational Application Developer (RAD).
- Integrated with the back-end code (web services) using jQuery, JSON, and AJAX to get and post data to back-end servers.
- Developed Sqoop scripts to move data between HDFS and RDBMSs (Oracle, MySQL).
- Worked with complex queries in Cassandra.
- Involved in loading data into HBase using the HBase Shell, the HBase Client API, Pig, and Sqoop.
- Developed various data connections from data source to Tableau Server for report and dashboard development.
- Developed multiple scripts for analyzing data using Hive and Pig and integrating with HBase.
- Used Apache Maven to build, configure, package, and deploy the application project.
- Developed complex data representation for the adjustment claims using JSF Data Tables.
- Performed version control using PVCS.
- Used JAX-RPC web services over SOAP to process the application for the customer.
- Used various tools in the project, including Ant build scripts, JUnit for unit testing, ClearCase for source code version control, IBM Rational DOORS for requirements, and HP Quality Center for defect tracking.
Environment: Java 6, Oracle 11g, Hadoop 2.8, Hive, HBase, HDFS, Kafka, SQL Server 2014, MapReduce, AJAX, jQuery, EJB, JAXB, JAX-RS, JDBC, Hibernate 4.x, RAD 8.5, Eclipse 4.x, Apache POI, HTML5, XML, CSS3, JavaScript, Apache Server, Apache Airflow, PL/SQL, CVS.
Confidential, Edison, NJ
Sr. Java/J2EE Developer
- Used Spring MVC architecture and Hibernate ORM to map Java classes to Oracle 9i and SQL Server 2005 databases.
- Deployed and tested the application in UNIX on JBoss Application Server.
- Analyzed and designed the object models using Java/J2EE design patterns in various tiers of the application.
- Used the JSF framework extensively for front-end UI development.
- Designed, coded, and maintained application components based on detailed design specifications to meet user requirements, and structured the application components using Ext JS 3.5.
- Created use case diagrams, class diagrams, sequence and activity diagrams using Visio to model and implement the tool.
- Implemented different types of Spring controllers as required by the application, along with Spring validators and persistence, DAO, and service layer components, using the Spring/Hibernate APIs and annotations. Used Hibernate QL extensively.
- Used Spring IoC extensively, configured Application Context files, and performed database object mapping using Hibernate annotations.
- Developed and implemented the DAO design pattern including JSP, Servlets, Form Beans and DAO classes and other Java APIs.
- Implemented CRUD operations, performed inner/outer joins, and used stored procedures, stored functions, and cursors in Oracle PL/SQL.
- Extensively involved in database design work with Oracle 9i and in building the application on J2EE architecture.
- Extensively used Cairngorm Micro Architecture 2.2 MVC Framework to build the Rich User Interface for the chat application.
- Implemented AJAX with the Prototype JS framework.
- Implemented Log4j for logging.
- Used Log4j to log events, exceptions, and errors in the application for debugging purposes.
- Developed complex data representation for the adjustment claims using JSF Data Tables.
- Developed code using Core Java to implement technical enhancements following Java standards.
- Developed JavaBeans for implementing business logic and for mapping objects to tables in a MySQL database using Hibernate.
- Created an application for UNIX and Windows platforms; involved in shell scripting on the UNIX operating system.
- Used Java Mail API for sending emails.
- Used the Maven build tool to build and deploy the application on WebSphere 7.0.
- Designed and developed a utility class that consumed messages from the Java message queue and generated emails to be sent to customers.
- Responsibilities also included analyzing business requirements, supporting QA and UAT testing and code review.
- Provided production support by debugging and fixing critical issues related to application and database.
Confidential, Greensboro, NC
- Performed requirements gathering, analysis, design, code development, and testing using SDLC (Waterfall) methodology.
- Wrote a web service client for order-tracking operations that accesses the web services API and is used in our web application.
- Implemented data archiving and persistence of report generation metadata using Hibernate by creating mapping files and POJO classes and configuring Hibernate to set up the data sources.
- Developed Spring framework DAO Layer with JPA and EJB3 in Imaging Data model and Doc Import.
- Developed the business logic using the J2EE framework and deployed components on the application server, using Eclipse for component building.
- Actively involved in deploying EJB service JARs and application WAR files on WebLogic Application Server.
- Developed GUI screens for login, registration, edit account, forgot password and change password using Struts 2.
- Used the JUnit framework for unit testing of the application and JUL logging to capture logs, including runtime exceptions.
- Wrote SQL queries for data access and manipulation using Oracle SQL Developer.
- Developed session beans to encapsulate the business logic, and model and DAO classes using Hibernate.
- Designed and coded JAX-WS based Web Services used to access external financial information.
- Implemented EJB components using stateless and stateful session beans.
- Used the Spring Framework with Spring configuration files to create the required beans and inject dependencies through dependency injection.
- Utilized JPA for Object/Relational Mapping purposes for transparent persistence onto the Oracle database.
- Involved in creation of Test Cases for JUnit Testing.
- Used Oracle as the database and Toad for query execution; wrote SQL scripts and PL/SQL code for procedures and functions.
- Used SOAP as an XML-based protocol for web service operation invocation.
- Packaged and deployed the application on IBM WebSphere Application Server in different environments such as development and testing.
- Used Log4j to validate functionality and JUnit for unit testing.