- Over 10 years of experience as a Big Data/Hadoop Developer designing and developing applications using big data, Hadoop, and Java/J2EE open-source technologies.
- Strong development skills in Hadoop, HDFS, MapReduce, Hive, Sqoop, and HBase, with a solid understanding of Hadoop internals.
- Expertise in ingesting real-time and near-real-time data using Flume, Kafka, and Storm.
- Good knowledge of NoSQL databases such as MongoDB, Cassandra, and HBase.
- Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2 (YARN).
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, Solr, and Splunk.
- Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, and Avro.
- Extensive experience in SOA-based solutions: Web Services, Web API, WCF, and SOAP, including RESTful API services.
- Good knowledge of Amazon Web Services (AWS) offerings such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
- Experienced in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig.
- Expertise in developing simple web-based applications using J2EE technologies such as JSP, Servlets, and JDBC.
- Experience working on EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
- Worked extensively with Core Java, Struts 2, JSF 2.2, Spring 3.1, Hibernate, Servlets, and JSP, with hands-on experience in PL/SQL, XML, and SOAP.
- Well versed in working with relational database management systems such as Oracle 12c, MySQL, and MS SQL Server 2016.
- Hands-on experience with the XML suite of technologies, such as XML, XSL, XSLT, DTD, XML Schema, SAX, DOM, and JAXB.
- Hands-on experience in advanced big data technologies such as the Spark ecosystem (Spark SQL, MLlib, SparkR, and Spark Streaming), Kafka, and predictive analytics.
- Knowledge of the Software Development Life Cycle (SDLC) and of Agile and Waterfall methodologies.
- Experienced in building applications using Java, Python, and UNIX shell scripting.
- Experience in consuming web services with Apache Axis and JAX-RS (REST) APIs.
- Experienced with the build tools Maven and Ant and the logging tool Log4j.
- Experience in working with web servers such as Apache Tomcat and application servers such as IBM WebSphere and JBoss.
- Experience in working with Eclipse IDE, NetBeans, and Rational Application Developer.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
Hadoop Ecosystem: Hadoop 3.0, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Zena, Zeke Scheduling, ZooKeeper, Flume, Kafka, Spark Core, Spark SQL, Spark Streaming
Big Data Technologies: Hadoop 3.0, HDFS 1.2.4, MapReduce, HBase 1.2.6, Pig, Hive, Flume, Impala, Oozie, Spark, YARN
Cloud Platform: Amazon AWS, EC2, EC3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, Data Factory
Build Management Tools: Maven, Apache Ant
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting, J2EE, R 3.4, Python, XPath
Frameworks: MVC, Spring, Hibernate, Struts, EJB, JMS, JUnit, MR-Unit
Databases: Oracle 12c/11g, MySQL, DB2, MS SQL Server 2016/2014
Java Tools & Web Technologies: EJB, JSF, Servlets, JSP, JSTL, CSS3/2, HTML5/4, XHTML, XML, XSL, XSLT
Tools and IDE: SVN, Maven, Gradle, Eclipse 4.6, NetBeans 8.2
Open Source: Hibernate, Spring IOC, Spring MVC, Spring Web Flow, Spring AOP
Methodologies: Agile, RAD, JAD, RUP, Waterfall & Scrum
Confidential, Atlanta, GA
Sr. Big Data Developer
- Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Used Talend for big data integration using Spark and Hadoop.
- Used Microsoft Windows Server and authenticated the client-server relationship via the Kerberos protocol.
- Worked on BI reporting with AtScale OLAP for big data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Designed and developed a real-time stream processing application using Pig and Hive to perform streaming ETL and apply machine learning.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Generated metadata and created Talend ETL jobs and mappings to load the data warehouse and data lake.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used the DataFrame API in Scala to convert distributed collections of data into datasets organized into named columns.
- Performed data profiling and transformation on the raw data using Pig and Python.
- Performed batch processing of data sources using Apache Spark.
- Developed predictive analytics using the Apache Spark Scala APIs.
- Performed big data analysis using Pig and user-defined functions (UDFs).
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Responsible for importing log files from various sources into HDFS using Flume.
- Assigned names to each of the columns using the case class option in Scala.
- Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
- Developed business analytics scripts using Hive SQL.
- Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files, and Parquet files.
- Integrated Oozie logs into a Kibana dashboard.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Imported millions of structured records from relational databases using Sqoop import, processed them with Spark, and stored the data in HDFS in CSV format.
- Developed a Spark Streaming application to pull data from the cloud into a Hive table.
- Used Spark SQL to process large volumes of structured data.
Environment: Hadoop, Hive, Linux, MapReduce, HDFS, Pig, Sqoop, Shell Scripting, Java 6 (JDK 1.6), Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, JIRA 5.1/5.2, CVS.
Confidential, Ann Arbor, MI
Sr. Big Data Developer
- Followed the Agile development methodology as an active member of scrum meetings.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
- Evaluated Composite for a data fabric virtualization approach and evaluated an industry data model approach.
- Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
- Implemented various Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.
- Used Sqoop to extract and load data into the Data Lake environment (MS Azure), which was accessed by business users and data scientists.
- Managed and supported enterprise data warehouse operations and advanced predictive big data application development using Cloudera and Hortonworks HDP.
- Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
- Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications, and executed machine learning use cases under Spark ML and MLlib.
- Installed Hadoop, MapReduce, and HDFS on Azure to develop multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Improved the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed a Spark job in Java that indexes data into Elasticsearch from external Hive tables stored in HDFS.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Imported data from different sources such as HDFS and HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
- Documented the requirements, including the existing code to be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
- Performed transformations such as event joins, bot traffic filtering, and some pre-aggregations using Pig.
- Explored MLlib algorithms in Spark to understand which machine learning functionality could be applied to the use case.
- Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Developed Java code that creates the mapping in Elasticsearch before data is indexed into it.
- Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Supported the Cloud Strategy team in integrating analytical capabilities into the overall cloud architecture and business case development.
Environment: Azure, Hadoop 3.0, Sqoop 1.4.6, Pig, Hive, MapReduce, Data-Fabric, Spark 2.2.1, Shell scripts, SQL, Hortonworks, Python, MLlib, HDFS, YARN, Java, Kafka 1.0, Cassandra, Oozie
Confidential, Hartford, CT
Sr. Big Data Developer
- Involved in analysis, design and development phases of the project. Adopted agile methodology throughout all the phases of the application.
- Performed Hadoop installation and configuration of multiple nodes on AWS EC2 using the Hortonworks platform.
- Designed and led the implementation of core system components: predictive caching with off-heap Chronicle maps and Apache Ignite in-memory data fabric, Kafka.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
- Analyzed the existing data flow to the warehouses and took a similar approach to migrate the data into HDFS.
- Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, decreasing execution time from hours to minutes.
- Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive for a logistics application.
- The roadmap included moving the division from a heavily man-hour-intensive, code-based environment to a more design-centric environment leveraging the capabilities of Talend Data Fabric Big Data Edition, Apache NiFi, Kafka, and a few other tools. Installed and configured Hortonworks HDF/NiFi for a POC and later migrated it into production with a link to the Azure Data Lake (ADLS).
- Worked with the cloud provisioning team on capacity planning and sizing of the nodes (master and slave) for an AWS EMR cluster.
- Worked with Amazon EMR to process data directly in S3 and to copy data from S3 to the Hadoop Distributed File System (HDFS) on the EMR cluster, setting up Spark Core for analysis work.
- Gained exposure to Spark architecture and how RDDs work internally by processing data from local files, HDFS, and RDBMS sources, creating RDDs, and optimizing for performance.
- Worked on a data pipeline using Pig and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Handled data integration, ETL, and data quality with Talend Data Fabric for Big Data, Pentaho Kettle, and Elastic, following an Agile Scrumban process.
- Imported data from MySQL to HDFS and vice versa using Sqoop, and configured the Hive Metastore with MySQL to store the metadata for Hive tables.
- Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
- Worked on Kafka and Spark integration for real-time data processing, using Kafka producers and setting up Kafka MirrorMaker for data replication across clusters.
- Created custom UDFs in Scala for Spark and Kafka procedures to replace non-working functionality in existing custom UDFs in the production environment.
- Developed workflows in Oozie, scheduled jobs on mainframes, and prepared the data refresh strategy and capacity planning documents required for project development and support.
- Used different Oozie actions to design workflows, such as Sqoop, Pig, Hive, and shell actions.
- Worked with major Hadoop distributions such as Hortonworks and numerous open-source projects, and prototyped various applications that use modern big data tools.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Implemented Reporting, notification services using AWS API and used AWS (Amazon Web services) compute servers extensively.
Environment: Hadoop 3.0, AWS, EC2, Hortonworks, NoSQL, HBase 1.2, HDFS 1.2, Hive, S3, Spark, RDBMS, Pig, Sqoop, MySQL, UDF, Oozie
Confidential, Newport Beach, CA
Sr. Hadoop Developer
- Involved in the process of data acquisition, data pre-processing and data exploration of telecommunication project in Scala.
- Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster, and set up and benchmarked Hadoop clusters for internal use.
- Implemented data acquisition jobs in Python built on Sqoop, Hive, and Pig, and optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms, orchestrated with Oozie workflows.
- In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
- Created data extract scripts and determined naming standards for schemas and tables in the Hadoop Data Lake.
- Performed data validation against source system data and analyzed existing database source files and tables to ingest data into the Hadoop Data Lake.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce for loading data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
- Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster, working with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Developed data pipelines using Sqoop, Pig and Hive to ingest customer member data into HDFS to perform data analytics.
- Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions to build a common learner data model that receives data from Kafka in near real time and persists it to HBase.
- Used the Oozie workflow engine to run multiple Hive and Pig scripts, with Kafka for real-time processing of data, and loaded log file data directly into HDFS to navigate through data sets in HDFS storage.
- Worked with different actions in Oozie to design workflow like Sqoop action, pig action, hive action, shell action & Java action.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
Environment: Hive, Pig, Sqoop, Oozie, Scala, HDFS, Spark, Hadoop, Data Lake, MapReduce, MySQL, SQL, Oracle, Kafka
Confidential, Charlotte, NC
Sr. Java/Hadoop Developer
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Loaded the aggregated data into Oracle from the Hadoop environment using Sqoop for reporting on the dashboard.
- Involved in installing, configuring, and maintaining the Hadoop cluster, including YARN configuration, using Cloudera and Hortonworks.
- Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
- Created and managed database schemas, common frameworks, XML schemas, and APIs.
- Developed MVC design pattern based User Interface using JSP, XML, HTML4, CSS2 and Struts.
- Used Java/J2EE Design patterns like Business Delegate and Data Transfer Object (DTO).
- Developed window layouts and screen flows using Struts Tiles.
- Developed structured, efficient, and error-free code for big data requirements, storing, processing, and analyzing huge datasets to extract valuable insights.
- Implemented an application-specific exception handling and logging framework using Log4j.
- Used JDBC to connect to database and wrote SQL queries and stored procedures to fetch and insert/update to database tables.
- Applied machine learning principles for studying market behavior for trading platform.
- Used Maven as the build tool and TortoiseSVN for source version control.
- Developed data mapping to create a communication bridge between various application interfaces using XML and XSL.
- Developed JSPs for client data presentation and client-side data validation within the forms.
- Involved in various phases of the Software Development Life Cycle (SDLC), including design, development, and unit testing.
- Extensive work in writing SQL queries, stored procedures, and triggers using TOAD.
- Developed code using core Java concepts to provide service and persistence layers. Used JDBC to provide the connectivity layer to the Oracle database for data transactions.
- Implemented core Java concepts such as interfaces and the collection framework, using the ArrayList, Map, and Set types of the Collection API.
- Developed Entity Beans as Bean Managed Persistence Entity Beans and used JDBC to connect to backend database DB2.
- Used SOAP-UI for testing the Web-Services.
- Performed software development/enhancement using IBM Rational Application Developer (RAD).
- Integrated with the back-end code (web services) using jQuery, JSON, and AJAX to get and post data to back-end servers.
- Developed the Sqoop scripts to make the interaction between HDFS and RDBMS (Oracle, MySQL).
- Worked with complex queries in Cassandra.
- Involved in loading data into HBase using HBase Shell, HBase Client API, Pig, and Sqoop.
- Developed various data connections from data source to Tableau Server for report and dashboard development.
- Developed multiple scripts for analyzing data using Hive and Pig and integrating with HBase.
- Used Apache Maven to build, configure, package, and deploy an application project.
- Developed complex data representation for the adjustment claims using JSF Data Tables.
- Performed version control using PVCS.
- Used JAX-RPC web services with SOAP to process the application for the customer.
- Used various tools in the project, including Ant build scripts, JUnit for unit testing, ClearCase for source code version control, IBM Rational DOORS for requirements, and HP Quality Center for defect tracking.
Environment: Java 6, Oracle 11g, Hadoop, Hive, HBase, HDFS, SQL Server 2012, MapReduce, jQuery, JDBC, Eclipse 4.x, Apache POI, HTML4, XML, CSS2, JavaScript, Apache Server, PL/SQL, CVS.
Confidential, San Francisco, CA
Sr. Java/J2EE Developer
- Worked in an SDLC methodology following a Waterfall environment, including Acceptance Test-Driven Design and Continuous Integration/Delivery.
- Responsible for analyzing, designing, developing, coordinating, and deploying web-based applications.
- Developed the application using Spring MVC Framework that uses Model View Controller (MVC) architecture with JSP as the view.
- Used Spring MVC for the management of application flow by developing configurable handler mappings, view resolution.
- Used Spring Framework to inject the DAO and Bean objects by autowiring the components.
- Involved in coding, maintaining, and administering Servlets and JSP components to be deployed on JBoss and WebSphere Application servers in both UNIX and Windows environments.
- Used Spring 3.6 Framework to integrate the application with Hibernate.
- Implemented Hibernate in the data access object layer to access and update information in the Oracle Database.
- Used Entity Beans for storing data in the database.
- Developed Session Beans as the clients of Entity Beans to maintain the Client state.
- Used various Core Java concepts such as Multithreading, Exception Handling, Collection APIs to implement various features and enhancements.
- Used JMS messaging framework in the application to connect a variety of external systems that house member and provider data to a medical term translation application called Auto coder.
- Developed UI components and faces-config.xml file using JSF MVC Framework.
- Created POJOs in the business layer.
- Developed Ant Scripts to build and deploy EAR files on to Tomcat Server.
- Analyzed EJB performance in terms of scalability through various load and stress tests using the Bean-test tool.
- Used Eclipse extensively as the IDE for writing code.
- Wrote complex SQL queries, stored procedures, functions, and triggers in PL/SQL.
- Worked on a variety of defects to stabilize Aerial application.
- Worked on Session Facade design pattern to access domain objects.
- Developed the presentation layer using HTML and JSPs for user interaction.
- Used Maven to build, run and create Aerial-related JARs and WAR files among other uses.
- Wrote test cases in JUnit for unit testing of classes.
- Used AJAX to create interactive front-end GUI.
- Produced and consumed RESTful web services for transferring data between different applications.
- Used continuous integration tools such as Hudson/Jenkins.
- Implemented the logging mechanism using Log4j framework.
- Used SVN version control to track and maintain the different versions of the application.
Confidential, San Francisco, CA
- Involved in implementing the design across key phases of the Software Development Life Cycle (SDLC), including development, testing, implementation, and maintenance support, in a Waterfall methodology.
- Implemented the Struts framework based on MVC design pattern and Session Façade Pattern using Session and Entity Beans.
- Used Struts for web tier development and created Struts Action Controllers to handle the requests.
- Involved in writing the Struts configuration files and implemented the Struts tag library.
- Responsible for designing, coding, and developing the application in J2EE using Struts MVC.
- Implemented Struts framework (Action & Controller classes) for dispatching request to appropriate classes.
- Used simple Struts Validation for validation of user input as per the business logic and initial data loading.
- Developed RESTful services and SOAP-based web services.
- Developed Web Service provider methods (bottom up approach) using WSDL and SOAP for transferring data between the applications.
- Worked on XML technologies such as XML parsers and JAXB for binding data to Java objects.
- Used Java Message Service (JMS) for reliable and asynchronous communication.
- Implemented the persistence layer using Hibernate and JDBC Template and developed the DAL (Data Access Layer) to store and retrieve data from the database.
- Responsible for writing JDBC code to persist data in the MySQL database.
- Wrote SQL queries and PL/SQL procedures to fetch data from the database.
- Tested Service and data access tier using JUnit.
- Used WebLogic for application deployment and Log4j for logging and debugging.
- Used CVS for version control and Ant as the project build tool.
- Worked with production support team in debugging and fixing various production issues.