- Over 6 years of IT experience as a Big Data Developer using Hadoop, HDFS, Hortonworks, MapReduce, and the Hadoop ecosystem (Pig, Hive, Impala, Spark, Scala), Java, and J2EE.
- Experience in creating web-based applications using JSP and Servlets.
- Experience with Spark, Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib.
- Extensive work in object-oriented analysis, design, and development of software using UML.
- Knowledge of Spark and its in-memory capabilities, mainly from framework exploration for the transition from MapReduce to Spark.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Experience in working with NoSQL databases such as MongoDB, Cassandra, and HBase.
- Experience in loading data into Spark SchemaRDDs and querying them using Spark SQL.
- Good experience in developing MapReduce jobs in J2EE/Java for data cleansing, transformation, pre-processing, and analysis.
- Experience in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig.
- Hands-on experience in setting up databases using RDS, storage using S3 buckets, and configuring instance backups to S3 buckets to ensure fault tolerance and high availability.
- Experience in using various Hadoop Distributions like Cloudera, Hortonworks and Amazon EMR.
- Experience developing Kafka producers and consumers for streaming millions of events per second.
- Experience in job workflow scheduling and monitoring with Oozie and cluster coordination with Zookeeper.
- Expertise in using J2EE application servers such as IBM WebSphere, JBoss and web servers like Apache Tomcat.
- Experience in using Ant and Maven for building and deploying projects to servers, and JUnit and Log4j for testing and debugging.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Good knowledge of Hibernate for mapping Java classes to database tables and of Hibernate Query Language (HQL).
- Working experience on major components of Hadoop Ecosystem like HDFS, HBase, Hive, Sqoop, Pig, and MapReduce.
- Experience in analyzing data using HiveQL, and custom MapReduce programs in Java.
- Hands-on experience in importing data from databases such as Oracle and MySQL into HDFS and Hive, and exporting it back, using Sqoop.
- Proficient in development methodologies such as Scrum, Agile, and Waterfall.
- Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, and Zookeeper.
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF, and Hibernate.
- Experience in object-oriented programming with Java (Core Java).
Big Data Ecosystem: MapReduce (MRv2), HDFS, Hive 2.3, HBase 1.2, Pig 0.17, Sqoop 1.4.7, Apache Flume 1.8, HDP, Oozie 4.3, Zookeeper 3.4, Spark 2.3, Kafka 2.0, Storm, Hue
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Cloud Platform: Amazon AWS, EC2, Redshift
Databases: Oracle 12c, MySQL, MS-SQL Server 2017/2016
Version Control: GIT, GitLab, SVN
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
NoSQL Databases: HBase and MongoDB
Programming Languages: Java 8, Python, SQL, PL/SQL, AWS, HiveQL, UNIX Shell Scripting, Scala.
Methodologies: Software Development Life Cycle (SDLC), Waterfall, Agile, Software Testing Life Cycle (STLC), UML, Design Patterns (Core Java and J2EE)
Operating Systems: Windows, UNIX/Linux and Mac OS.
Build Management Tools: Maven, Ant.
IDE & Command-line Tools: Eclipse, IntelliJ, Toad, and NetBeans.
Confidential - Shelton, CT
Sr. Big Data/Hadoop Developer
- Worked as a Big Data/Hadoop Developer on the Hadoop ecosystem, including Hive, HBase, Oozie, Pig, Zookeeper, Spark Streaming, and MCS (MapR Control System), with the MapR distribution.
- Involved in the requirements-gathering phase of the SDLC and, together with the team lead, helped the team break the complete project into modules.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Designed the projects using MVC architecture, providing multiple views of the same model for efficient modularity and scalability.
- Performed transformations, cleaning and filtering on imported data using Hive, MapReduce, Impala and loaded final data into HDFS.
- Designed and developed POCs in Spark using Scala to evaluate Spark's performance.
- Managed and supported enterprise data warehouse operations and advanced big data predictive application development using Cloudera and Hortonworks HDP.
- Used Sqoop to import data from Oracle into HDFS on a regular basis, or from Oracle into HBase, depending on requirements.
- Worked on Apache Solr, used as the indexing and search engine.
- Involved in developing the Hadoop system and improving multi-node Hadoop cluster performance.
- Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala.
- Exported the data using Sqoop to RDBMS servers and processed that data for ETL operations.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded the data into Cassandra.
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Specified the cluster size, resource pool allocation, and Hadoop distribution by writing the specifications in JSON format.
- Imported weblogs and unstructured data using Apache Flume and stored the data in a Flume channel.
- Exported event weblogs to HDFS by creating an HDFS sink that deposits the weblogs directly in HDFS.
- Used RESTful web services with MVC for parsing and processing XML data.
- Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Involved in loading data from the UNIX file system to HDFS, designing schemas, writing CQL queries, and loading data into Cassandra.
- Built an automated build and deployment framework using Jenkins, Maven, and related tools.
Environment: Spark 2.3, Hive 2.3, Pig 0.17, SQL, Sqoop 2.0, Kafka 2.0, Scala 2.12, Apache Flume 1.8, Cassandra 3.11, Hortonworks
Confidential - Chicago, IL
Big Data/Hadoop Developer
- Worked as a Big Data/Hadoop Developer on the Hadoop ecosystem, including Hive, MongoDB, Zookeeper, and Spark Streaming, with the MapR distribution.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Involved in moving log files generated from varied sources to HDFS through Flume for further processing.
- Worked on Apache Flume for collecting and aggregating large amounts of log data and storing it on HDFS for further analysis.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
- Responsible for developing a data pipeline on AWS to extract data from weblogs.
- Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Spark to create structured data from large amounts of unstructured data from various sources.
- Used Apache Spark on YARN for fast, large-scale data processing and improved performance.
- Responsible for the design and development of Spark SQL scripts using Scala/Java based on functional specifications.
- Worked on cluster coordination services through Zookeeper.
- Built applications using Maven and integrated them with CI servers such as Jenkins to run build jobs.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
- Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.
- Loaded data from various sources into HDFS, exported the analyzed data to relational databases using Sqoop, and built reports using Tableau.
- Exported analyzed data to a relational database using Sqoop for visualization and report generation for the BI team.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
Environment: Hadoop 3.0, Kafka 2.0.0, Hive 2.3, MVC, Scala 2.12, JDBC, Sqoop 2.0, Zookeeper 3.4, Spark 2.3, HDFS, EC2, MySQL, Agile.
Confidential - Bothell, WA
Sr. Java/Hadoop Developer
- Responsible for ingesting large volumes of data into the Hadoop data lake pipeline on a daily basis.
- Developed Spark programs to transform and analyze the data.
- Developed Java MapReduce programs to transform log data into a structured form for finding user location, age group, and time spent.
- Implemented row-level updates and real-time analytics using CQL on Cassandra data.
- Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
- Utilized Hibernate for object/relational mapping, providing transparent persistence to SQL Server.
- Built and deployed EAR, WAR, and JAR files on test and stage systems in WebLogic Application Server.
- Developed Java and J2EE applications using Rapid Application Development (RAD), Eclipse.
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Implemented Spark jobs using Scala and Java, utilizing DataFrames and the Spark SQL API for faster data processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Implemented Java J2EE technologies on the server side, such as Servlets, JSP, and JSTL.
- Developed JSP Pages and implemented AJAX in them for a responsive User Interface.
- Involved in developing presentation layer using JSP and Model layer using EJB Session Beans.
- Implemented unit test cases using JUnit and used Log4j for logging and debugging the application.
- Implemented Maven Build Scripts for building the application.
- Deployed the application in IBM WebSphere and tested for server-related issues.
- Used Git as the repository and for version control, and IntelliJ as the IDE for development.
Environment: Java 7, PL/SQL, SOA, MVC, HDFS, JSP, HTML5, Shell Script, Hibernate 5.0.0, Apache Tomcat.
Sr. Java/J2EE Developer
- Responsible for designing, coding, and developing the application in J2EE using MVC architecture.
- Implemented Java and J2EE design patterns such as Business Delegate, Data Transfer Object (DTO), Data Access Object (DAO), and Service Locator.
- Actively involved in the application architecture and development tools for web solutions that fulfill the business requirements of the project.
- Developed application using Spring Framework that leverages classical Model View Controller (MVC) architecture, and Hibernate as the ORM.
- Used Eclipse as the Java IDE in the development of the application.
- Developed use case diagrams, object diagrams, class diagrams, and sequence diagrams using UML.
- Used AngularJS for data binding between page elements and for routing between web pages.
- Created BI controller-based Java classes that work with the XML transformation layer to transform data received from the data providers.
- Validated whether existing web services could be reused to support new UI functionality, and created Spring Boot services for processing scheduled, one-time, and stored payments.
- Worked on basic authentication in both Java Spring Boot and IIB to secure communication between the front-end UI and back-end SOA services (Java Spring Boot and IIB), using a Base64-encoded authentication string.
- Developed Java classes for implementing asynchronous processing using WebLogic.
- Involved in the creation and deployment of the enterprise application in WebLogic.
- Employed Hibernate as an Object-Relational Mapping (ORM) tool to store persistent data and communicate with the database.
- Used RESTful web services to send and receive data between applications.
- Developed JSP pages using custom tags, the Tiles framework, and the Struts framework.
- Deployed the complete Web applications in WebSphere Application server.
- Developed UNIX shell scripts and Perl programs for data integrity with the mainframe.
- Used ANT tool for building and packaging the application.
- Used Log4j to capture logs, including runtime exceptions and informational messages, to help debug issues.