- 8+ years of experience in Information Technology, with a major concentration on Big Data tools and technologies, relational and NoSQL databases, the Java programming language and J2EE technologies, following recommended software practices.
- 4+ years of experience with the Hadoop Distributed File System (HDFS), Impala, Sqoop, Hive, HBase, Spark, Hue, the MapReduce framework, Kafka, YARN, Flume, Oozie, Zookeeper and Pig.
- Hands-on experience with Hadoop ecosystem components such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and Application Manager.
- Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR and Amazon Elastic Compute Cloud (Amazon EC2).
- Experience in working with Amazon EMR, Cloudera (CDH3 & CDH4) and Hortonworks Hadoop Distributions.
- Involved in loading structured and semi-structured data into Spark clusters using the Spark SQL and DataFrames APIs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Experience in implementing real-time event processing and analytics using stream processing systems like Spark Streaming.
- Capable of creating real time data streaming solutions and batch style large scale distributed computing applications using Apache Spark, Spark Streaming, Kafka and Flume.
- Experience in analyzing data using Spark SQL, HiveQL, Pig Latin, Spark/Scala and custom MapReduce programs in Java.
- Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.
- Experience in creating DStreams from sources like Flume and Kafka and performing various Spark transformations and actions on them.
- Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
- Experience with creating scripts for data modeling and data import and export. Extensive experience in deploying, managing and developing MongoDB clusters.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions. Streamed data in real time using Spark with Kafka for faster processing.
- Experience in developing a data pipeline using Kafka to store data into HDFS.
- Good experience in designing and creating data ingest pipelines using technologies such as Apache Storm and Kafka.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDD in Scala and Python.
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience working with data extraction, transformation and load in Hive, Pig and HBase.
- Hands on experience in writing MapReduce programs using Java to handle different data sets using Map and Reduce tasks.
- Worked with join patterns and implemented Map side joins and Reduce side joins using MapReduce.
- Orchestrated various Sqoop queries, Pig scripts, Hive queries using Oozie workflows and sub-workflows.
- Responsible for handling different data formats like Avro, Parquet and ORC formats.
- Experience in performance tuning and monitoring of Hadoop clusters by gathering and analyzing data on the existing infrastructure using Cloudera Manager.
- Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and Zookeeper (HBase).
- Dealt with huge transaction volumes while interfacing the front-end application written in Java, JSP, Struts, Hibernate, SOAP Web service and with Tomcat Web server.
- Extensive knowledge of cloud-based technologies using Amazon Web Services (AWS): VPC, EC2, Route 53, S3, DynamoDB, ElastiCache, Glacier, RRS, CloudWatch, CloudFront, Kinesis, Redshift, SQS, SNS, RDS.
- Hands-on experience in developing applications with Java, J2EE, Servlets, JSP, EJB, SOAP, Web Services, JNDI, JMS, JDBC2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and MS SQL Server RDBMS.
- Delivered zero defect code for three large projects which involved changes to both front end (Core Java, Presentation services) and back-end (Oracle).
- Experience with all stages of the SDLC and Agile Development model right from the requirement gathering to Deployment and production support.
- Involved in daily Scrum meetings to discuss development progress and was active in making the meetings more productive.
- Experience in understanding existing systems and in maintenance and production support on technologies such as Java, J2EE and various databases (Oracle, SQL Server).
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Development Methodology: Agile, Waterfall
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i, 10g, 11i, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
Data analytical tools: R and MATLAB
ETL Tools: Talend, Informatica, Pentaho
Confidential, Bloomington, IL
Sr. Hadoop Developer
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported data from Oracle databases into HDFS and exported it back using Sqoop.
- Experience in installing, configuring Hadoop cluster for major Hadoop distributions.
- Used Hive and Pig as ETL tools for event joins, filters, transformations and pre-aggregations.
- Created partitions and buckets by state in Hive to handle structured data.
- Implemented dashboards that run HiveQL queries internally, including aggregation functions, basic Hive operations and different kinds of join operations.
- Implemented state-based business logic in Hive using generic UDFs.
- Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data, such as removing personal information or merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.
- Wrote MapReduce programs to analyze data as per the business requirements.
- Used Pig for three distinct workloads: pipelines, iterative processing and research.
- Involved in moving all log files generated from various sources to HDFS for further processing through Kafka, Flume and Splunk, and processed the files using Piggybank.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
- Implemented MapReduce jobs to write data into Avro format.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Implemented various MapReduce jobs in custom environments and loaded their output into HBase tables by generating Hive queries.
- Used Sqoop for various file transfers through the HBase tables to move data for processing into several NoSQL databases (Cassandra, MongoDB).
- Involved in developing Hive UDFs and reused them for other requirements.
- Worked on performing Join operations.
- Used Pig, Hive and MapReduce on Hadoop to analyze health insurance data and transform it into meaningful data sets covering medicines, diseases, symptoms, opinions, geographic region details, etc.
- Worked on data analytics using Pig and Hive on Hadoop.
- Evaluated Oozie for workflow orchestration in the automation of MapReduce jobs, Pig and Hive jobs.
- Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Captured the data logs from web server into HDFS using Flume & Splunk for analysis.
- Experienced in writing Pig scripts and Pig UDFs to pre-process the data for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Reporting Expertise through Talend.
- Used JSTL and built custom tags whenever necessary.
- Used Expression Language to tie beans to UI components.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
Environment: Hive, Pig, MapReduce, Avro, Sqoop, Oozie, Flume, Kafka, Storm, HBase, Unix, Linux, Python, SQL, Hadoop 1.x, HDFS, Talend, GitHub, Java, Splunk, UNIX Shell & Python Scripting.
Sr. Hadoop Developer
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Worked within and across Agile teams to design, develop, test and support technical solutions across a full-stack of development tools and technologies.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experience with Cassandra, with ability to drive the evaluation and potential implementation of it as a new platform. Implemented analytical engines that pull data from API data sources and then present data back as either an API or persist it back into a NoSQL platform.
- Involved in moving all log files generated from various sources to HDFS and Spark for further processing.
- Involved in requirement and design phase to implement Streaming Lambda Architecture to use real time streaming using Spark.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for processing and storage of small data sets. Maintained the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Implemented a distributed messaging queue to integrate with Cassandra using Zookeeper.
- Used the Avro data serialization system to handle Avro data files in MapReduce programs.
- Designed, implemented, tested and debugged ETL mappings and workflows.
- Developed ETL routines to source data from client source systems and load it into the target data warehouse.
- Developed Product Catalog and Reporting DataMart databases with supporting ETLs.
- Implemented ETL processes for the warehouse, and designed and implemented code for migrating data to the data lake using Spark.
- Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded it into Hive for further processing.
- Built Platfora Hadoop multi-node cluster test labs using Hadoop distributions (CDH 4/5, Apache Hadoop, MapR and Hortonworks), Hadoop ecosystem tools, virtualization and Amazon Web Services components.
- Installed, Upgraded and Maintained Cloudera Hadoop-based software.
- Experience with hardening Cloudera Clusters, Cloudera Navigator and Cloudera Search.
- Managed running jobs, scheduled Hadoop jobs, and configured the Fair Scheduler and Impala query scheduling.
- Worked extensively with Impala, comparing its processing time with that of Apache Hive for batch applications in order to adopt Impala in the project.
- Extensively used Impala to read, write and query Hadoop data in HDFS. Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
- Used the Struts validation framework for form-level validation.
- Wrote test cases in JUnit for unit testing of classes.
Confidential, Cincinnati, OH
- Developed solutions to load data into HDFS, process it within Hadoop and emit the summary results from Hadoop to downstream systems.
- Installed and configured Hadoop MapReduce and developed multiple MapReduce jobs for cleansing and preprocessing.
- Wrote Hadoop MapReduce jobs to run on Amazon EMR clusters and created workflows for running the jobs.
- Worked on Sqoop extensively to ingest data from various source systems into HDFS.
- Imported data from different relational data sources like Oracle and MySQL to HDFS using Sqoop.
- Analyzed substantial data sets using Hive queries and Pig scripts.
- Wrote Pig scripts for sorting, joining, and grouping data.
- Integrated multiple sources of data (SQL Server, DB2, MySQL) into Hadoop cluster and analyzed data by Hive-HBase integration.
- Involved in writing optimized Pig scripts along with developing and testing Pig Latin scripts.
- Worked on custom Pig Loader and Storage classes to work with a variety of data formats such as JSON and compressed CSV.
- Played a major role in working with the team to leverage Sqoop for extracting data from Oracle.
- Solved small file problem using Sequence files processing in MapReduce.
- Implemented counters on HBase data to count total records on different tables.
- Created HBase tables to store variable data formats coming from different portfolios. Performed real-time analytics on HBase using the Java API and REST API.
- Developed different kinds of custom filters and handled pre-defined filters on HBase data using the API.
- Experienced with scripting languages such as Python and shell scripting.
- Used Oozie and Zookeeper to automate the flow of jobs and coordination in the cluster, respectively.
- Worked on different file formats like text files, Parquet, Sequence Files, Avro and Record Columnar (RC) files.
- Analyzed complex data structures of different types (structured and semi-structured) and de-normalized them for storage in Hadoop.
- Experienced with working on Avro Data files using Avro Serialization system.
- Kerberos security was implemented to safeguard the cluster.
Environment: HDFS, Pig, MapReduce, Sqoop, Oozie, Zookeeper, HBase, Java, Eclipse, Python, MySQL, Oracle, SQL Server, DB2, Shell Scripting, Kerberos, EMR.
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Proficient in writing SQL queries and stored procedures for multiple databases, including Oracle and SQL Server 2005.
- Integrated Spring Dependency Injection among different layers of the application, using Spring together with the Hibernate O/R mapping tool for rapid development and ease of maintenance.
- Wrote stored procedures using PL/SQL. Performed query optimization to achieve faster indexing and make the system more scalable.
- Worked with Struts MVC objects like Action Servlet, Controllers, validators, Web Application Context, Handler Mapping, Message Resource Bundles, Form Controller, and JNDI for look-up for J2EE components.
- Implemented the Connectivity to the Database Server Using JDBC.
- Developed RESTful web services using Spring IoC to give users a way to run the job and generate a daily status report.
- Developed and exposed the SOAP web services by using JAX-WS, WSDL, AXIS, JAXP and JAXB.
- Configured domains in production, development and testing environments using configuration wizard.
- Created SOAP Handler to enable authentication and audit logging during Web Service calls.
- Created service layer APIs and domain objects using Struts.
- Used RESTful services to interact with the client by providing the RESTful URL mapping.
- Implemented the project using Agile Scrum methodology; involved in daily stand-up meetings, sprint showcases and sprint retrospectives.
- Developed the user interface using JSP, JSP tag libraries and JavaScript to simplify the complexities of the application.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Used XSLT to transform XML data structures into HTML pages.
- Deployed EJB Components on Tomcat. Used JDBC API for interaction with Oracle DB.
- Developed the UI panels using JSF, XHTML, CSS, DOJO and jQuery.
- Involved in analysis, design and development of an e-bill payment system as well as an account transfer system, and developed specs that include Use Cases, Class Diagrams, Sequence Diagrams and Activity Diagrams.
- Involved in designing the user interfaces using JSPs.
- Developed custom tags and used JSTL to support custom user interfaces.
- Developed the application using the Struts Framework, following the Model-View-Controller (MVC) architecture.
- Implemented the persistence layer using Hibernate, with POJOs representing the persistent database tables. These POJOs are serializable Java classes that contain no business logic.
- Integrated Hibernate with the Spring Framework (created the SessionFactory).
- Implemented the application using the concrete principles laid down by several design patterns such as MVC, Business Delegate, Data Access Object, Singleton and Factory.
- Deployed the applications on BEA WebLogic Application Server.
- Developed JUnit test cases for all the developed modules.
- Used CVS for version control across common source code used by developers.
- Used Log4J to capture the log that includes runtime exceptions.
- Used JDBC for database connectivity to Oracle and to invoke stored procedures.
- Refactored the code to migrate from Hibernate 2.x to Hibernate 3.x (i.e. moved from XML mapping to annotations) and implemented Hibernate filters and validators.
- Implemented the DAO layer and Hibernate transactions using the Spring Framework.
Environment: Java, J2EE, JSP, JNDI, Oracle 10g, DHTML, Ant, Rational Rose, Eclipse 3.1, Unix, WebLogic Application Server, Hibernate 3.0, Struts, Log4j, CVS.