- 8+ years of experience in Information Technology, with a major concentration on Big Data tools and technologies, various relational and NoSQL databases, and the Java programming language and J2EE technologies, following recommended software practices.
- 4+ years of experience with the Hadoop Distributed File System (HDFS), Impala, Sqoop, Hive, HBase, Spark, Hue, the MapReduce framework, Kafka, YARN, Flume, Oozie, Zookeeper and Pig.
- Hands-on experience with core Hadoop ecosystem components such as Job Tracker, Task Tracker, NameNode, DataNode, Resource Manager and Application Manager.
- Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR and Amazon Elastic Compute Cloud (Amazon EC2).
- Experience working with Amazon EMR, Cloudera (CDH3 and CDH4) and Hortonworks Hadoop distributions.
- Loaded structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Good working knowledge of ETL tools such as Informatica and DataStage.
- Worked with Business Analysts and functional team members to translate business requirements into ETL technical specifications.
- Good functional experience in using various Hadoop distributions like Hortonworks, Cloudera, and EMR.
- Experience implementing real-time event processing and analytics with Spark Streaming and messaging systems such as Kafka.
- Capable of creating real time data streaming solutions and batch style large scale distributed computing applications using Apache Spark, Spark Streaming, Kafka and Flume.
- Experience in analyzing data using Spark SQL, HiveQL, Pig Latin, Spark/Scala and custom MapReduce programs in Java.
- Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.
- Experience in creating DStreams from sources like Flume and Kafka and performing different Spark transformations and actions on them.
- Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real time processing.
- Experience in creating scripts for data modeling and data import/export. Extensive experience in deploying, managing and developing MongoDB clusters.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions. Streamed data in real time using Spark with Kafka for faster processing.
- Having experience in developing a data pipeline using Kafka to store data into HDFS.
- Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.
- Good knowledge of Apache Flink.
- Converted Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience working with data extraction, transformation and loading in Hive, Pig and HBase.
- Good knowledge of Azure HDInsight.
- Hands on experience in writing MapReduce programs using Java to handle different data sets using Map and Reduce tasks.
- Worked with join patterns and implemented map-side and reduce-side joins using MapReduce.
- Orchestrated various Sqoop queries, Pig scripts, Hive queries using Oozie workflows and sub-workflows.
- Responsible for handling different data formats like Avro, Parquet and ORC formats.
- Experience in performance tuning and monitoring of the Hadoop cluster by gathering and analyzing data on the existing infrastructure using Cloudera Manager.
- Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and Zookeeper (HBase).
- Dealt with huge transaction volumes while interfacing the front-end application written in Java, JSP, Struts, Hibernate, SOAP Web service and with Tomcat Web server.
- Extensive knowledge of utilizing cloud-based technologies on Amazon Web Services (AWS): VPC, EC2, Route 53, S3, DynamoDB, ElastiCache, Glacier, RRS, CloudWatch, CloudFront, Kinesis, Redshift, SQS, SNS and RDS.
- Hands-on experience in developing applications with Java, J2EE (Servlets, JSP, EJB), SOAP Web Services, JNDI, JMS, JDBC 2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and MS SQL Server RDBMS.
- Delivered zero defect code for three large projects which involved changes to both front end (Core Java, Presentation services) and back-end (Oracle).
- Experience with all stages of the SDLC and Agile Development model right from the requirement gathering to Deployment and production support.
- Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
- Experience in understanding existing systems and providing maintenance and production support on technologies such as Java, J2EE and various databases (Oracle, SQL Server).
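The map-side and reduce-side join patterns mentioned above can be illustrated with a small, self-contained sketch; plain Python stands in for the MapReduce API here, and the record and field names are hypothetical, not from any production job:

```python
# Reduce-side join sketch: mappers tag each record with its source,
# the shuffle groups records by join key, and the reducer crosses
# the records from both sources per key.
from collections import defaultdict
from itertools import product

def map_tag(source_tag, records, key_field):
    """Map phase: emit (join_key, (source_tag, record)) pairs."""
    for rec in records:
        yield rec[key_field], (source_tag, rec)

def reduce_join(tagged):
    """Reduce phase: for each key, pair every left record with every right record."""
    groups = defaultdict(lambda: ([], []))
    for key, (tag, rec) in tagged:
        groups[key][0 if tag == "L" else 1].append(rec)
    joined = []
    for key, (left, right) in groups.items():
        joined.extend((key, l, r) for l, r in product(left, right))
    return joined

# Hypothetical inputs standing in for two HDFS data sets.
users = [{"uid": 1, "name": "ann"}, {"uid": 2, "name": "bob"}]
orders = [{"uid": 1, "total": 30}, {"uid": 1, "total": 5}]
tagged = list(map_tag("L", users, "uid")) + list(map_tag("R", orders, "uid"))
result = reduce_join(tagged)  # uid 1 joins two orders; uid 2 has none
```

A map-side join would instead broadcast the smaller `users` table to every mapper and avoid the shuffle entirely.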
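The custom Kafka encoders and keyed partition loading mentioned above can also be sketched in plain Python; the hashing and serialization scheme here is purely illustrative (a Kafka partitioner does keyed hashing of this general shape, but these function names and the record format are assumptions):

```python
# Illustrative sketch: route records to partitions by key hash and
# serialize them with a toy custom encoder.
import hashlib

NUM_PARTITIONS = 4  # assumed topic partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a record key to a partition number."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def encode_record(key: str, value: dict) -> bytes:
    """Toy custom encoder: serialize a record as 'key|field=val,...' bytes."""
    body = ",".join(f"{k}={v}" for k, v in sorted(value.items()))
    return f"{key}|{body}".encode("utf-8")

record = {"event": "click", "ts": 1700000000}
target = partition_for("user-42")          # same key always lands on the same partition
payload = encode_record("user-42", record)  # bytes ready for the producer
```

Because the mapping is deterministic, all records for one key preserve their ordering within a single partition, which is what keyed partitioning buys you.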
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4 and CDH5), Hortonworks, MapR, Apache and EMR
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Development Methodology: Agile, Waterfall
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit, Log4j and HDInsight
Frameworks: Struts, Spring, Hibernate and Flink
App/Web servers: WebSphere, Web Logic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
Data analytical tools: R and MATLAB
ETL Tools: Talend, Informatica, Pentaho
Confidential, Bloomington, IL.
Sr. Hadoop Developer
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data between HDFS and Oracle databases using Sqoop.
- Experience in installing, configuring Hadoop cluster for major Hadoop distributions.
- Experience in using Hive and Pig as ETL tools for event joins, filters, transformations and pre-aggregations.
- Experience in data processing such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Experience in moving streaming data into clusters through Kafka and Spark Streaming.
- Created partitions and buckets by state in Hive to handle structured data.
- Implemented dashboards that run HiveQL queries internally, covering aggregation functions, basic Hive operations and different kinds of join operations.
- Implemented state-based business logic in Hive using Generic UDFs.
- Designed Microsoft Azure HDInsight clusters and Spark-based real-time data ingestion and analytics, along with Azure Data Factory ETL pipelines.
- Extensively worked in the performance tuning of the ETL Streaming process.
- Analyzed the business requirements and framed the business logic for the ETL streaming process.
- Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data, such as removing personal information or merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.
- Experienced in writing MapReduce programs for analyzing data as per business requirements.
- Used Pig in three distinct workloads: pipelines, iterative processing and research.
- Moved log files generated from various sources to HDFS for further processing through Kafka, Flume and Splunk, and processed the files using Piggybank.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
- Implemented MapReduce jobs to write data into Avro format.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Performed Data Ingestion from multiple internal clients using Apache Kafka.
- Implemented various MapReduce jobs in custom environments and loaded the results into HBase tables by generating Hive queries.
- Good knowledge of working with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text and CSV files.
- Expert in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Created sample flows in Talend and StreamSets with custom-coded jars and analyzed the performance of StreamSets and Kafka Streams.
- Working on Flink as per requirements.
- Used Sqoop for various file transfers through HBase tables, feeding processed data to several NoSQL databases (Cassandra, MongoDB).
- Developed Hive UDFs and reused them in other requirements.
- Worked on performing join operations.
- Good exposure to EMR.
- Worked and written Hadoop MapReduce jobs to run on Amazon EMR clusters and creating workflows for running jobs.
- Implemented real time system with Kafka, Storm and Zookeeper.
- Used Hadoop's Pig, Hive and MapReduce to analyze health insurance data and transform it into meaningful data sets covering medicines, diseases, symptoms, opinions, geographic region details, etc.
- Worked on data analytics using Pig and Hive on Hadoop.
- Evaluated Oozie for workflow orchestration in the automation of MapReduce jobs, Pig and Hive jobs.
- Good knowledge on Kafka streams API for data transformation.
- Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Captured data logs from the web server into HDFS using Flume and Splunk for analysis.
- Experienced in writing Pig scripts and Pig UDFs to pre-process the data for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Worked on integrating Apache Kafka with Spark Streaming process to consume data from external REST APIs and run custom functions.
- Reporting expertise through Talend.
- Used JSTL and built custom tags whenever necessary.
- Used Expression Language to tie beans to UI components.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
Environment: Hadoop 1.x, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, Kafka, Flink, Storm, HBase, Avro, Talend, Scala, Java, Python, SQL, EMR, HDInsight, ETL, GitHub, Splunk, Linux/UNIX shell and Python scripting.
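The state-based Hive partitioning and bucketing described in this role could be sketched as follows; the table, columns and bucket count are illustrative placeholders, not the original schema:

```sql
-- Hedged sketch of a state-partitioned, customer-bucketed Hive table.
CREATE TABLE claims_by_state (
  claim_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (state STRING)               -- one HDFS directory per state
CLUSTERED BY (customer_id) INTO 32 BUCKETS  -- enables bucketed map joins and sampling
STORED AS ORC;

-- Partition pruning: only the CA partition's files are scanned.
SELECT SUM(amount) FROM claims_by_state WHERE state = 'CA';
```

Partitioning keeps per-state queries from scanning the whole table, while bucketing gives joins on `customer_id` a predictable data layout.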
Sr. Hadoop Developer
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Worked within and across Agile teams to design, develop, test and support technical solutions across a full-stack of development tools and technologies.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed and Configured Kafka brokers to pipeline server logs data into Spark streaming.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop.
- Experience with Cassandra, with the ability to drive its evaluation and potential implementation as a new platform. Implemented analytical engines that pull data from API data sources and then present the data back as an API or persist it into a NoSQL platform.
- Involved in loading data from LINUX file system, servers, Java web services using Kafka Producers, partitions.
- Involved in moving all log files generated from various sources to HDFS and Spark for further processing.
- Analyzed the SQL scripts and designed a solution to implement them using Scala.
- Developed analytical component using Scala, Spark and Spark Stream.
- Involved in migration from Livelink to SharePoint using Scala through Restful web service.
- Involved in requirement and design phase to implement Streaming Lambda Architecture to use real time streaming using Spark.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDD, Performed transformations and actions on RDD's.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Implemented a distributed messaging queue integrated with Cassandra using Zookeeper.
- Experienced in using the Avro data serialization system to handle Avro data files in MapReduce programs.
- Design, implementation, test, debug of ETL mappings and workflows.
- Develop ETL routines to source data from client source systems and target the data warehouse.
- Developed Product Catalog and Reporting DataMart databases with supporting ETLs.
- Implemented ETL processes for warehouse and designed and implemented code for migrating data to Data lake using Spark.
- Configured spark streaming data to receive real time data from Kafka and store it in HDFS.
- Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded it into Hive for further data processing.
- Built Platfora Hadoop multi-node cluster test labs using Hadoop distributions (CDH 4/5, Apache Hadoop, MapR and Hortonworks), Hadoop ecosystems, virtualization and Amazon Web Services components.
- Installed, Upgraded and Maintained Cloudera Hadoop-based software.
- Experience with hardening Cloudera Clusters, Cloudera Navigator and Cloudera Search.
- Managed running jobs, scheduled Hadoop jobs, configured the Fair Scheduler and handled Impala query scheduling.
- Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS.
- Extensively worked on Impala, comparing its processing time with Apache Hive for batch applications in order to adopt the former in the project.
- Extensively Used Impala to read, write and query the Hadoop data in HDFS. Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive querying.
- Used the Struts validation framework for form-level validation.
- Wrote JUnit test cases for unit testing of classes.
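The SQL-to-Spark conversion work in this role can be illustrated with a small sketch; plain Python stands in for the RDD API so the example is self-contained without a cluster, and the column names are hypothetical:

```python
# Illustrative sketch of how a SQL GROUP BY aggregation maps onto
# RDD-style transformations (in Spark: map + reduceByKey).
from collections import defaultdict
from functools import reduce

# Hypothetical (state, amount) rows, as if parsed from source records.
rows = [("CA", 30.0), ("NY", 10.0), ("CA", 5.0)]

# SQL equivalent: SELECT state, SUM(amount) FROM sales GROUP BY state;
def reduce_by_key(pairs, fn):
    """Mimic RDD.reduceByKey: group pairs by key, fold each group with fn."""
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    return {k: reduce(fn, vs) for k, vs in grouped.items()}

totals = reduce_by_key(rows, lambda a, b: a + b)
```

In Spark the same shape would be `rdd.map(parse).reduceByKey(_ + _)` in Scala; the point is that the SQL aggregation becomes a keyed fold over distributed pairs.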
Confidential, Cincinnati, OH.
- Developed solutions to process data into HDFS, process it within Hadoop and emit summary results from Hadoop to downstream systems.
- Installed and configured Hadoop MapReduce and developed multiple MapReduce jobs for cleansing and preprocessing.
- Worked and written Hadoop MapReduce jobs to run on Amazon EMR clusters and creating workflows for running jobs.
- Worked on Sqoop extensively to ingest data from various source systems into HDFS.
- Imported data from different relational data sources like Oracle and MySQL to HDFS using Sqoop.
- Analyzed substantial data sets using Hive queries and Pig scripts.
- Written Pig scripts for sorting, joining, and grouping data.
- Integrated multiple sources of data (SQL Server, DB2, MySQL) into Hadoop cluster and analyzed data by Hive-HBase integration.
- Involved in writing optimized Pig Script along with developing and testing Pig Latin Scripts.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV etc.
- Played a major role in working with the team to leverage Sqoop for extracting data from Oracle.
- Solved small file problem using Sequence files processing in MapReduce.
- Implemented counters on HBase data to count total records on different tables.
- Created HBase tables to store variable data formats coming from different portfolios. Performed real time analytics on HBase using Java API and Rest API.
- Developed different kind of custom filters and handled pre-defined filters on HBase data using API.
- Experienced with different scripting languages like Python and shell scripts.
- Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster, respectively.
- Worked on different file formats like Text files, Parquet, Sequence Files, Avro, Record columnar files (RC).
- Understood complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
- Experienced with working on Avro Data files using Avro Serialization system.
- Kerberos security was implemented to safeguard the cluster.
Environment: HDFS, MapReduce, Pig, Sqoop, Oozie, Zookeeper, HBase, Java, Eclipse, Python, Shell scripting, Kerberos, EMR, MySQL, Oracle, SQL Server, DB2.
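The Sqoop ingestion described in this role might look like the following; the connection strings, table and directory names are illustrative placeholders, not the original job definitions:

```shell
# Hedged sketch: parallel import of an Oracle table into HDFS.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMERS \
  --target-dir /data/raw/customers \
  --num-mappers 4 \
  --split-by CUSTOMER_ID

# And the reverse direction: exporting curated results back to the RDBMS.
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMER_SUMMARY \
  --export-dir /data/curated/customer_summary
```

`--split-by` on a well-distributed key lets the four mappers partition the import evenly, which is the main lever for Sqoop throughput.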
Sr. Java Developer
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Proficient in writing SQL queries and stored procedures for multiple databases, Oracle and SQL Server 2005.
- Integrated Spring Dependency Injection across different layers of the application, with Spring and the O/R mapping tool Hibernate for rapid development and ease of maintenance.
- Wrote Stored Procedures using PL/SQL. Performed query optimization to achieve faster indexing and making the system more scalable.
- Worked with Struts MVC objects like Action Servlet, Controllers, validators, Web Application Context, Handler Mapping, Message Resource Bundles, Form Controller, and JNDI for look-up for J2EE components.
- Implemented the Connectivity to the Database Server Using JDBC.
- Developed the RESTful web services using Spring IOC to provide user a way to run the job and generate daily status report.
- Developed and exposed the SOAP web services by using JAX-WS, WSDL, AXIS, JAXP and JAXB.
- Configured domains in production, development and testing environments using configuration wizard.
- Created SOAP Handler to enable authentication and audit logging during Web Service calls.
- Created Service Layer API's and Domain objects using Struts.
- Used RESTFUL Services to interact with the Client by providing the RESTFUL URL mapping.
- Implementing project using Agile SCRUM methodology, involved in daily stand up meetings and sprint showcase and sprint retrospective.
- Developed user interface using JSP, JSP Tag libraries, and Java Script to simplify the complexities of the application.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Used XSLT to transform XML data structures into HTML pages.
- Deployed EJB Components on Tomcat. Used JDBC API for interaction with Oracle DB.
- Developed the UI panels using JSF, XHTML, CSS, DOJO and jQuery.
- Involved in analysis, design and development of e-bill payment system as well as account transfer system and developed specs that include Use Cases.
- Class Diagrams, Sequence Diagrams and Activity Diagrams.
- Involved in designing the user interfaces using JSPs.
- Developed custom tags, JSTL to support custom User Interfaces.
- Developed the application using Struts Framework using Model View Layer (MVC) architecture.
- Implemented the persistence layer using Hibernate, with POJOs representing the persistent database tables. These POJOs are serializable Java classes that do not contain business logic.
- Implemented Hibernate using the Spring Framework (Created the session Factory).
- Implemented the application using the concrete principles laid down by several design patterns such as MVC, Business Delegate, Data Access Object, Singleton and Factory.
- Deployed the applications on BEA WebLogic Application Server.
- Developed JUnit test cases for all the developed modules.
- Used CVS for version control across common source code used by developers.
- Used Log4J to capture the log that includes runtime exceptions.
- Used JDBC to invoke Stored Procedures and database connectivity to ORACLE.
- Refactored the code to migrate from Hibernate 2.x to Hibernate 3.x (i.e., moved from XML mapping to annotations) and implemented Hibernate Filters and Hibernate Validators.
- Implemented DAOs and Hibernate transactions using the Spring Framework.
Environment: Java, J2EE, JSP, JNDI, Oracle 10g, DHTML, ANT, Rational Rose, Eclipse 3.1, Unix, WebLogic Application Server, Hibernate 3.0, Struts, Log4j, CVS.
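The PL/SQL stored-procedure work described in this role could be sketched as follows; the table, columns and procedure name are illustrative, not from the original system:

```sql
-- Hedged sketch: a PL/SQL stored procedure with a supporting index.
CREATE OR REPLACE PROCEDURE update_account_balance (
  p_account_id IN NUMBER,
  p_amount     IN NUMBER
) AS
BEGIN
  UPDATE accounts
     SET balance    = balance + p_amount,
         updated_at = SYSDATE
   WHERE account_id = p_account_id;

  IF SQL%ROWCOUNT = 0 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Account not found: ' || p_account_id);
  END IF;

  COMMIT;
END update_account_balance;
/

-- Index supporting the lookup above (the query-optimization step).
CREATE INDEX idx_accounts_account_id ON accounts (account_id);
```

Checking `SQL%ROWCOUNT` turns a silent no-op update into an explicit application error, which keeps callers from assuming a transfer succeeded.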