- IT professional with around 6 years of experience and extensive knowledge across the Software Development Life Cycle: analysis, design, development, debugging and deployment of various software applications. More than 4 years of hands-on experience with Big Data and the Hadoop ecosystem in ingestion, storage, querying, processing and analysis using HDFS, MapReduce, Pig, Hive, Spark, Flume, Kafka, Oozie, etc. 2 years of work experience using Java/J2EE technologies.
- Good experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem, such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, HBase, Flume, Sqoop, Spark, Storm, Kafka, Oozie and Zookeeper.
- Experience in installing, configuring, supporting and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
- Excellent programming skills at a high level of abstraction using Scala and Spark.
- Good understanding of real-time data processing using Spark.
- Experience importing and exporting data between HDFS and databases such as MySQL, Oracle and Teradata using Sqoop.
- Strong experience building real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Kafka, Flume, MapReduce and Hive.
- Managed and scheduled batch jobs on Hadoop clusters using Oozie.
- Experience in managing and reviewing Hadoop Log files.
- Experience in setting up Zookeeper to provide coordination services to the cluster.
- Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Experience and understanding of Spark and Storm.
- Hands-on experience extracting data from log files and copying it into HDFS using Flume.
- Experience analyzing data using Hive, Pig Latin and custom MapReduce programs in Java.
- Hands-on experience in the Analysis, Design, Coding and Testing phases of the Software Development Life Cycle (SDLC).
- Experience with multiple databases and tools: SQL analytical functions, Oracle PL/SQL, SQL Server and DB2.
- Experience designing and coding ETL/Talend jobs that process data into target databases.
- Experience working with Amazon Web Services EC2 instances and S3 buckets.
- Worked with different file formats such as Avro, Parquet, RCFile and JSON.
- Wrote Python scripts to build a disaster-recovery process for data currently being processed in the data center, using its current static location.
- Hands-on experience with NoSQL databases such as MongoDB, HBase and Cassandra, and their integration with Hadoop clusters.
- Experience in ingesting data into Cassandra and consuming the ingested data from Cassandra to HDFS .
- Used Apache NiFi to load PDF documents from Microsoft SharePoint into HDFS.
- Used Avro serialization technique to serialize data for handling schema evolution.
- Experience designing and coding web applications using core Java and web technologies (JSP, Servlets and JDBC), with a full understanding of the J2EE technology stack, including Java frameworks such as Spring and ORM frameworks (Hibernate).
- Developed web applications with the open-source Spring framework, utilizing Spring MVC.
- Good interpersonal and communication skills, strong problem-solving skills; explore and adapt to new technologies with ease; a good team member.
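The MapReduce-style analysis mentioned above (map, shuffle, reduce) can be sketched locally in plain Python, with no Hadoop cluster required; the word-count job and its sample input here are illustrative only:

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) pairs, mirroring a Hadoop map task.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Sum the counts for one key, mirroring a reduce task.
    return key, sum(values)

def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in shuffle(mapped))

print(word_count(["big data big cluster", "data pipeline"]))
```

On a cluster the shuffle is done by the framework across nodes; this local version only shows the dataflow.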
Programming Languages: Java, C, Python, Shell Scripting
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Hue, Impala, Sqoop, Apache Spark, Apache Kafka, Apache Ignite, Apache NiFi, Oozie, Flume, Zookeeper, YARN
NoSQL Databases: MongoDB, HBase, Cassandra
Hadoop Distribution: Hortonworks, Cloudera, MapR
Databases: Oracle 10g, MySQL, MSSQL
IDE/Tools: Eclipse, NetBeans, Maven
Version control: Git, SVN, ClearCase
Platforms: Windows, Unix, Linux
BI Tools: Tableau, MS Excel
Web/Server Application: Apache Tomcat, WebLogic, WebSphere, MS SQL Server, Oracle Server
Confidential, Valley Forge, PA
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Worked on data serialization formats for converting complex objects into byte sequences using Avro, Parquet and CSV formats.
- Leveraged Hive queries to create ORC tables.
- Created Views from Hive Tables on top of data residing in Data Lake.
- Worked on setting up Kafka for streaming data and on monitoring the Kafka cluster.
- Involved in designing and creating data-ingestion pipelines using technologies such as Apache Storm and Kafka.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Developed Kafka producer and consumers, Spark and Hadoop MapReduce jobs.
- Imported data from different sources like HDFS/HBase into Spark RDDs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Wrote complex SQL to pull data from the Teradata EDW and create Ad-Hoc reports for key business personnel within the organization.
- Used the Git version control system to access repositories and coordinated with CI tools.
- Integrated Maven with Git to manage and deploy project-related tags.
- Experience with AWS S3: creating buckets and configuring them with permissions, logging, versioning and tagging.
- Implemented a continuous integration and continuous deployment framework using Jenkins and Maven in a Linux environment.
- Used the Oozie workflow scheduler to manage Hadoop jobs with control flows.
- Exported the aggregated data to Oracle using Sqoop for reporting on the Tableau dashboard.
Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Git, Jenkins, AWS (S3), Python, Java, SQL scripting and Linux shell scripting, Hortonworks.
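As a rough illustration of the partitioned and bucketed Hive tables described in this role, the sketch below routes rows to partition directories and bucket files in plain Python; the `dt`/`user_id` schema is invented, and CRC32 merely stands in for Hive's own bucketing hash:

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumed CLUSTERED BY ... INTO 4 BUCKETS

def bucket_for(key, num_buckets=NUM_BUCKETS):
    # Hive assigns a row to bucket hash(key) mod num_buckets; CRC32 is a
    # deterministic stand-in, not Hive's actual hash function.
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def layout(rows):
    # Group rows into partition directories (by date) and bucket files
    # (by user_id hash), mirroring Hive's on-disk table layout.
    files = defaultdict(list)
    for row in rows:
        path = f"dt={row['dt']}/bucket_{bucket_for(row['user_id'])}"
        files[path].append(row)
    return dict(files)

rows = [
    {"dt": "2017-01-01", "user_id": "u1", "visits": 3},
    {"dt": "2017-01-01", "user_id": "u2", "visits": 1},
    {"dt": "2017-01-02", "user_id": "u1", "visits": 2},
]
for path, contents in sorted(layout(rows).items()):
    print(path, len(contents))
```

Partition pruning and bucketed joins work precisely because the same key always lands in the same directory and bucket.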
Confidential, Denver, CO
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters, following an agile methodology.
- Monitored multiple Hadoop cluster environments, tracking workload, job performance and capacity planning using Cloudera Manager.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational Database systems/mainframe and vice-versa.
- Developed Oozie workflow for scheduling Pig and Hive Scripts.
- Configured the Hadoop Ecosystem components like YARN, Hive, Pig, HBase and Impala.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day and the visit duration.
- Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
- Involved in creating the workflow to run multiple Hive and Pig jobs, which run independently with time and data availability.
- Developed Pig Latin scripts to do operations of sorting, joining and filtering source data.
- Wrote MapReduce programs on log data to transform it into a structured format and extract fields such as customer name and age group.
- Pro-actively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Executed test cases in an automation tool; performed system, regression and integration testing; reviewed results and logged defects.
- Participated in functional reviews, test specifications and documentation review.
- Documented system processes and procedures for future reference, and was responsible for managing data coming from different sources.
Environment: Cloudera Hadoop Distribution, HDFS, Talend, Map Reduce (JAVA), Impala, Pig, Sqoop, Flume, Hive, Oozie, HBase, Shell Scripting, Agile Methodologies.
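The web-log analysis in this role (unique visitors per day, computed in HiveQL) can be mimicked in a few lines of plain Python; the `day ip url` log-line format is an assumption for illustration:

```python
from collections import defaultdict

def unique_visitors_per_day(log_lines):
    # Local equivalent of HiveQL's
    # SELECT day, COUNT(DISTINCT ip) FROM weblogs GROUP BY day;
    # assumes whitespace-separated "day ip url" records.
    visitors = defaultdict(set)
    for line in log_lines:
        day, ip = line.split()[:2]
        visitors[day].add(ip)
    return {day: len(ips) for day, ips in visitors.items()}

logs = [
    "2016-05-01 10.0.0.1 /home",
    "2016-05-01 10.0.0.1 /cart",
    "2016-05-01 10.0.0.2 /home",
    "2016-05-02 10.0.0.3 /home",
]
print(unique_visitors_per_day(logs))
```

At cluster scale the distinct-count is distributed across reducers, but the grouping logic is the same.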
Confidential, Richardson, TX
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive and HBase database.
- Extracted the data from Oracle into HDFS using Sqoop to store and generate reports for visualization purpose.
- Collected and aggregated large amounts of web log data from different sources such as web servers and mobile devices using Apache Flume, and stored the data in HDFS for analysis.
- Built a data flow pipeline using Flume, Java (MapReduce) and Pig.
- Developed Hive scripts to analyze data, categorize mobile numbers into different segments, and offer promotions to customers based on their segment.
- Extensive experience in writing Pig scripts to transform raw data into baseline data.
- Created Hive tables, partitions and loaded the data to analyze using HiveQL queries.
- Worked on Oozie workflow engine for job scheduling.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Worked on analyzing and writing Hadoop MapReduce jobs using Java API, Pig and Hive.
- Responsible for building scalable distributed data solutions using Hadoop.
- Leveraged Solr API to search user interaction data for relevant matches.
- Designed the Solr schema and used the Solr client API for storing, indexing and querying the schema fields.
- Loaded data into HBase using bulk load and the HBase API.
- Validated application functionality, usability and compatibility during the functional, exploratory, regression, system integration, content and UAT testing phases.
Environment: Hortonworks Hadoop Distribution, MapReduce, HBase, Hive, Pig, Sqoop, Oozie, Flume, Solr, Shell script.
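A data-cleaning pass like the MapReduce preprocessing jobs in this role can be sketched in plain Python; the three-field `name, age, city` record layout is an assumed format, not the actual one used:

```python
import csv

EXPECTED_FIELDS = 3  # assumed layout: name, age, city

def clean(record_lines):
    # Mirrors a cleansing MapReduce job: drop malformed rows and
    # normalize whitespace/case before downstream Hive analysis.
    good, bad = [], []
    for row in csv.reader(record_lines):
        if len(row) != EXPECTED_FIELDS or not row[1].strip().isdigit():
            bad.append(row)
            continue
        good.append([field.strip().lower() for field in row])
    return good, bad

lines = ["Alice, 34, Dallas", "broken,row", "Bob, x, Austin", "Carol, 29, Plano"]
good, bad = clean(lines)
print(len(good), len(bad))
```

In a real job the rejected rows would typically go to a side output for inspection rather than being discarded.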
- Analyzed project requirements for this product and was involved in design using UML infrastructure.
- Interacted with system analysts and business users for design and requirement clarification.
- Handled the Java multithreading portions of back-end components.
- Developed HTML reports for various modules as per the requirement.
- Developed web services using SOAP, SOA, WSDL and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing and design) to communicate with the Active Directory application using a RESTful API.
- Created multiple RESTful web services using the Jersey 2 framework.
- Used Aqua Logic BPM (Business Process Managements) for workflow management.
- Developed the application using the NoSQL database MongoDB for storing data on the server.
- Developed the complete business tier with stateful session beans and CMP entity beans using EJB 2.0.
- Developed integration services using SOA, Web Services, SOAP and WSDL.
- Designed, developed and maintained the data layer using the Hibernate ORM framework.
- Used the Spring framework's JMS support for writing to a JMS queue and HibernateDaoSupport for interfacing with the database, and integrated Spring with JSF.
- Responsible for managing the Sprint production test data with tools such as Telegence and CRM for tweaking test data during IAT/UAT testing.
- Wrote unit test cases using JUnit and was involved in integration testing.
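The RESTful services in this role were built with Jersey 2 in Java; purely as a language-neutral sketch of the same JSON-over-HTTP request/response pattern, here is a minimal resource served with Python's standard library (the `/users` resource and its data are invented):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

USERS = {"1": {"name": "alice"}}  # illustrative in-memory store

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /users/<id> returns the user as JSON, 404 otherwise.
        user = USERS.get(self.path.rsplit("/", 1)[-1])
        body = json.dumps(user if user else {"error": "not found"}).encode()
        self.send_response(200 if user else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
resp = json.load(urlopen(f"http://127.0.0.1:{port}/users/1"))
print(resp)
server.shutdown()
```

A framework like Jersey adds routing annotations, content negotiation and marshalling on top of exactly this HTTP exchange.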