- 4 years of overall IT experience, including 4 years of comprehensive experience with Hadoop ecosystem tools and related technologies.
- Excellent understanding of Hadoop architecture, its components, and the MapReduce programming paradigm.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, and Oozie; good knowledge of HBase and ZooKeeper.
- Worked on a proof of concept to migrate MapReduce and Pig code to Spark.
- Experience working with different data sources, such as CSV, XML, and JSON files and Oracle/MySQL databases, to load data into Hive tables.
- Experience in implementing User Defined Functions for Pig and Hive.
- Involved in writing Unix shell scripts for application deployments to the production region.
- Involved in maintaining Hadoop clusters in development and test environments.
- Good knowledge of mining data in the Hadoop file system for business insights using Hive and Pig.
- Expertise in relational database design, data extraction, and data transformation from data sources using MySQL and Oracle.
- Good working knowledge of the Eclipse IDE for developing and debugging Java applications.
- Involved in code performance improvement and query tuning activities.
- Working experience with the ETL tool Pentaho Data Integration.
- Responsible for designing and developing reports using Pentaho User Console.
- Resolved production application and production database issues.
- Interacted with clients to clarify issues and requirements.
- Leadership skills include the ability to lead and motivate co-workers from all backgrounds, creative problem solving, and in-depth proficiency with new technology trends.
Hadoop/Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Spark, Kafka
NoSQL Databases: Apache HBase, Cassandra
Databases: Oracle, MySQL, Java Development
Programming Languages: Java, C++, C
Logging API: Log4j
Reporting Tools: Pentaho, Talend
Tools: JUnit, PuTTY, WinSCP, FileZilla
Operating Systems: Windows, UNIX, Linux (Ubuntu)
Version Control Systems: SVN
Build Tools: Maven, Jenkins
Business Analytics Platform Hadoop Developer
Project Skills: Hadoop, MapReduce, Hive, Pig, Oozie, Sqoop, Hue, Spark, Cloudera, Java, JSON, XML, MySQL, UNIX Shell Scripting, Log4j, Maven, Jenkins, SVN
- Made extensive use of the Cloudera CDH 4 and CDH 5 distributions
- Designed and developed several use cases
- Created Pig scripts to load nested JSON data into HDFS
- Created several User Defined Functions in Pig and Hive
- Created Oozie workflows to streamline the data flow
- Created shell scripts to load the raw data into HDFS
- Created Pig scripts and MapReduce programs to filter the log files and aggregate the data
- Loaded log file data into Hive
- Used Sqoop to move data between HDFS and MySQL
- Developed Spark code to migrate existing MapReduce code and Pig scripts as part of a proof of concept
- Unit tested the application
- Involved in major enhancements to the existing MapReduce programs and Pig scripts
- Used SVN (Subversion) for version control and maintained separate versions for each release
- Wrote technical documentation covering the design and development details of each use case
- Administered and maintained the Hadoop clusters in development and test environments
- Provided support and maintenance
- Provided business insights on purchase patterns during promo periods by mining data in Hive
- Worked on a proof of concept to extract petrol prices along with latitude and longitude values received from mobile GPS
Project Skills: Hadoop, MapReduce, HDFS, Hive, Sqoop, MapR, Java (JDK 1.6), XML, Oracle 11g/10g, UNIX Shell Scripting, Pentaho Data Integration, Pentaho User Console, Pentaho Schema Workbench
- Contributed to the design of the application and database architecture
- Wrote the design document and conducted various POCs to validate the design
- Developed a MapReduce program to validate the raw data for the required columns and data format before loading it into the database for analysis
- Loaded data from the Linux file system to HDFS using Pentaho Data Integration
- Created data flows in Pentaho Data Integration to aggregate the data and load it into Hive tables
- Moved the data from Hive to Oracle tables using Sqoop for reporting
- Created bar graphs, heat maps, and geo maps from aggregated data using Pentaho User Console
- Conducted sessions for the clients and the testing team on using the application
- Completed unit testing and integration testing
- Wrote the user manual and troubleshooting guides
Project Skills: Hadoop, Hive, Sqoop, Oracle, Java, Shell script, Pentaho Data Integration
- Worked on the MapR distribution
- Migrated the needed data from Oracle into HDFS using Sqoop and imported flat files of various formats into HDFS
- Used shell scripts to pull data from different structured folders into a single folder for processing by Pig
- Created workflows in Pentaho Data Integration
- Loaded the data into Hive and used HQL to derive the metrics required by the client
- Moved the data from aggregated Hive tables to Oracle tables
- Produced an Excel report from the resulting Oracle table
- Created user manual and troubleshooting guides
- Involved in User
Systems Engineer, Intern
Project Skills: Cassandra, Java, JSF, HTML
- Loaded data into Cassandra and made updates
- Used Java APIs to communicate with Cassandra
- Created a simple web application using Java and JSF
- Serialized and deserialized objects to load the data into Cassandra as byte arrays
- Documented the design and development details