- Over 8 years of experience in software development, including experience across phases of Hadoop and HDFS development.
- 3+ years of experience with the Hadoop framework and its ecosystem.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Hands-on experience in writing MapReduce jobs in Java, Pig, C++ and Python.
- Experienced with major Hadoop ecosystem projects such as Pig, Hive and HBase.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Good knowledge of job scheduling and coordination tools like Oozie and ZooKeeper.
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Good understanding of ETL concepts.
- Hands-on experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, HBase, ZooKeeper, Oozie, Flume, Kafka, Sqoop, Spark, Storm, Pig and Hive.
- Knowledge of the architecture and functionality of NoSQL databases like HBase, Cassandra and MongoDB.
- Hands-on experience with Spark-Scala programming.
- Hands-on experience in writing Spark Streaming applications.
- Experience in database development using SQL and PL/SQL, and in working with databases like MySQL and SQL Server.
- Performed data analysis using MySQL & Oracle.
- Experienced in integrating various data sources such as Java, RDBMS, shell scripts, spreadsheets and text files.
- Wrote complex queries using SQL and Hive.
- Understanding of Data Structures and Algorithms.
- UNIX shell scripting.
- Extensive experience in Java and J2EE technologies like JSP and JDBC.
- Extensively worked on debugging using Eclipse debugger.
- Good understanding of Data Mining and Machine Learning techniques.
- Extensive experience in working with relational databases such as SQL Server and MySQL, and in writing stored procedures, functions, joins and triggers for different data models.
- Strong work ethic with desire to succeed and make significant contributions to the organization.
- Strong problem-solving, communication and interpersonal skills; a good team player.
- Motivated to take independent responsibility and able to contribute as a productive team member.
- A pleasing personality with the ability to build great rapport with clients and customers.
- Demonstrates excellent verbal and written communication, paired with great presentation and interpersonal skills.
- Portrays strong leadership qualities, backed with a great track record as a team player.
- Adept with the latest business/technological trends.
- Possesses sharp business acumen as well as great analytical skills, with a penchant for improvisation and perfection.
- Exhibits a pleasing personality infused with bold confidence.
Hadoop Technologies: HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, ZooKeeper, Ambari, Hue, Spark, Storm, Talend, Ganglia
Operating System: Windows, Unix, Linux
Languages: Java, J2EE, SQL, PL/SQL, Shell Script
Project Management / Tools: MS Project, MS Office, TFS, HP Quality Center Tool
Databases: MySQL, Oracle 11g/10g/9i, SQL Server
NoSQL Databases: HBase, Cassandra
File System: HDFS
Reporting Tools: Jasper Reports, Tableau
IDE Tools: Eclipse, NetBeans
Application Server: Apache Tomcat, WebLogic
Confidential, Oak Brook, IL
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis.
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Performed end-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Involved in loading data from the Linux file system to HDFS.
- Developed Hive queries and UDFs to analyze and transform the data in HDFS.
- Imported data into, and exported data from, HDFS and Hive using Sqoop and Flume.
- Developed Pig scripts for analyzing large data sets in HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented a plan for a POC on Impala.
- Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into the tables and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Worked with SequenceFiles, RCFiles, map-side joins, bucketing and partitioning to improve Hive performance and storage.
- Implemented daily cron jobs that automate parallel tasks of loading data into HDFS using Oozie coordinator jobs.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases for comparison with historical data.
- Involved in loading data from Teradata database into HDFS using Sqoop queries.
- Involved in submitting and tracking MapReduce jobs using JobTracker.
- Set up Ganglia and Nagios for Hadoop monitoring and alerting, and used them to monitor the HBase and ZooKeeper clusters.
- Involved in creating Oozie workflow and coordinator jobs to kick off jobs on time for data availability.
- Used Pig as an ETL tool to perform transformations, event joins, filtering and some pre-aggregations.
- Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
- Exported data to Tableau and to Excel with Power View for presentation and refinement.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Wrote Hive generic UDFs to implement business logic.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Apache Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Spark, Scala, Java, C++, Linux, Maven, Python, Teradata, ZooKeeper, Ganglia, Tableau.
Confidential, Seattle, WA
- Worked on writing transformer/mapping MapReduce pipelines using Java.
- Used Hadoop as the big data platform to store and analyze the data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run as MapReduce jobs in the backend.
- Designed and implemented incremental imports into Hive tables.
- Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Experienced in managing and reviewing the Hadoop log files.
- Worked in an AWS environment to develop and deploy custom Hadoop applications.
- Migrated ETL jobs to Pig scripts to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Ingested files into the HDFS layer to perform matching and aggregations using HiveQL.
- Used the Avro data serialization system to handle JSON data formats.
- Worked on different file formats like Sequence files, XML files and Map files using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Developed scripts to automate end-to-end data management and synchronization between all the clusters.
- Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
- Created a wrapper script in Python that can be run to trigger Java scripts and HiveQL.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
Environment: Hadoop, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, AWS, Python, Linux, Java, C++, Eclipse, Cassandra.
Confidential - Sunnyvale, CA
Java / Hadoop Engineer
- Installed and configured Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, Sqoop, HBase, Flume and ZooKeeper.
- Worked on developing ETL processes to load data into HDFS using Sqoop and export the results back to RDBMS.
- Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile devices and network devices, and pushed it to HDFS.
- Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Involved in creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
- Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases like HBase and Cassandra to determine the optimal database.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades of the Cloudera Hadoop distribution (CDH3 and CDH4) as required.
- Moved data from third-party systems to HDFS hosted on an AWS cluster, and vice versa, using shell commands.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Involved in a POC to import TIFF, text and JSON files from a RabbitMQ server to HBase using Spark Streaming.
- Hands-on experience productionizing Hadoop applications, including administration, configuration management, monitoring, debugging and performance tuning.
- Involved in implementing high availability and automatic failover infrastructure to eliminate the NameNode single point of failure.
- Supported and assisted QA engineers in understanding, testing and troubleshooting.
- Wrote build scripts using Ant and participated in the deployment of one or more production systems.
- Developed a workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig.
Environment: Java 7, Eclipse, Hadoop, Pig, Hive, MapReduce, HDFS, Sqoop, Flume, Unix Shell Scripting, Oozie, AWS, RabbitMQ, CDH, Linux.
- Actively participated in the analysis, design, implementation and deployment phases of the project's full Software Development Life Cycle (SDLC).
- Developed Struts action classes, action forms and performed action mapping using Struts framework and performed data validation in form beans and action classes.
- Extensively used Struts framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
- Defined the search criteria, retrieved the customer record from the database, made the required changes and saved the updated record back to the database.
- Developed build and deployment scripts using Apache Ant to customize WAR and EAR files.
- Used DAO and JDBC for database access.
- Worked with RESTful services that provide data in JSON.
- Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in post-production support and maintenance of the application.
Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, RESTful, SOA, Tomcat 6.
- Involved in the analysis, design, implementation, and testing of the project.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Designed tables and indexes.
- Wrote complex SQL queries and Stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.