- 7+ years of programming experience as a software developer, including 5+ years of hands-on experience in Big Data technologies and 2+ years in Java/J2EE technologies.
- Hands-on experience with the Hadoop ecosystem, including HDFS, MapReduce, YARN, Hive, Pig, HCatalog, HBase, Kafka, Sqoop, Flume, ZooKeeper and Oozie.
- In-depth understanding of Hadoop architecture and its components, including the ResourceManager, NodeManager, ApplicationMaster, Scheduler, ApplicationManager, NameNode, DataNode, HDFS and the MapReduce programming paradigm.
- Implemented UDFs for Hive and Pig per business needs; strong understanding of Pig and Hive analytical functions.
- Worked on Spark Core and Spark Streaming to handle real-time data from Kafka.
- End-to-end implementation of importing and exporting data between relational databases and HDFS using Sqoop.
- Performed administrative tasks such as installing and maintaining the Hadoop framework and its components (Sqoop, Flume, HBase, Pig and Hive) in a test environment.
- Extensive working experience with NoSQL databases such as HBase.
- Implemented unit and integration test cases using Mockito and MRUnit.
- Experience in using Apache Spark with Kafka to persist data to HBase.
- Experience in managing and reviewing Hadoop Log files
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs
- Experience in deploying applications on heterogeneous application servers such as Tomcat, WebSphere and JBoss.
- Experienced in working with version control tools such as SVN and GitHub.
- Experienced in working with tools such as JIRA, Maven, ServiceNow and Log4j.
- Excellent analytical, problem-solving, communication and interpersonal skills, with the ability to interact with individuals at all levels and to work both independently and as part of a team.
- A quick learner; organized and highly motivated, with a keen interest in emerging technologies.
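As an illustration of the Hive UDF work mentioned above, a custom UDF is typically registered and used along these lines; the jar path, function name and table here are hypothetical:

```sql
-- Register a custom UDF packaged in a jar (names are illustrative)
ADD JAR hdfs:///libs/custom-udfs.jar;
CREATE TEMPORARY FUNCTION normalize_phone AS 'com.example.udf.NormalizePhone';

-- Use it like any built-in Hive function
SELECT normalize_phone(phone_number)
FROM customers;
```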
Languages: Java, Pig Latin scripting, Hive/HiveQL and Linux shell scripting
Hadoop / Big Data: HDFS, MapReduce, YARN, HBase, Pig, Hive, HCatalog, Sqoop, Flume, Oozie and Spark
Java: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
IDEs: Eclipse, JDeveloper Studio, MyEclipse
Operating Systems: Linux (Fedora, CentOS, Ubuntu), Windows XP, Windows 7 and MS-DOS
J2EE Technologies: Spring, JSF, JMS, JNDI, Servlet 2.0, Hibernate
NoSQL Databases: HBase
Confidential, Phoenix, AZ
- Working on a live 800+ node Hadoop cluster running Cloudera Enterprise.
- Extensively worked with 700+ TB of structured and semi-structured data.
- Moved data to the Big Data platform, the single source of truth for raw and transformed data used for analytics, reporting and decision making.
- Developed various workflows using custom MapReduce, Pig and Hive, and scheduled them with Oozie.
- Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in their Hive queries.
- Developed MapReduce Java programs with custom Writables to load web server and application logs into Hive using Flume.
- Log data stored in HBase is processed and analyzed, then imported into the Hive warehouse, enabling business analysts to write HiveQL queries against it.
- Optimized platform storage by storing Hive tables in Parquet format.
- Architected the merging of multiple market feeds into a single feed, saving storage and eliminating duplicate schemas across markets.
- Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Used Kafka and Spark Streaming to process real-time data from source systems and stored it in HBase for real-time access.
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library.
- Integrated Tableau with Hive and generated reports consumed by business analysts.
- Designed and coded application components in an agile environment using a test-driven development approach.
Environment: Hadoop, Pig, Hive, MapReduce, Sqoop, Flume, Linux, Cloudera, Spark, Kafka, Tableau, Oozie, ZooKeeper, ServiceNow, Rally, Shell Script.
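The Hive storage optimizations described above (Parquet storage plus partitioned, bucketed managed/external tables) could be sketched roughly as follows; the table name, columns and bucket count are illustrative, not the actual production schema:

```sql
-- Partitioned, bucketed Hive table stored as Parquet (illustrative schema)
CREATE EXTERNAL TABLE market_feed (
  trade_id BIGINT,
  symbol   STRING,
  price    DOUBLE
)
PARTITIONED BY (trade_date STRING)
CLUSTERED BY (symbol) INTO 32 BUCKETS
STORED AS PARQUET
LOCATION '/data/warehouse/market_feed';

-- Partition pruning keeps a query from scanning the full dataset
SELECT symbol, avg(price)
FROM market_feed
WHERE trade_date = '2016-01-15'
GROUP BY symbol;
```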
Confidential, Charlotte, NC
- Loaded customer data from various source systems such as Oracle and MySQL databases into HDFS using Sqoop.
- Implemented a 26-node CDH4 Hadoop cluster on Red Hat Linux using Cloudera Manager.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Handled importing of data from various data sources, performed transformations using simple to complex MapReduce jobs, and loaded the data into HBase.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Integrated Tableau with Hive using ODBC drivers and connectors.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
- Set up a Hadoop cluster using EMR (Elastic MapReduce), the managed Hadoop framework on AWS.
- Used Maven extensively to build MapReduce jar files and deployed them to Amazon Web Services (AWS) EC2 virtual servers in the cloud.
- Used S3 buckets to store the jars and input datasets, and used DynamoDB to store the processed output from the input datasets.
- Wrote Hive queries to export, import and query data in DynamoDB.
- Ran Hive queries in interactive and batch modes using the AWS CLI and the Console.
Environment: CDH4, Cloudera Manager, MapReduce, HDFS, Hive, Pig, HBase, Flume, Oracle, MySQL, Sqoop, Oozie, AWS, Tableau.
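On EMR, the Hive export/import to DynamoDB mentioned above is typically done through an external table backed by the DynamoDB storage handler; the table names and column mapping below are hypothetical:

```sql
-- External Hive table mapped onto a DynamoDB table (EMR connector; names illustrative)
CREATE EXTERNAL TABLE ddb_results (customer_id STRING, score BIGINT)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name"     = "ProcessedResults",
  "dynamodb.column.mapping" = "customer_id:CustomerId,score:Score"
);

-- Export processed output from a Hive table into DynamoDB
INSERT OVERWRITE TABLE ddb_results
SELECT customer_id, score FROM hive_results;
```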