- Around 6 years of experience with Big Data using Hadoop, HDFS, MapReduce and Hadoop Ecosystem (Pig, Hive, Sqoop, Kafka, Flume, HBase, Oozie, Impala, Tez). 5+ years of experience in software development.
- Experienced in installing, configuring, and administering Hadoop clusters on major Cloudera distributions. Experience in setting up Hive, Pig, HBase and Sqoop on the Ubuntu operating system.
- Hands-on experience writing MapReduce jobs in Java; expert knowledge of MRv1 and MRv2.
- Hands-on experience writing Pig Latin scripts.
- Expertise in ingesting data from RDBMS and logs to HDFS to allow consistent data mining, oversight and governance activities.
- Configured ZooKeeper to monitor Hadoop clusters and to feed notifications to Nagios.
- Configured HiveServer2 (HS2) to enable analytical tools like Tableau, SAS, and Datameer to interact with Hive tables.
- Configured Mahout and customized it using Taste collaborative filtering (CF) to perform analysis.
- Proficient in Java Web Services, Spring and Hibernate Technologies.
- Excellent hands-on experience with application servers such as JBoss, BEA WebLogic, Apache Tomcat and IBM WebSphere.
- Strong analytical and conceptual skills in database design and development using SQL Server, Oracle.
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Kafka, Oozie, Flume, ZooKeeper, Spark, Scala, Impala, Cloudera, Amazon EC2
Scripting Languages: Perl, Shell, R
Programming Languages: C, C++, Java
Application Server: WebLogic, Apache Tomcat.
NoSQL Databases/Reporting Tools: HBase, Cassandra, Tableau, MicroStrategy
Databases/ETL: Oracle 10g/11g, MySQL 5.2, DB2, SQL, PL/SQL, Informatica PowerCenter 9.6, Teradata
Operating Systems: Linux, UNIX, Windows Server 2003
IDEs/Build Tools: Eclipse, Maven
Confidential, New York, NY
Big Data Developer
- Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase, and Hive).
- Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to import data into HDFS and Hive.
- Administered Pig, Hive and HBase, installing updates, patches and upgrades.
- Loaded data into HBase tables for the UI web application.
- Wrote custom Hive UDFs in Java where the functionality was too complex for HiveQL alone.
- Maintained system integrity of all Hadoop-related sub-components.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing.
- Supported MapReduce programs running on the cluster.
- Performed advanced procedures such as text analytics and processing, using Spark's in-memory computing capabilities with Scala.
- Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm and webMethods technologies.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Streamed data in real time using Spark with Kafka.
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Created a web dashboard for traders to track the volatility smile in real time using AngularJS and Bootstrap.
- Created a database API layer using Slang on top of a centralized data repository (HDFS) to store, extract, process and provide analytics for the Liquidity Risk team using Apache Spark.
- Used Oozie as an automation tool for running jobs and scheduled workflows with the Oozie workflow engine.
- Extensively used Spring controllers, IoC, AOP and JDBC modules.
- Used Spring's JDBC and DAO layers to read data from the database.
- Implemented REST APIs for collection and retrieval of data.
- Extensively used Java 8 to simplify the code.
- Developed the business domain layer using Java, J2EE and JDBC with the DAO and Singleton patterns. Used the iSQL tool to execute SQL queries and to build and manage database objects.
Confidential, Houston, TX
- Installed and configured fully distributed Hadoop cluster.
- Performed Hadoop cluster environment administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Extensively used Cloudera Manager to manage the Hadoop cluster.
- Used Oozie to automate/schedule business workflows which invoke Sqoop, MapReduce and Pig jobs as per the requirements.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Worked with various HDFS file formats like Avro, SequenceFile and various compression formats like Snappy, bzip2.
- Developed efficient MapReduce programs for filtering out the unstructured data.
- Developed Pig UDFs to pre-process data for analysis.
- Developed Hive queries for data sampling and analysis for the analysts.
- Worked on HBase and MySQL to optimize the data.
- Designed the Data Model to be used for correlation in Hadoop/Hortonworks.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Designed and proposed an end-to-end data pipeline using Falcon and Oozie through POCs.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Developed custom UNIX shell scripts to perform pre- and post-validation of master and slave nodes, before and after configuring the NameNode and DataNodes respectively.
- Maintained and administered HDFS through the Hadoop Java API.
- Supported MapReduce programs running on the cluster.
- Developed Java MapReduce programs using Mahout and applied them to different datasets.
- Identified several PL/SQL batch applications in General Ledger processing and conducted performance comparison to demonstrate the benefits of migrating to Hadoop.
- Configured Sentry to secure access to purchase information stored in Hadoop.
- Involved in several POCs for different LOBs to benchmark the performance of data-mining using Hadoop.
- Actively involved in software development life cycle starting from requirements gathering and performing Object Oriented Analysis.
- Worked in an Agile methodology with 15-day sprint cycles.
- Worked on various design patterns specific to the requirement.
- Learned and followed the development workflow.
- Optimized client-side performance of Java applications (Struts2, Spring MVC).
- Used Spring Core Annotations for Dependency Injection and used Apache Camel to integrate Spring framework.
- Implemented microservices using OSGi and deployed them into Karaf containers.
- Worked on form validation using the Spring Validator framework.
- Developed Form Beans and Action Classes to model views and client-side functionality.
- Used the Struts Validator framework for validating the forms.
- Used different types of Spring controllers depending on the business requirement.
- Implemented a second-level cache in Hibernate.
- Developed POJOs and Data Access Objects (DAOs) that handle all database operations using Hibernate.
- Designed a RESTful API with Spring 3.
- Implemented agent-server messaging dialog using Camel and JMS (ActiveMQ implementation).
- Worked on a Camel-based integration middleware solution for Provisioning Services, designing and implementing business logic and data processing routes using Apache Camel.
- Implemented an HTTP REST API using Node.js and Express.
- Implemented the application backend as a Node.js and Express application server, using Redis for message routing.
- Wrote build and deployment scripts using Ant and Maven in a UNIX environment.
- Worked with Quality Assurance team in tracking and fixing bugs.
- Involved in performance tuning of the database.
- Developed scripts for customizing reports according to customer requirements.
- Designed for future user requirements by interacting with users, and handled new development and maintenance of the existing source code.