- Deep expertise in Analysis, Design, Development and Testing phases of Enterprise Data Warehousing solutions.
- More than 6 years of experience in the IT industry, with 3+ years of experience in Hadoop technologies such as Pig, Hive, HBase, Oozie, ZooKeeper, and Sqoop, and hands-on experience writing MapReduce/YARN jobs.
- Big data development experience with Google Cloud.
- Experience working with various Hadoop distributions: Cloudera and Hortonworks.
- Experience migrating data between Hadoop and relational database systems using Sqoop.
- Expertise in Hadoop administration tasks such as managing clusters and reviewing Hadoop log files.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Experience with the NoSQL databases MongoDB and Cassandra.
- Proficient in adding ZooKeeper, Cassandra, and Flume to existing Hadoop clusters.
- Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH3, CDH4, and CDH5 clusters.
- Familiarity with real-time data streaming using Spark and Kafka.
- Experience in ETL analytics on ingested data using scripts built with Hive, Pig, Spark, and MapReduce, covering interactive, batch, and real-time processing.
- Experience using integrated development environments such as Eclipse, NetBeans, JDeveloper, and MyEclipse.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Good experience writing complex SQL queries against databases such as DB2, Oracle 10g, MySQL, and MS SQL Server 2005/2008.
- Extensive experience developing test cases, performing unit and integration testing, and working with source control tools such as Git, SVN, and Perforce.
- Strong team player with the ability to work independently as well as in a team, the ability to adapt to a rapidly changing environment, and a commitment to learning.
- Ability to blend technical expertise with strong conceptual, business, and analytical skills to deliver quality solutions, with results-oriented problem solving and leadership skills.
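The Sqoop-style transfer mentioned above (relational database to Hadoop and back) can be sketched in miniature with pure Python: rows are read from a relational source (sqlite3 here as a stand-in for Oracle/MySQL) and written out as delimited text, the same shape of export that `sqoop import` automates. The table and column names are invented for the example.

```python
import csv
import io
import sqlite3

def export_table(conn, table, columns):
    """Dump a relational table to comma-delimited text,
    mirroring the row-by-row transfer Sqoop performs into HDFS files."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    cols = ", ".join(columns)
    for row in conn.execute(f"SELECT {cols} FROM {table} ORDER BY {columns[0]}"):
        writer.writerow(row)
    return buf.getvalue()

# Hypothetical source table standing in for an RDBMS source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

print(export_table(conn, "customers", ["id", "name"]))
```

In a real cluster the delimited output would land in HDFS and the parallelism would come from mappers; this sketch only shows the data-shape of the transfer.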
Big Data: Hadoop, MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Cassandra, MongoDB, Hortonworks, Kafka, Spark, ZooKeeper, BigQuery
Databases: DB2, MySQL, MS Access, MS SQL Server, Teradata, NoSQL, Vertica, Aster nCluster, SSAS, Oracle, Oracle Essbase
Operating Systems: Mac OS, Unix, Linux (various versions), Windows 2003/7/8/8.1/XP/Vista
Web/Application Servers: Apache Tomcat, WebLogic, WebSphere
Tools: Eclipse, NetBeans
Version Control: Git, SVN, Perforce
IDEs: IntelliJ, Eclipse, NetBeans, JDeveloper
Confidential, Chicago, IL
- Cloudera Hadoop installation & configuration of multiple nodes using Cloudera Manager and CDH 4.X/5.X.
- Prepared low level Design document and estimated efforts for the project.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Installed and configured Hadoop Map Reduce, HDFS and Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Installed, upgraded, and managed Hadoop clusters on Hortonworks and within AWS.
- Performed NoSQL data modeling with Cassandra, HBase, and MongoDB.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that execute internally as MapReduce jobs.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
- Developed and implemented jobs in MR2 on Hortonworks clusters and clusters within AWS.
- Developed Pig code for loading, filtering, and storing data.
- Developed Hive Scripts (HQL) for automating the joins for different sources.
- Developed various Big Data workflows using Oozie.
- Performed big data development on Google Cloud.
- Developed MapReduce programs and migrated data from existing data sources using Sqoop.
- Developed custom Python programs to load data into HBase.
- Developed MapReduce programs using MRv1 and MRv2 (YARN).
- Developed Spark SQL jobs that read data from the data lake using Hive transforms and save it to HBase.
- Strong application DBA skills, with data modeling skills for NoSQL and relational databases.
- Built a Java client responsible for receiving XML files via REST calls and publishing them to Kafka.
- Built a Kafka + Spark Streaming job responsible for reading XML messages from Kafka and transforming them into POJOs using JAXB.
- Loaded log data into HDFS using Flume; worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Responsible for migrating from Hadoop to Spark frameworks, using in-memory distributed computing for real-time fraud detection.
- Effectively used Oozie to develop automatic workflows of Sqoop, MapReduce and Hive jobs.
- Involved in running Hadoop jobs that process millions of text records for batch and online processes, using tuned/modified SQL.
- Responsible for designing a highly scalable big data cluster to support data storage and computation across varied big data clusters: Hadoop, Cassandra, MongoDB, and Elasticsearch.
- Designed and published workbooks and dashboards using Tableau Dashboard/Server 6.X/7.X
Environment: Hadoop (HDFS), HBase, MapReduce, Hive, Spark, Kafka, Oozie, Flume, Cassandra, Hortonworks, UNIX shell scripting, MongoDB, MySQL, Eclipse, Toad, and HP Vertica 6.X/7.X.
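The Kafka + Spark Streaming job described above unmarshals XML messages into POJOs with JAXB. A minimal Python stand-in for that transform step looks like the following; the message schema (`eventId`, `amount`) and the `Event` record are invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Event:
    """Plain record type, playing the role a JAXB-bound POJO plays in Java."""
    event_id: str
    amount: float

def unmarshal(message: str) -> Event:
    """Parse one XML message (as consumed from a Kafka topic) into a record."""
    root = ET.fromstring(message)
    return Event(event_id=root.findtext("eventId"),
                 amount=float(root.findtext("amount")))

msg = "<event><eventId>e-42</eventId><amount>19.99</amount></event>"
print(unmarshal(msg))  # Event(event_id='e-42', amount=19.99)
```

In the streaming job proper, this function would run inside the per-message transformation of each micro-batch; the parsing logic itself is the same.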
Confidential, Dallas, TX
- Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
- Worked on debugging, performance tuning, and analyzing data using the Hadoop components Hive and Pig.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed and implemented jobs in an MR2 Hortonworks cluster.
- Developed and executed Hive, Spark, and Pig queries for de-normalizing data.
- Created Hive tables from JSON data using the Avro data serialization framework.
- Implemented generic export framework for moving data from HDFS to RDBMS and vice-versa.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Wrote shell scripts to automate rolling day-to-day processes.
- Worked on loading data from LINUX file system to HDFS.
- Created HBase tables to store various formats of PII data coming from different portfolios; implemented MapReduce jobs for loading data from an Oracle database into the NoSQL database.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Moved data from Hadoop to Cassandra using the BulkOutputFormat class.
- Automated all the jobs, for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Responsible for processing unstructured data using Pig and Hive.
- Added nodes to the clusters and decommissioned nodes for maintenance.
- Created Pig script jobs while maintaining query optimization.
- Worked on various Business Objects reporting functionalities such as slice and dice, master/detail, the User Response function, and different formulas.
- Strong experience in Apache server configuration.
Environment: Hadoop, HDFS, HBase, Pig, Hive, Spark, Hortonworks, Oozie, MapReduce, Sqoop, Cloudera, MongoDB, Cassandra, Kafka, Linux, Java APIs, Java Collections, Windows.
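Several bullets above describe MapReduce jobs that aggregate ingested records (loading Oracle data into a NoSQL store, powering search and aggregation). The map → shuffle → reduce shape of those jobs can be sketched in pure Python; the grouping key (`portfolio`) and the records are illustrative, not taken from the actual jobs.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (key, 1) pair for each record's grouping field."""
    for rec in records:
        yield rec["portfolio"], 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}

records = [{"portfolio": "retail"}, {"portfolio": "loans"}, {"portfolio": "retail"}]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'retail': 2, 'loans': 1}
```

On a cluster, the map and reduce phases run in parallel across nodes and the shuffle moves data between them; this single-process sketch only demonstrates the contract each phase satisfies.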
Confidential, Seattle, WA
- Supported MapReduce programs running on the cluster.
- Involved in using Pig Latin to analyze large-scale data.
- Involved in loading data from UNIX file system to HDFS.
- Interacted with business users on regular basis to consolidate and analyze the requirements and presented them with design results.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Involved in data visualization; provided the files required by the team by analyzing data in Hive, and developed Pig scripts for advanced analytics on the data.
- Created many user-defined routines, functions, before/after subroutines which facilitated in implementing some of the complex logical solutions.
- Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
- Worked on improving the performance by using various performance tuning strategies.
- Managed the evaluation of ETL and OLAP tools and recommended the most suitable solutions depending on business needs.
- Migrated jobs from development to test and production environments.
- Created external tables with proper partitions for efficiency and loaded into HDFS the structured data resulting from MR jobs.
- Involved in moving all log files generated from various sources to HDFS for further processing.
- Used Shell Scripts for loading, unloading, validating and records auditing purposes.
- Used Teradata Aster bulk load feature to bulk load flat files to Aster.
- Used Aster UDFs to unload staging-table and client data for SCD, which resided in the Aster database.
- Extensively used SQL and PL/SQL for development of Procedures, Functions, Packages and Triggers.
Environment: Java, SQL, PL/SQL, Unix shell scripting, XML, Teradata Aster, Hive, Pig, Hadoop, MapReduce, ClearCase, HP-UX, Windows XP Professional.
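The Pig Latin scripts described above extract and aggregate web server output files. The same LOAD → FILTER → GROUP → COUNT pipeline can be sketched in Python; the log format and field positions below are assumptions for the example, not the actual production layout.

```python
from collections import Counter

def page_hits(log_lines):
    """Mirror a Pig script over web-server logs:
    keep successful requests and count hits per path."""
    hits = Counter()
    for line in log_lines:
        # Assumed space-delimited format: <ip> <path> <status>
        ip, path, status = line.split()
        if status == "200":      # FILTER logs BY status == '200'
            hits[path] += 1      # GROUP BY path; COUNT(*)
    return dict(hits)

logs = [
    "10.0.0.1 /index 200",
    "10.0.0.2 /index 200",
    "10.0.0.3 /admin 404",
]
print(page_hits(logs))  # {'/index': 2}
```

In Pig the same steps would be declared relationally and compiled to MapReduce jobs over HDFS; the sketch shows only the per-record logic.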
- Involved in requirements analysis, design, development, and testing.
- Involved in developing the Group Portal and Member Portal applications.
- Developed front end using Struts and JSP.
- Developed webpages using HTML, JavaScript, jQuery, and CSS.
- Developed customized reports and performed unit testing using JUnit.
- Used Java 1.6, Spring, Hibernate, and Oracle to build the product suite.
- Responsible for building projects in deployable files (WAR files and JAR files).
- Coded Java Servlets to control and maintain the session state and handle user requests.
- Involved in the development and testing phases of the project, following Agile methodology.
- Implemented the logging mechanism using log4j framework.
- Developed Web Services.
- Verified software errors and interacted with developers to resolve the technical issues.
- Used Maven to build the J2EE application.
- Wrote complex SQL queries and stored procedures.
- Involved in maintenance of different applications.
- Involved in the designing of the project using UML.
- Followed J2EE Specifications in the project.
- Designed the user interface pages in JSP.
- Used XML and XSL for mapping fields in the database.
- Created the stored procedures and triggers required for the project.
- Created functions and views in Oracle.
- Enhanced the performance of the whole application using stored procedures and prepared statements.
- Responsible for updating database tables and designing SQL queries using PL/SQL.
- Created bean classes for communicating with the database.
- Involved in documentation of the module and project.
- Prepared test cases and test scenarios as per business requirements.
- Involved in bug fixing.
- Prepared coded applications for unit testing using JUnit.
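One bullet above credits stored procedures and prepared statements for the application's performance gains. A minimal illustration of the prepared-statement pattern, using Python's sqlite3 as a stand-in for the Oracle/JDBC stack (table and column names invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")

# Parameterized statement: the SQL text stays constant across executions,
# and values are bound as parameters rather than concatenated into the
# string -- the same benefit a Java PreparedStatement gives over Statement.
stmt = "INSERT INTO members (id, name) VALUES (?, ?)"
conn.executemany(stmt, [(1, "group-portal"), (2, "member-portal")])

row = conn.execute("SELECT name FROM members WHERE id = ?", (2,)).fetchone()
print(row[0])  # member-portal
```

Beyond reuse of the parsed statement, parameter binding also closes off SQL-injection via user input, which matters for the servlet request handling described above.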