- Around 8 years of experience in the IT industry across the complete software development life cycle (SDLC), including business requirements gathering, system analysis and design, data modeling, development, testing and implementation of projects.
- Around 5 years of experience in the development, implementation and configuration of Hadoop ecosystem tools such as HDFS, MapReduce, Hive, Oozie, Sqoop, NiFi, Kafka, ZooKeeper, Elasticsearch, Knox, Ranger, Cassandra, HBase, MongoDB, Spark Core, Spark Streaming, Spark DataFrames and Spark MLlib.
- Experienced in configuring, deploying and managing different Hadoop distributions such as Cloudera (CDH4 & CDH5) and Hortonworks (HDP).
- Experience importing and exporting data with Sqoop between the Hadoop Distributed File System and relational database systems.
- Experience handling various file formats such as Avro, SequenceFile, text, XML, JSON and Parquet with compression codecs such as gzip, LZO and Snappy.
- Experienced with the Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
- Imported data from HDFS into Spark DataFrames for in-memory computation to generate optimized output and better visualizations.
- Expertise in writing Spark RDD transformations and actions, DataFrames and case classes for the required input data; performed data transformations using Spark Core and converted RDDs to DataFrames.
- Experienced in collecting real-time streaming data and building pipelines for raw data from different sources using Kafka, storing the data in HDFS and NoSQL stores using Spark.
- Extended Hive core functionality with custom User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs).
- Implemented a POC using Impala for data processing on top of Hive to make better use of its C++ execution engine.
- Experience with the NoSQL databases HBase and Cassandra and their integration with Hadoop clusters.
- Implemented a cluster for the NoSQL tool HBase as part of a POC to address HBase limitations.
- Explored Spark beta-version APIs to improve the performance and optimization of existing algorithms in different deployment modes such as YARN, Mesos and standalone for a POC.
- Expertise in the ETL tool Informatica PowerCenter (Designer, Workflow Manager, Repository Manager), data quality and ETL concepts.
- Experienced with NiFi to automate the data movement between different Hadoop systems.
- Worked with Hadoop security components such as Knox and Ranger, integrating an LDAP store with a Kerberos KDC.
- Good understanding of security requirements for Hadoop and its integration with Kerberos authentication and authorization infrastructure.
- Experienced in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2 and Redshift, and with Microsoft Azure.
- Experienced with different relational database management systems such as Teradata, PostgreSQL, DB2, Oracle and SQL Server.
- Experienced in scheduling and monitoring the production jobs using Oozie and Azkaban.
Hadoop Ecosystems: Hadoop, HDFS, MapReduce, Hive, Spark Core, Spark SQL, Spark Streaming, Spark MLlib, Impala, Kafka, YARN, Oozie, ZooKeeper, Solr, Sqoop, NiFi, Knox, Ranger, and Kerberos.
Cloud Services: EMR, EC2, S3, CloudWatch, Redshift, BigQuery and MS Azure.
Languages: Java, Scala, Python, Pandas, R, PL/SQL, UNIX Shell Scripting.
Development Tools: IntelliJ, Postman, Scala IDE, Jupyter, Zeppelin, Conda.
Frameworks/Web Servers: Spring, JSP, Hibernate, WebLogic, WebSphere, Tomcat.
SQL/NoSQL Databases: Teradata, PostgreSQL, Oracle, HBase, MongoDB, Cassandra, MySQL and DB2.
Other tools: GitHub, Bitbucket, SVN, JIRA, Vagrant, Docker, Maven.
Confidential, Pleasanton, CA
Hadoop Big data Developer
- Expertise in designing and deploying Hadoop clusters and different Big Data analytics tools including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark and Impala on the Cloudera distribution.
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Assisted in upgrading, configuring and maintaining various Hadoop components such as Pig, Hive and HBase.
- Installed and configured Hortonworks HDP 2.x and Cloudera (CDH 5.5.1) clusters in Dev and Production environments.
- Worked on capacity planning for the production cluster.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Involved in configuring Kerberos authentication in the cluster, upgrading the Hadoop cluster and fixing cluster issues.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Took snapshot backups of HBase tables.
- Involved in cluster monitoring, backup, restore and troubleshooting activities.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Imported and exported data between relational databases such as MySQL and HDFS and HBase using Sqoop.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Day-to-day responsibilities included resolving developer issues, deploying code from one environment to another, providing access to new users, delivering quick solutions to reduce impact, and documenting them to prevent future issues.
- Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades.
- Monitored workload and job performance and carried out capacity planning.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Created collections and configurations and registered a Lily HBase Indexer configuration with the Lily HBase Indexer Service.
Environment: Hadoop, HDFS, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Kafka, HBase, Oozie, CDH5, Java, SQL scripting, Linux shell scripting, Eclipse, Cloudera, Unix/Linux, Hue (Beeswax).
Confidential, Oak Brook, IL
- Involved in converting Hive/SQL queries into Spark transformations using Spark SQL, Python and Scala.
- Visualized HDFS data for the customer using a BI tool with the help of the Hive ODBC driver.
- Familiar with NoSQL databases such as MongoDB.
- Used Impala for data analysis.
- Developed Spark scripts using Scala shell commands as required.
- Worked extensively with the Spark Core and Spark SQL modules.
- Used the Spark API over Hadoop YARN to perform analytics on data in Hive.
- Implemented optimized joins across different data sets to get the top claims by state using MapReduce.
- Created HBase tables to store data in various formats coming from different sources.
- Responsible for importing log files from various sources into HDFS using Flume.
- Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
- Completed a proof of concept using an Apache NiFi workflow in place of Oozie to automate loading data into HDFS and pre-processing it with Pig.
- Designed NiFi flows to pull data from various sources and push it into HDFS and Cassandra.
- Worked extensively with NoSQL databases such as MongoDB and Cassandra.
- Integrated bulk data into the Cassandra file system using MapReduce programs.
- Worked with NiFi to manage the flow of data from sources to HDFS.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Used Hive and Pig to generate BI reports.
- Created partitions and buckets based on state for further processing with bucket-based Hive joins.
- Moved relational database data into Hive dynamic-partition tables through staging tables using Sqoop.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Used Apache Kafka for tracking data ingestion to Hadoop cluster.
- Integrated Apache Kafka with Apache Storm and created Storm data pipelines for real-time processing.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Experienced in monitoring the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, MongoDB, MapReduce, Java, Hive, Pig, Sqoop, Flume, Spark, Oozie, Hue, Kafka, SQL, ETL, Cloudera Manager, MySQL.
Confidential, Atlanta, GA
- Transferred data from Oracle into HDFS and Hive using Sqoop.
- Developed HQL queries to implement select, insert and update operations on the database by creating HQL named queries.
- Automated regular data imports into Hive partitions with Sqoop, scheduled through Apache Oozie.
- Experienced in managing and reviewing Hadoop log files; loaded and transformed large data sets.
- Conducted data extraction, including analysis, review and modeling based on requirements, using higher-level tools such as Hive and Impala.
- Involved in creating Hive tables, loading them with data and writing Hive queries.
- Developed Sqoop jobs both to import data into HDFS from relational database management systems such as Oracle and DB2 and to export data from HDFS to Oracle.
- Developed Pig functions to preprocess the data for analysis.
- Created Oozie workflows to move data with Sqoop from source to HDFS and then to target tables.
- Created HBase tables to store all data.
- Analyzed identified defects and their root causes and recommended courses of action.
- Gathered business requirements in meetings for successful implementation and POC (Proof-of-Concept) of Hadoop Cluster.
- Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL-like access on Hadoop data.
- Worked on streaming the analyzed data to the existing relational databases using Sqoop for making it available for visualization and report generation by the BI team.
Environment: Hadoop, HDFS, Hive, MapReduce, Sqoop, Java, Pig, SQL Server, Shell Scripting.
Confidential, Houston, TX
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
- Implemented business logic in Hadoop by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Actively involved in all phases of SDLC process/methodology and ensured project delivery.
- Created reusable components to enhance process efficiency and minimize impact to application.
- Diligently tracked progress during the project, prepared daily reports, status reports and other documents needed in various phases, and communicated them to leadership.
- Recommended best practices and enhancements to existing processes, and implemented technological improvements and efficiencies.
- Analyzed current programs, including performance analysis, diagnosis and troubleshooting of problems.
- Prepared technical specification documentation.
- Extensively used Log4j to log regular debug and exception statements.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Java, HBase, Sqoop, Flume, MySQL.
Confidential, Austin, TX
- Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
- Proactively monitored systems and services and implemented Hadoop deployment, configuration management, backup and related procedures.
- Responsible for cluster maintenance, adding and removing cluster nodes (commissioning and decommissioning), cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Monitored the health of Hadoop daemon services and responded to any warning or failure conditions.
- Experienced in recovering from node failures.
- Performed Hadoop upgrade activities.
- Managed and scheduled jobs on a Hadoop cluster.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Worked with support teams to resolve performance issues.
- Involved in testing, implementation and documentation.
- Worked with data delivery teams to set up new Hadoop users.
- Created User defined types to store specialized data structures in Cloudera.
Environment: Hortonworks, MapReduce, HBase, HDFS, Hive, Pig, Java, SQL, Cloudera Manager, Sqoop, Flume, Eclipse.
Confidential
- Involved in code reviews and mentored the team in resolving issues.
- Responsible for and active in the analysis, design, implementation and deployment phases of the project's full software development life cycle (SDLC).
- Developed Struts action classes and action forms, performed action mapping using the Struts Framework, and performed data validation in form beans and action classes.
- Involved in multi-tiered J2EE design utilizing MVC architecture (Struts Framework) and Hibernate.
- Extensively used Struts Framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
- Involved in system design and development in core Java using collections and multithreading.
- Defined search criteria, retrieved customer records from the database, made the required changes and saved the updated information back to the database.
- Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
- Used DAO and JDBC for database access.
- Developed applications with ANT based build scripts.
- Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in post-production support and maintenance of the application.
Environment: Oracle 11g, Java 1.5, Struts 1.2, Servlets, HTML, XML, MS SQL Server 2005, J2EE, JUnit, Tomcat 6.
Confidential
- Collected and analyzed user requirements and functional specifications.
- Created components for isolated business logic.
- Deployed the application in a J2EE architecture.
- Implemented the Session Facade pattern using Session and Entity Beans.
- Developed message driven beans to listen to JMS.
- Developed the Web Interface using Servlets, Java Server Pages, HTML and CSS.
- Used WebLogic to deploy applications on local and development environments of the application.
- Extensively used JDBC PreparedStatement to embed SQL queries in the Java code.
- Developed DAO (Data Access Objects) using Spring Framework 3.
- Developed web applications with rich internet application features using Java applets, Silverlight and Java.
- Provided on call support based on the priority of the issues.