- 9+ years of experience in analysis, design, development, debugging, and deployment of software applications, including 5+ years in the Hadoop ecosystem and Big Data analytics.
- IBM certification in Big Data & Hadoop Development.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Hands-on experience installing, configuring, and using ecosystem components such as Hadoop, MapReduce, HDFS, HBase, Oozie, Sqoop, Flume, Pig, and Hive.
- Experience in analyzing data using Pig Latin, HQL, HBase and custom MapReduce programs in Java.
- Experience extending Hive and Pig core functionality by writing custom UDFs.
- Hands-on experience configuring a Hadoop cluster in a professional environment and on Amazon Web Services (AWS) using EC2 instances and S3.
- Skilled in developing MapReduce programs using the Hadoop Java API and in using Hive and Pig for data analysis, data cleansing, and data transformation.
- Experience loading log data into HDFS using Flume and Kafka and performing ETL integrations.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience configuring Kerberos and integrating it with directory services.
- Experience working with different data sources such as flat files, XML files, log files, and databases.
- Excellent understanding of object-oriented design methodology and core Java concepts such as multithreading, exception handling, generics, annotations, collections, and I/O.
- Good understanding of NoSQL databases and hands-on work experience in writing application on
- Experience writing shell scripts.
- Worked with Python libraries such as pandas, NumPy, boto3, web, etc.
- Experience collecting business requirements, writing functional requirements and test cases, and creating technical design documents with UML (use case, class, sequence, and collaboration diagrams).
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Zookeeper, Sqoop, Oozie, Flume, HBase, Avro, Spark
Databases: Oracle 7/8/8i/9i/10g/11g/12c, MySQL 5.6.16/5.6.20, SQL Server 2000/2005/2008/2012/2014, PostgreSQL
NoSQL Databases: HBase, MongoDB, Cassandra
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)
Programming Languages: Python, Java, SQL, PL/SQL, Scala, Unix shell scripts
Tools Used: Eclipse, PuTTY, SuperPuTTY, MS Office, Crystal Reports, Microsoft Visio
Confidential, Durham, NC
- Working on Cloudera Hadoop clusters.
- Set up keytabs for Kerberos authentication of scheduled applications on Hadoop clusters.
- Working on a POC to migrate existing applications running on the Cloudera Hadoop cluster to EMR.
- Extracted data from SQL Developer into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Designed and developed a data pipeline to ingest data from different sources into HDFS.
- Working extensively with Impala to prepare input data for the application and to ingest application output into Impala for analytics.
- Created and dropped temporary Impala tables on the fly to load data into Parquet tables.
- Used Python data scripts to create CSV files and append them to Impala tables.
- Developed RDDs using Python and coded Python applications to meet business requirements.
- Submitted Spark jobs using the spark-submit command.
- Created Spark RDDs and DataFrames, applied transformations and actions, and converted RDDs to DataFrames.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, and Auto Scaling groups.
- Installed and configured a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
- Developed and enhanced Python code to generate genotyping reports from different data inputs.
- Migrating data from a traditional database to the Hadoop cluster using a custom Python script.
- Developed Unix shell scripts and Python scripts for scheduling and automating the job flow.
- Providing support for production applications.
- Participated in all stages of the software development lifecycle, including design, development, implementation, and testing.
Environment: CDH 5.9.1, Hadoop 2.6.0, YARN, HDFS, Spark 1.6.0, Sqoop 1.99.5, Hive 1.1.1, Pig 0.14.0, HBase 1.1.2, Zookeeper 3.4.6, Oracle 11g/10g, Java 1.8, Python 2.7.5, PyCharm, PuTTY
Confidential, Wilmington, Delaware
- Worked on clusters of different sizes on Cloudera and Hortonworks distributions.
- Created Hive and Tez views and granted permissions to users and AD groups on the Ambari server.
- Created Hive internal/external tables with proper static and dynamic partitions and worked with them using HQL.
- Wrote Hive queries for data analysis to meet business requirements.
- Tuned performance of Hive tables using partitioning and bucketing.
- Created UDFs for Hive and Impala.
- Created RDDs and DataFrames for the required input data and performed data transformations using Spark and Scala.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Developed Spark scripts using the Scala shell as per requirements.
- Used a Spark cluster to manipulate RDDs (Resilient Distributed Datasets) and applied RDD partitioning concepts.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Assisted in upgrading, configuring, and maintaining various Hadoop ecosystem components such as Pig, Hive, and HBase.
- Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
- Involved in HDFS maintenance and loading data using Sqoop; responsible for managing data coming from different sources.
- Experience working with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS.
- Experience managing and reviewing the Hadoop log files.
- Wrote Pig Latin scripts that define schemas, create new relations, and perform joins, grouping, sorting, and filtering on large data sets.
- Monitored workload and job performance and performed capacity planning using Cloudera Manager.
- Used Flume to move large amounts of log data from many different sources to a centralized data store.
- Developed Sqoop scripts to transfer data between HDFS and the MySQL database.
- Provided Hue access as per user requests.
- Responsible for maintaining Voltage security at the cluster level and the Voltage server level, based on an understanding of the encryption and decryption process.
- Created backup scripts for Voltage servers using Bash and set up crontab jobs.
- Provided L2 support for different applications; managed and reviewed log files for troubleshooting and met SLAs on time.
- Sent and received handovers to and from the offshore team following the Global Delivery Model.
Environment: CDH 5.5, Hadoop 2.7.0, YARN, HDFS, Spark 1.4.1, Sqoop 1.99.5, Hive 1.1.1, Flume 1.6.0, Oozie 4.1.0, HDP 2.0, Pig 0.14.0, Kafka 0.9.0, HBase 1.1.2, Zookeeper 3.4.6, Jenkins 1.6, Oracle 11g/10g, MySQL 5.6.2, Java 1.6, SuperPuTTY, Scala IDE, Voltage 6.3
Confidential, Atlanta, GA
- Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
- Used Spark SQL to analyze the data.
- Used Scala to write code for all Spark use cases.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs, and YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Worked in Spark SQL with different data formats such as JSON and Parquet.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
- Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Developed MapReduce (YARN) programs to cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
- Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV, and JSON.
- Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
Environment: CDH 5.5, Hadoop 2.6.0, YARN, HDFS, Spark 1.4.1, Sqoop 1.99.5, Hive 1.1.1, Flume 1.6.0, Oozie 4.1.0, Pig 0.14.0, Kafka 0.9.0, HBase 1.1.2, Zookeeper 3.4.6, Oracle 11g/10g, MySQL 5.6.2, Java 1.6
Confidential, Atlanta, GA
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig, and Hive programs.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
- Imported and exported data between HDFS and a MySQL database using Sqoop.
- Extensive experience writing HDFS and Pig Latin commands.
- Developed UDFs to provide custom Hive and Pig capabilities and apply business logic to the data.
- Created Hive internal/external tables with proper static and dynamic partitions.
- Used Hive to analyze unified historical data in HDFS to identify issues and behavioral patterns.
- Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
- Experience with NoSQL databases such as HBase.
- Monitored workload and job performance and performed capacity planning using Cloudera Manager.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, Zookeeper, and Pig jobs that run independently based on time and data availability.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Performed file system management and monitoring of Hadoop log files.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Configured core-site.xml and mapred-site.xml for the multi-node cluster environment.
- Created an e-mail notification service that alerts the team that requested the data upon job completion.
Environment: Hadoop 2.3.0, HDFS, MapReduce, CDH5, Hive 0.12.0, Pig 0.12.0, HBase 0.98.1, Sqoop 1.4.3, Flume 1.4.0, Oozie 4.1.0, Zookeeper 3.4.5, MySQL 5.6.16, Java 1.6
Confidential, Norcross, GA
- Actively involved in all phases of the Software Development Life Cycle (SDLC) of the application: requirements gathering, design analysis, and code development.
- Extensively used core Java concepts such as OOP and exception handling.
- Developed Records using Java, HTML, CSS, JSP, Servlets, and MySQL.
- Used the Eclipse IDE to write code and integrate the application.
- Developed Information System using J2EE and MySQL.
- Responsible for developing the Struts configuration file and Action classes for handling HTTP requests from front-end components, applying OOAD concepts.
- Used Hibernate as the object-relational mapping tool for persisting Java objects.
- Developed the front-end for faculty home pages using Dreamweaver.
- Worked on documenting the project and analyzing the requirements of the project.
- Tested the application for various inputs.
- Involved in code reviews and cross-checking that coding standards were being followed.