- Around 8 years of professional IT experience, including developing, configuring, and implementing Hadoop and Big Data ecosystems on various platforms.
- Experience with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Spark, and Flume for Big Data and Big Data analytics.
- Experience in analyzing data using HiveQL (HQL), Pig Latin.
- Experienced in managing and reviewing Hadoop log files.
- Worked with Sqoop to import/export data between relational databases and Hadoop, and used Flume to collect data and populate HDFS.
- Experience in working with versions of Hadoop 1.0 and Hadoop 2.0 (YARN).
- In-depth understanding of Hadoop architecture and its components, including HDFS, NameNode, DataNode, JobTracker, TaskTracker, and MapReduce concepts.
- Worked on data ingestion from SQL Server into the data lake using Sqoop and shell scripts.
- Transferred data between HDFS and RDBMS using Sqoop, in both directions.
- Excellent knowledge on Spark Core architecture.
- Hands-on expertise in writing RDD transformations and actions using Scala.
- Created DataFrames and performed analysis using Spark SQL.
- Good knowledge of creating Autosys and Beatle jobs.
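The Spark experience above can be sketched in a minimal Scala example; the job name, HDFS path, and column layout are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object SalesAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesAnalysis")
      .getOrCreate()
    import spark.implicits._

    // RDD transformations and an action
    val lines = spark.sparkContext.textFile("hdfs:///data/sales.csv") // hypothetical path
    val amounts = lines.map(_.split(","))
                       .filter(_.length == 3)          // transformation: keep well-formed rows
                       .map(cols => cols(2).toDouble)  // transformation: extract the amount column
    val total = amounts.sum()                          // action: triggers the computation

    // DataFrame creation and Spark SQL analysis
    val df = amounts.toDF("amount")
    df.createOrReplaceTempView("sales")
    spark.sql("SELECT COUNT(*) AS n, AVG(amount) AS avg_amount FROM sales").show()

    spark.stop()
  }
}
```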
Big Data Tools: HDFS, MapReduce, YARN, Hive (HQL), Pig, Sqoop, Flume, Oozie, Kafka, Spark, Hortonworks.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH).
Programming Languages: SQL, Scala.
Database: MySQL, NoSQL, HBase, Oracle.
Operating Systems: UNIX, Linux, Windows Variants.
Tools: Eclipse, IntelliJ, SBT, SQL Server Management Studio, GitHub.
Confidential, Charlotte, NC
- Involved in migration from Spark 1.x to Spark 2.x.
- Implemented Spark DataFrames using the Java API from different source systems.
- Wrote AutoSys jobs, including box job and JIL file creation.
- Used the HDFS FileSystem API to move files between the local file system and HDFS.
- Performed transformations and partitioning on incoming data, created external Hive tables to store the processed results, and stored the data in Parquet format.
- Performed data validation and schema validation, and validated data between staging and final tables in HULC.
- Managed and monitored Hadoop, Spark, and Hive development issues and bugs.
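A minimal Scala sketch of the pattern described above, partitioning incoming data, writing Parquet, and exposing it as an external Hive table; the paths, table, and columns are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object IngestToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("IngestToHive")
      .enableHiveSupport()
      .getOrCreate()

    // Read incoming data, apply a transformation, and write partitioned Parquet
    val incoming = spark.read.option("header", "true").csv("hdfs:///landing/events") // hypothetical path
    incoming.filter("event_type IS NOT NULL")
            .write
            .partitionBy("event_date")
            .parquet("hdfs:///warehouse/events_parquet")

    // Expose the processed results as an external Hive table over the Parquet files
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS events (
        event_id STRING,
        event_type STRING
      )
      PARTITIONED BY (event_date STRING)
      STORED AS PARQUET
      LOCATION 'hdfs:///warehouse/events_parquet'
    """)
    spark.sql("MSCK REPAIR TABLE events") // register the partitions written above

    spark.stop()
  }
}
```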
Confidential, Charlotte, NC
- Created schema documents for Hive tables, and created Hive tables in the data lake bronze layer, both with and without partitions.
- Created HQL files to load data from staging to permanent tables.
- Loaded Hive tables from source files through the file ingestion process.
- Loaded Hive tables from external database server tables through the Sqoop ingestion process.
- Created config files for the data loading process.
- Created and scheduled Autosys jobs for the load process.
- Performed unit testing of the loading process.
- Attended daily standup calls and provided updates on current task status.
- Attended sprint planning meetings and provided estimates for upcoming tasks.
- Supported the application in the production environment and provided fixes for issues.
- Developed enhancements to the existing application.
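The staging-to-permanent load described above can be sketched as HQL run through a Scala Spark session; the schema and table names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object StagingToPermanent {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StagingToPermanent")
      .enableHiveSupport()
      .getOrCreate()

    // Allow partitions to be derived from the data itself
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Load from the staging table into the partitioned permanent (bronze) table
    spark.sql("""
      INSERT OVERWRITE TABLE bronze.customers PARTITION (load_date)
      SELECT customer_id, name, email, load_date
      FROM staging.customers
    """)

    spark.stop()
  }
}
```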
Confidential, Boston, MA
- Helped business processes by developing, installing and configuring Hadoop ecosystem components that moved data from individual servers to HDFS.
- Assessed existing and available data warehousing technologies and methods to ensure the data warehouse/BI architecture met the needs of the business unit and the enterprise and allowed for business growth.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes, and loaded data into HDFS.
- Created Hive queries (HQL) that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Worked with Spark and Scala.
- Responsible for end-to-end design and development with Spark SQL to meet requirements.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text and CSV files.
- Queried data using Spark SQL on top of the Spark engine, implementing Spark RDDs in Scala.
- Performed DataFrame and Dataset operations on RDDs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Developed Hive queries for the analysts.
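The MapReduce-to-Spark conversion mentioned above can be illustrated with the classic word count, where the map phase becomes flatMap/map and the shuffle-and-reduce phase becomes reduceByKey; the input and output paths are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    val sc = spark.sparkContext

    // MapReduce word count expressed as Spark RDD transformations
    val counts = sc.textFile("hdfs:///input/logs")  // hypothetical path
      .flatMap(_.split("\\s+"))                     // map phase: emit words
      .map(word => (word, 1))                       // map phase: key-value pairs
      .reduceByKey(_ + _)                           // reduce phase: sum counts per word

    counts.saveAsTextFile("hdfs:///output/wordcounts")
    spark.stop()
  }
}
```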
Tools Used: Hadoop, HDFS, Hive (HQL), Sqoop, Oozie, Spark, Spark SQL, Cloudera, PL/SQL, SQL*Plus, Windows NT, UNIX Shell Scripting.
- Responsible for building scalable distributed data solutions using Hadoop.
- Checked Hadoop daemon services and responded to any warning or failure conditions.
- Deployed Hadoop clusters in different modes: standalone, pseudo-distributed, and fully distributed.
- Wrote shell scripts to monitor the health of Hadoop daemon services and to respond to any warning or failure conditions.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Replaced Hive's default Derby metastore with MySQL.
- Executed queries using Hive and developed Map-Reduce jobs to analyze data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Implemented best practices to create Hive tables with appropriate partitioning methods and to keep data processing consistent with enterprise standards; developed scripts and batch jobs to schedule various Hadoop programs.
- Developed Hive queries for the analysts and for data analysis to meet business requirements.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Developed MapReduce, Hive, and Pig scripts to process data.
- Imported data from RDBMS to HDFS using Sqoop.
- Integrated Hive and HBase using the HBase storage handler.
- Developed a custom Flume event sink responsible for collecting data in real time and storing it in a cache for analysis.
- Analyzed data using Hadoop ecosystem tools such as Hive and Flume.
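A custom Flume sink like the one described above is usually written in Java against Flume's Sink API; this is a simplified Scala sketch of the same idea (the class name, property key, and caching behavior are hypothetical):

```scala
import org.apache.flume.{Channel, Context, Event}
import org.apache.flume.Sink.Status
import org.apache.flume.conf.Configurable
import org.apache.flume.sink.AbstractSink
import scala.collection.mutable

// A simplified custom sink that drains events from the channel and keeps
// the most recent payloads in an in-memory cache for downstream analysis.
class CachingEventSink extends AbstractSink with Configurable {
  private val cache = mutable.Queue[String]()
  private var cacheSize = 1000

  override def configure(context: Context): Unit = {
    cacheSize = context.getInteger("cacheSize", 1000) // hypothetical property
  }

  override def process(): Status = {
    val channel: Channel = getChannel
    val txn = channel.getTransaction
    txn.begin()
    try {
      val event: Event = channel.take()
      if (event != null) {
        cache.enqueue(new String(event.getBody, "UTF-8"))
        while (cache.size > cacheSize) cache.dequeue() // evict oldest entries
        txn.commit()
        Status.READY
      } else {
        txn.commit()
        Status.BACKOFF // channel empty; back off before polling again
      }
    } catch {
      case e: Exception =>
        txn.rollback()
        throw e
    } finally {
      txn.close()
    }
  }
}
```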
Tools Used: Hadoop, Cloudera, Hive, HBase, SQL, HQL, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, MapR, Java, MySQL, Hortonworks.
- Developed personal projects with guidance from database veterans as mentors.
- Created databases.
- Good knowledge of and experience with relational database management systems, including normalization, stored procedures, constraints, querying, joins, keys, indexes, complex views, dynamic SQL, triggers, and cursors.
- Expertise with DDL and DML statements, RDBMS concepts, data dictionaries, and normal forms.
- Wrote stored procedures using temporary tables, views, indexes, and triggers where required, as well as complex queries including correlated subqueries and queries with complex joins and aggregate functions.
- Experience using the TRY...CATCH block introduced in SQL Server 2005.
- Experience writing complex T-SQL queries using inner joins, outer joins, and cross joins.
- SQL Server administration skills, including backups, disaster recovery, database maintenance, user authorization, and database creation.
- Experienced with cross-browser compatibility, having worked with browsers such as Google Chrome, Mozilla Firefox, Internet Explorer, and Safari.
Tools Used: SQL Server 2000/2005, SSIS, Microsoft Office, Windows 2003/XP.