- Accomplished IT professional with 6+ years of experience, currently working as a Hadoop Developer.
- Extensive experience writing Hadoop jobs for data analysis per business requirements using Spark SQL, Hive, and Pig.
- Hands-on development and implementation experience with a Big Data Management Platform (BMP) using HDFS, Hive, Oozie, and other Apache Hadoop ecosystem components as data storage and retrieval systems.
- Used Oozie to manage and schedule jobs on the Hadoop cluster.
- Developed multiple jobs using the Oozie Workflow Engine to run Hadoop MapReduce jobs.
- Created and modified shell scripts to schedule data cleansing scripts and ETL load processes.
- Analyzed data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
- Experience with the Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce, Hive, and Spark jobs.
- Experience in installation, configuration, support, and monitoring of Hadoop clusters using Apache and Cloudera distributions and AWS.
- Experience in developing ETL (Extraction, Transformation and Loading) procedures and Data Conversion Scripts using Pre-Stage, Stage, Pre-Target and Target tables.
- Experienced in handling different file formats, including text files, Avro data files, SequenceFiles, XML, and JSON.
- Excellent understanding of Hadoop architecture and components such as HDFS and Oozie.
- Monitored full, incremental, and daily loads and supported all scheduled ETL jobs for batch processing.
- Expert in developing stored procedures, views, DDL/DML triggers, and indexed views to facilitate efficient data manipulation and data consistency.
- Experience handling structured and unstructured data and applying ETL processes; strong experience in data warehousing and ETL using Spark and SSIS.
- A diligent professional with outstanding analytical, communication and negotiation skills.
Confidential, Bethesda, MD
- Collected large volumes of data from multiple sources and stored them in HDFS.
- Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables; handled structured data using Spark SQL.
- Developed Spark programs in Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Prepared Oozie workflows with Sqoop actions to migrate data from relational databases (RDBMS) to HDFS.
- Developed Spark programs for the application to process data faster than standard MapReduce programs.
- Created Hive tables with dynamic partitions and buckets for sampling, and queried them using HiveQL.
- Used Sqoop to load data into HBase and Hive.
- Wrote Hive queries to analyze the data and generate the final reports used by business users.
- Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop, Apache Spark, and Apache Storm; ingested streaming data into Hadoop using Spark, the Storm framework, and Scala.
- Good experience with NoSQL databases such as MongoDB.
- Handled large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, and transformations.
- Developed Spark code using Spark SQL and Spark Streaming for faster testing and processing of data.
- Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for large volumes of data.
- Developed a data pipeline using Kafka, HBase, and Hive to ingest, transform, and analyze customer behavioral data.
Environment: Hadoop, HDFS, CDH, Pig, Hive, Oozie, ZooKeeper, HBase, Spark, Spark SQL, NoSQL, Scala, Kafka, Tableau, Ab Initio, MongoDB
- Built scalable distributed data solutions using the Hadoop ecosystem.
- Converted existing SQL queries and stored procedures to Hive.
- Installed and configured Hive on the Hadoop cluster.
- Used Hadoop clusters to produce daily and monthly reports per client needs.
- Scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
- Used Hive queries for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, and CSV.
- Imported data from MySQL to HDFS using Sqoop on a regular basis.
- Used Hive to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Created internal and external Hive tables per requirements, defined with appropriate static and dynamic partitions for efficiency.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig Latin scripts to study customer behavior.
- Used Ambari for UI-based Oozie scheduling and for creating tables in Hive.
- Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
Environment: HDFS, Sqoop, Hive, SQL, Flume, Oozie, JSON, Avro, Ab Initio, ZooKeeper, Cloudera.
Big Data Analyst (Research Assistant)
- Actively participated in gathering user requirements and system specifications.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Installed and configured Hive and wrote Hive UDFs.
- Loaded data from the UNIX file system to HDFS.
- Performed data analysis and data profiling per the requirements.
- Created Oozie workflows to run multiple Spark jobs.
- Diagnosed and addressed operational data issues.
- Migrated various Access databases to the Hive server.
- Designed and implemented a backup plan.
SQL BI/ETL Developer
- Wrote complex stored procedures, functions, and triggers to implement business logic in the OLTP database.
- Extracted data from sources such as Excel and flat files into SQL Server 2008 using SSIS.
- Created database objects such as views and stored procedures.
- Database development experience with Microsoft SQL Server in OLTP/OLAP environments using integration services (SSIS) for ETL (Extraction, Transformation and Loading).
- Developed ETL packages with different data sources (SQL Server, Flat Files, Excel source files, XML files etc.) and loaded the data into target tables by performing various kinds of transformations using SQL Server Integration Services (SSIS).
- Created partitions on tables and implemented indexes and indexed views to improve query performance.
- Created multiple database objects such as stored procedures, UDFs, and temporary tables per business logic.
- Monitored and resolved deadlocks in SQL Server databases with locks, isolation levels and SQL Profiler.
- Developed SSIS packages for migrating data between OLTP servers and the data mart.
- Scheduled SSIS package executions using SQL Server Agent jobs.
- Developed reports using complex T-SQL queries, user defined functions, stored procedures and views.
- Created dashboard subreports, linked reports, charts, snapshot reports, drill-through reports, report models, and drill-down reports using cubes.
- Migrated data using DTS across sources such as Oracle, MS Access, and flat files.
- Used SQL Server Reporting Services (SSRS) to create tabular, matrix, drill-down, bar chart, and pie chart reports for analyzing critical data.
- Provided 24x7 support for all production, QA, and development MS SQL Server environments.
Environment: SQL Server 2008 R2, Erwin, T-SQL, SQL Profiler, MS Visual Studio, MS Excel, TFS, SSMS, BIDS, SSIS, SSAS, SSRS