- 7 years of professional IT experience in software development, including 5+ years in the ingestion, storage, querying, processing, and analysis of Big Data using Hadoop technologies.
- Excellent understanding of Hadoop architecture and the various components of the Hadoop ecosystem, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and YARN.
- Hands-on experience using Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, Zookeeper, Hue, Kafka, Storm, and Impala.
- Experience in Spark; improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Datasets.
- Developed Kafka producers that compress and bundle many small files into larger Avro and SequenceFile containers before writing to HDFS, to make the best use of the Hadoop block size.
- Experience analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Expertise in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Developed simple to complex MapReduce jobs using Hive and Pig to handle files in multiple formats such as JSON, text, XML, and SequenceFile.
- Worked extensively with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Experience working with different data sources such as flat files, XML files, log files, and databases.
- Very good understanding and working knowledge of Object-Oriented Programming (OOP).
- Expertise in application development using Scala, RDBMS, and UNIX shell scripting.
- Extensive experience with SQL, MySQL queries, and database concepts.
- Experience in developing Scala applications for loading/streaming data into NoSQL databases (HBASE) and into HDFS.
- Worked on ingesting log data into Hadoop using Flume.
- Experience in managing and reviewing Hadoop log files.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Management System and vice-versa.
- Using Apache Flume, collected and stored streaming data (log data) in HDFS.
- Created producer, consumer and Zookeeper setup for oracle to Kafka replication.
- Experience optimizing queries by creating clustered and non-clustered indexes and indexed views, and applying data-modeling concepts.
- Experience with scripting languages (Scala, Pig and Shell) to manipulate data.
- Worked with relational database systems (RDBMS) such as MySQL and NoSQL database systems such as HBase, with basic knowledge of MongoDB and Cassandra.
- Good knowledge of Amazon AWS concepts such as EMR and EC2 web services, which provide fast and efficient processing of Big Data.
- Exposure to Java development projects.
- Knowledge in designing and developing Mobile Applications using Java Technologies like JDBC and IDE tools like Eclipse, Android Studio.
- Exhibited strong written and oral communication skills, the ability to adapt to evolving technology, and a strong sense of responsibility.
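The MapReduce-tuning bullets above (combiners, partitioning, distributed cache) can be illustrated with a minimal sketch in plain Python; this is a toy stand-in for the real Hadoop classes, not production code:

```python
from collections import defaultdict

def map_phase(lines):
    """Word-count mapper: emit a (word, 1) pair for every token."""
    for line in lines:
        for word in line.split():
            yield word, 1

def combine(pairs):
    """Combiner: pre-aggregate counts locally so less data crosses the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())

def partition(key, num_reducers):
    """Hash-style partitioner: deterministically route each key to one reducer.
    (A byte sum stands in for Java's hashCode; any stable hash works.)"""
    return sum(key.encode()) % num_reducers

counts = dict(combine(map_phase(["big data big", "data"])))
# counts == {"big": 2, "data": 2}
```

In real jobs the combiner runs on each mapper's output before the shuffle, which is exactly why it cuts network traffic for associative aggregations like counts and sums.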
Programming Languages: C, C++, Java, Scala, UNIX Shell Scripting
Hadoop/ Big Data Stack: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Unix, Scala, Kafka, Oozie, Zookeeper, Spark, Spark-SQL, Spark Streaming, Sqoop, Flume, HBase.
Hadoop Distributions: Cloudera, Hortonworks, MapR
AWS Tools: AWS EC2, AWS S3, EMR, CloudWatch
Databases: MySQL, SQL Server
NoSQL Databases: HBase
Query Languages: HiveQL, SQL, Spark SQL
IDEs: Eclipse, NetBeans, IntelliJ, Android Studio
Operating Systems: Windows, Linux, Unix, CentOS
Web Technologies: HTML, CSS, XML, JDBC, Web Services
Java Technologies: Core Java, JDBC, JSP
Senior Hadoop Developer
Confidential, Phoenix, AZ
- Worked on writing transformer/mapping Map-Reduce pipelines using Java.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Designed and implemented Incremental Imports into Hive tables.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Worked on different file formats like Sequence files, XML files and Map files using MapReduce Programs.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Developed use cases for processing real-time streaming data using tools like Spark Streaming.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective and efficient joins and transformations.
- Imported required tables from RDBMS to HDFS using Sqoop and used Spark and Kafka to get real time streaming of data into HBase.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Responsible for batch processing of data sources using Apache Spark.
- Developed predictive analytics using Apache Spark Scala APIs.
- Developed MapReduce jobs in Java API to parse the raw data and store the refined data.
- Involved in identifying job dependencies to design workflows for Oozie and YARN resource management.
- Tuned Hive and Pig scripts to improve performance and resolved performance issues in both, with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Implemented Cloudera Manager on existing cluster.
- Extensively worked with Cloudera Distribution Hadoop, CDH 5.x and CDH 4.x.
Environment: HDFS, MapReduce, Java API, JSP, JavaBeans, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, YARN, Eclipse, UNIX Shell Scripting, Cloudera.
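The log-structuring work above (Hive jobs parsing raw logs into tabular form for querying) comes down to a parse-to-row step. A hedged sketch in Python follows; the log format and field names are hypothetical, chosen only to illustrate the idea:

```python
import re

# Hypothetical access-log layout; the real parsing ran inside Hive jobs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3})'
)

def parse_line(line):
    """Turn one raw log line into a tabular row (tuple) for querying.
    Malformed lines return None and are dropped, as a Hive SerDe might do."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return (m.group("ip"), m.group("ts"), m.group("req"), int(m.group("status")))

row = parse_line('10.0.0.1 - - [01/Jan/2017:00:00:01 +0000] "GET /index.html HTTP/1.1" 200')
# row == ("10.0.0.1", "01/Jan/2017:00:00:01 +0000", "GET /index.html HTTP/1.1", 200)
```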
Senior Hadoop Developer
Confidential, McLean, VA
- Ingested structured, semi-structured, and unstructured datasets into Hive and NoSQL databases such as HBase, using an open-source Hadoop distribution and Apache tools such as Flume and Sqoop.
- Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
- Designed and built the reporting application that uses the Spark SQL to fetch and generate reports on HBase table data.
- Extracted feeds from social media sites such as Facebook, Twitter using Python scripts.
- Implemented helper classes that access HBase directly from Java using Java API.
- Integrated MapReduce with HBase to bulk-import large amounts of data into HBase using MapReduce programs.
- Responsible for converting ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
- Extracted the needed data from servers into HDFS and bulk-loaded the cleaned data into HBase.
- Handled time-series data using HBase, storing data and performing time-based analytics to improve query retrieval time.
- Worked with administrators to install and configure MapReduce, Hive, and HDFS.
- Implemented a CDH3 Hadoop cluster on CentOS and assisted with performance tuning and monitoring.
- Managed and reviewed Hadoop log files.
- Involved in review of functional and non-functional requirements.
Environment: Hortonworks Hadoop 2.0, EMP, Cloud Infrastructure (Amazon AWS), Java, Python, HBase, Hadoop Ecosystem, Linux, Scala.
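The time-series HBase work in this role depends on row-key design for fast time-based retrieval. A common pattern (assumed here, with a hypothetical entity id) is an entity prefix plus a reversed timestamp, so the newest rows sort first and a short scan returns the latest data:

```python
import struct

MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE, the usual reversal constant

def time_series_row_key(entity_id: str, epoch_millis: int) -> bytes:
    """Build an HBase-style row key: entity prefix + reversed timestamp.
    Storing (MAX_LONG - ts) big-endian makes more recent rows sort
    lexicographically before older ones within the same entity prefix."""
    reversed_ts = MAX_LONG - epoch_millis
    return entity_id.encode() + b"#" + struct.pack(">q", reversed_ts)

k_new = time_series_row_key("sensor42", 1_600_000_000_000)
k_old = time_series_row_key("sensor42", 1_500_000_000_000)
# k_new sorts before k_old: newest-first scans need no full table read
```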
Confidential, Auburn Hills, MI
- Involved in designing and developing Hadoop MapReduce jobs in Java for batch processing to search and match the scores.
- Involved in developing Hadoop MapReduce jobs for merging and appending the repository data.
- Worked on developing applications in Hadoop Big Data technologies: Pig, Hive, MapReduce, and Oozie.
- Enabled speedy reviews and first-mover advantages by using Oozie workflows to automate the data-loading process into the Hadoop Distributed File System (HDFS), and used Pig to preprocess the data.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, Sqoop, Flume).
- Worked on Oozie workflow engine for Job scheduling.
- Imported and exported large sets of data between HDFS and external systems using Sqoop.
- Used Java to read data from a MySQL database and transfer it to HDFS.
- Transferred log files from the log generating servers into HDFS.
- Read the generated log data from HDFS using advanced HiveQL (serialization/deserialization).
- Executed the HiveQL commands on CLI (Command Line Interface) and transferred back the required output data to HDFS.
- Worked on Hive partitioning and bucketing concepts and created Hive external and internal tables with partitions.
Environment: Hadoop, MapReduce, HDFS, Hive, SQL, Pig, Zookeeper, MongoDB, CentOS, Cloudera Manager, Sqoop, Oozie, MySQL, HBase, Java.
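The Hive partitioning mentioned above maps each partition column onto a key=value directory segment under the table location. A minimal sketch of that layout (the table path and column names here are illustrative):

```python
import posixpath

def partition_path(table_location, **partition_cols):
    """Compose Hive's partitioned-table directory layout:
    one key=value path segment per partition column, in declaration order."""
    segments = [f"{col}={value}" for col, value in partition_cols.items()]
    return posixpath.join(table_location, *segments)

p = partition_path("/warehouse/logs", dt="2017-01-01", state="AZ")
# p == "/warehouse/logs/dt=2017-01-01/state=AZ"
```

This layout is what lets Hive prune partitions: a query filtered on dt or state only reads the matching directories.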
Confidential, NYC, NY
- Implemented data ingestion using Sqoop and Spark, loading data from various RDBMS sources and CSV and XML files.
- Handled data cleansing and transformation tasks using Spark with Scala, and Hive.
- Implemented data consolidation using Spark and Hive to generate data in the required formats, applying various ETL tasks for data repair, massaging data to identify sources for audit purposes, and filtering data before storing it back to HDFS.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop.
- Involved in converting Hive/SQL queries into Spark RDD transformations using Scala.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
- Responsible for job management using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
- Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, and pair RDDs.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself.
- Imported and exported data between HDFS, Hive, and Pig using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Worked with NoSQL databases like HBase; created HBase tables to load large sets of semi-structured data coming from various sources.
- Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Responsible to manage data coming from different sources.
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
Environment: Scala, Hive, HBase, Flume, Java, Impala, Pig, Spark, Oozie, Oracle, YARN, JUnit, Unix, Cloudera, Sqoop, HDFS, Python.
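The web-log analysis described above (unique visitors per day via HiveQL) amounts to a count-distinct grouped by day. A small Python equivalent, with made-up days and visitor ids, shows the shape of the computation:

```python
from collections import defaultdict

def unique_visitors_per_day(events):
    """events: iterable of (day, visitor_id) pairs.
    Returns {day: distinct visitor count}, mirroring a HiveQL
    COUNT(DISTINCT visitor_id) ... GROUP BY day."""
    seen = defaultdict(set)
    for day, visitor in events:
        seen[day].add(visitor)     # sets deduplicate repeat visits
    return {day: len(visitors) for day, visitors in seen.items()}

stats = unique_visitors_per_day([
    ("2017-01-01", "u1"), ("2017-01-01", "u1"), ("2017-01-01", "u2"),
    ("2017-01-02", "u3"),
])
# stats == {"2017-01-01": 2, "2017-01-02": 1}
```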
- Developed simple to complex Map Reduce jobs using Java language for processing and validating the data.
- Developed data pipeline using Sqoop, Map Reduce, and Hive to ingest, transform and analyze operational data.
- Developed Map Reduce jobs to summarize and transform the raw data.
- Handled importing data from different data sources into HDFS using Sqoop and performing transformations using Hive, Map Reduce and then loading data into HDFS.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Analyzed the data by performing Hive queries (Hive QL) and running Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
- Created HBase tables and column families to store the user event data.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
- Used Impala to read, write and query the Hadoop data in Hive.
Environment: Hadoop, HDFS, HBase, Pig, Hive, MapReduce, Sqoop, Flume, Impala, ETL, REST, Java, MySQL, Oracle 11g, Unix/Linux.
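Several bullets in this résumé describe staging or compacting many small records into HDFS-friendly sizes before writing container files. A hedged sketch of that size-based batching (the target size and records here are illustrative, not the production Kafka/Flume code):

```python
def batch_records(records, target_bytes=128 * 1024 * 1024):
    """Group many small records into batches near a target size
    (e.g., the HDFS block size), yielding one batch per container file.
    Avoids the many-small-files problem that bloats NameNode metadata."""
    batch, size = [], 0
    for rec in records:
        batch.append(rec)
        size += len(rec)
        if size >= target_bytes:
            yield batch
            batch, size = [], 0
    if batch:                       # flush the final partial batch
        yield batch

batches = list(batch_records([b"x" * 10] * 5, target_bytes=25))
# yields a batch of 3 records (30 bytes) then the remaining 2
```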
- Developed JavaScript behavior code for user interaction.
- Developed UI screens for data entry application in Java GUI.
- Implemented the project according to the Software Development Life Cycle (SDLC).
- Developed front-end screens using JSP with tag libraries and HTML pages.
- Followed coding guidelines and updated status to leads on time.
- Involved in Requirements Gathering, Analysis, Design, Development, Testing and Maintenance phases of Application.
- Used core java concepts like Collections, Generics, Exception handling, IO, Concurrency to develop business logic.
- Used JSON for serializing and deserializing data sent to and received from JSP pages.
- Worked closely with QA, business, and architects to resolve various defects quickly and meet deadlines.
- Created Style Sheets (CSS) to control the look and feel of entire site.
- Developed client side screen using HTML.
- Used Eclipse as IDE.
- Written multiple Map Reduce programs in Java for Data Analysis.
- Involved in submitting and tracking MapReduce jobs using the JobTracker.
- Used HTML and CSS, as view components in MVC.