- Hadoop Developer with 4+ years of IT experience, eager to work in an environment that hones my skills and knowledge.
- Around 3.5 years in Big Data technologies, with strong hands-on experience in HDFS, MapReduce, Hive, Sqoop, Oozie, Pig, Spark, Kafka, Flume, and Scala.
- Strong experience with the Spark framework: creating RDDs, DataFrames, and Datasets, and mapping key/value pairs using Scala and Python.
- Experience in creating cloud-based applications on AWS.
- Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Delivered occupancy reports using Excel PivotTables, VLOOKUP, and slicers.
- Experienced in Big Data solutions and Hadoop ecosystem technologies. Well versed in Big Data solution planning, design, development, and POCs.
- Planned, deployed, monitored, and maintained AWS cloud infrastructure consisting of multiple EC2 nodes, using AWS EMR to run Spark jobs as required in the environment.
- More than one year of hands-on experience using the Spark framework with Scala and Python. Good exposure to performance tuning of Hive queries and MapReduce jobs in the Spark framework.
- Hands-on experience in Python, Java, Scala, and Hadoop from production environments and some academic projects.
- Strong knowledge of Apache Spot, which provides analytics for detecting advanced cyber threats.
- Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience with the AWS CLI and SDK/API tools.
- Strong knowledge of Simple Storage Service (S3), Amazon SimpleDB, Amazon CloudWatch, SNS, SQS, and Lambda.
- Hands-on experience with NoSQL databases such as HBase, MongoDB, and Cassandra.
- Expertise in inbound and outbound data transfer (importing/exporting) from/to traditional RDBMSs using Apache Sqoop.
- Tuned Pig and Hive scripts by analyzing their joins, grouping, and aggregation.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
- A strong sense of professional responsibility.
- Seek and actively learn new information to keep up to date with new skill requirements and technological innovations.
- Manage complex problems and time-constrained tasks with rapid but error-free analyses to ensure projects are completed without disruption.
- Good interpersonal skills; eager to learn and to take on challenging work.
- Comprehensive problem-solving abilities; hard-working team player.
Big Data: HDFS, MapReduce, Hive, Pig, Kafka, Spark, Scala, Sqoop, Flume & Oozie
Database Technologies: NoSQL (MongoDB, Cassandra), Oracle Cloud, AWS, Cloudera CDH, Hortonworks Sandbox
Tools: Eclipse, MATLAB, Scala IDE, SQL Developer, MS Office
Database: Oracle 11g, MS SQL Server
Operating Systems: Linux, Unix, Windows
Confidential, Jacksonville, FL
- Responsible for building the multi-domain SellPoint application, which provides data to DATAMART solutions using the Hadoop ecosystem.
- Actively participated in the development of SellPoint, used by Confidential Agents and Sales Reps.
- To complete a quote for either Small Group or Large Group, SellPoint integrates extensively with other systems to validate addresses, generate rates, and send emails.
- Integration relied on Spark for distributed processing, Kafka as the ingestion framework, and NoSQL databases.
- Built a streaming application that extracts Avro events based on the schema registry and processes them according to business needs.
- Implemented an automated renewal process that renews almost 90% of Group Insurance Plans, increasing the effectiveness of the whole process.
- Built DATAMART as a single repository for multiple domains.
- Worked on Hortonworks-HDP 2.5 distribution.
- Involved in review of functional and non-functional requirements.
- Designed and implemented, alongside other developers, a data pipeline using Big Data tools including Hive, Spark, and Scala for building streaming applications.
- Experience in using Apache Storm, Spark Streaming, Apache Spark, Apache NiFi, Kafka and Flume in creating data streaming solutions.
- Implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
- Imported data from Microsoft SQL Server, MySQL, and Teradata into HDFS using Sqoop incremental loads.
- Good knowledge of using Apache NiFi to automate the transfer of files into Hadoop and other environments.
- Extensively used StreamSets Data Collector to create ETL pipelines for pulling data from RDBMS systems into HDFS.
- Implemented the data processing framework using Scala and Spark SQL.
- Implemented performance optimizations to reduce data processing time.
- Created shell scripts to automate jobs.
Environment: Hadoop, MapReduce, HDFS, Hive, Kafka, Spark, Scala, Akka, REST API, NiFi, Cassandra, HDP 2.5.
Confidential, Durham, NC
- Analyzed datasets using Pig, Hive, MapReduce, and Sqoop to recommend business improvements
- Set up, installed, and monitored a 3-node enterprise Hadoop cluster on Ubuntu Linux
- Analyzed and interpreted transaction behaviors and clickstream data with Hadoop and HDP to predict what customers might buy in the future
- Converted validated RDDs into DataFrames for further processing.
- Implemented Spark SQL logic to join multiple DataFrames and generate application-specific aggregated results.
- Fine-tuned jobs for better performance in the production cluster.
- Worked entirely within Agile methodologies, using the Rally Scrum tool to track user stories and team performance.
- Worked extensively with Impala and Hue to analyze the processed data and generate the end reports.
- Worked with Hive databases through Beeline.
- Worked on analyzing and resolving the production job failures in several scenarios.
Environment: Hadoop, MapReduce, HDFS, Hive, Kafka, Spark, Scala, NiFi, MongoDB, HDP.
Confidential, Newark, DE
- Responsible for building scalable streaming data solutions using Kafka and Flume.
- Installed and configured Hive and Flume on the Hadoop cluster.
- Developed simple to complex Flume interceptor validation jobs.
- Optimized Flume capacity for HDFS by using various compression mechanisms.
- Managed the offshore team: delegated suitable work to junior developers, reviewed their code, and helped them fix bugs.
- Involved in Hadoop cluster tasks such as commissioning and decommissioning nodes without affecting running jobs or data.
- Optimized the streaming data pipeline for reconciliation of the events received from Kafka.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Involved in running Kafka streaming jobs to process terabytes of event data.
- Analyzed large data sets by running Hive queries.
- Good experience with real-time streaming data using Flume and Kafka sources and channels.
- Worked hands-on with the ETL process and was involved in developing Hive/Impala scripts for extraction and transformation.
- Monitored flume agents using Splunk.
- Helped the team scale the Kafka cluster from 22 to 30 broker nodes.
- Worked extensively with JDBC driver for importing metadata from Oracle.
- Involved in creating Hive tables, and loading and analyzing data using hive queries for staging and permanent tables.
- Designed, developed, and maintained data integration programs in a Kafka environment, with both traditional and non-traditional source systems and data stores, for data access and analysis.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Managed data coming from different sources.
- Wrote Hive queries to optimize the deduplication of records from Kafka.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Gained experience in managing and reviewing Hadoop log files.
Environment: Hadoop, MapReduce, HDFS, Hive, Kafka, Flume, Cassandra, Splunk, Java, Oracle 10g, MySQL, Cloudera.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Imported unstructured data into HDFS using Kafka.
- Wrote MapReduce Java programs to analyze log data in large-scale data sets.
- Strong working experience with real-time streaming data using Spark and Kafka Connect.
- Worked hands-on with the ETL process and was involved in developing Hive/Impala scripts for extraction, transformation, and loading of data into other data warehouses.
- Ran ad hoc queries using Pig Latin, Hive, or Java MapReduce.
- Configured the connection between HDFS and Tableau using Impala for the Tableau developer team.
- Imported and exported data to and from HDFS using Sqoop and Kafka.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Migrated MapReduce programs to Spark transformations using Spark and Scala.
- Wrote Spark Streaming code on the Big Data distribution in the active cluster environment.
- Developed custom MapReduce programs and User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Created data models using the Cassandra Query Language.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Used DataStax OpsCenter to monitor Cassandra nodes, including compaction and reads/writes.
- Loaded data into Cassandra tables; used the sstableloader utility extensively to load large tables.
- Used the DataStax DevCenter query tool to interact with data in Cassandra.
- Cross-checked data loaded into Hive tables against the source data in Oracle.
- Developed structured, efficient, and error-free code for Big Data requirements using my knowledge of Hadoop and its ecosystem.
Environment: Hadoop, MapReduce, Cassandra, Hive, Kafka, Flume, HBase, Java, IBM.
Confidential, Fort Mill, SC
Jr. Java/Hadoop Developer
- Developed simple MapReduce jobs that take log files as input, parse them, and structure them in tabular format to facilitate effective querying of the log data.
- Involved in extracting customers' big data from various data sources into HDFS.
- Imported the data from relational databases into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop.
- Organized workflows using Oozie.
- Created partitioned tables in Hive. Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created managed or external tables in Hive as per business requirement.
- Developed Pig scripts to analyze data.
- Wrote script files for processing data and loading it into HDFS.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Modeled data in HBase for large transactional sales data.
- Loaded and transformed large sets of structured and semi-structured data.
- Created HBase tables to store large volumes of input data in variable formats coming from different portfolios.
- Analyzed HBase data in Hive (version 0.11.0.2) by creating external partitioned and bucketed tables to maintain efficiency.
Environment: Java, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Cloudera, Oracle, HBase.