- Astute and experienced IT professional with 8+ years of experience: 4+ years as a Hadoop developer in Big Data/Hadoop technology development and 4 years as a Java developer.
- 4+ years of experience in Hadoop Ecosystem components like MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie and Zookeeper.
- In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, Resource Manager, Node Manager, Name Node, Data Node and MR v1 & v2 concepts.
- Good experience creating data pipelines in Spark using Scala.
- Good experience on Spark components like Spark SQL and Spark Streaming.
- Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Developed Spark Streaming applications for Real Time Processing.
- Strong experience implementing data processing on Spark Core using Spark SQL and Spark Streaming.
- Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and performed the data transformations using Spark Core.
- Experience in using Spark-SQL with various data sources like JSON, Parquet and Hive.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving the results to an output directory in HDFS.
- Expertise in integrating the data from multiple data sources using Kafka.
- Knowledge of unifying data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Worked extensively with Hadoop distributions like Cloudera and Hortonworks. Good knowledge of the MapR distribution & Amazon's EMR.
- Wrote and implemented custom UDFs in Pig for data filtering.
- Expertise in writing Hive and Pig queries for data analysis to meet the business requirements.
- Hands-on experience in using Impala for data analysis.
- Hands-on experience in using the data ingestion tools Sqoop and Flume.
- Experience in importing and exporting data using Sqoop from RDBMS to HDFS and vice versa.
- Hands-on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Worked on NoSQL databases like HBase, Cassandra and MongoDB.
- Good knowledge in using job scheduling and coordination tools like Oozie and Zookeeper.
- Experience in configuring the Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
- Experience in configuring various topologies in Storm to ingest and process data on the fly from multiple sources and aggregate into central repository Hadoop.
- Hands-on experience with build and development tools like Maven, Ant, Log4j and JUnit.
- Experience in working with the Spring framework in Java.
- Extensive experience with databases such as Oracle, MySQL and MS SQL Server, and with PL/SQL scripting.
- Experience in using IDEs like Eclipse and NetBeans.
- Working experience with Linux distributions like Red Hat Enterprise Linux.
- Comprehensive knowledge of Software Development Life Cycle (SDLC).
- Exposure to Waterfall, Agile and Scrum models.
- Strengths include handling a variety of software systems, the capacity to learn and adapt to new technologies, being an amicable team player, and strong personal, technical and communication skills.
Big Data Ecosystem: HDFS and Map Reduce, Pig, Hive, Impala, YARN, HUE, Oozie, Zookeeper, Apache Spark, Apache STORM, Apache Kafka, Sqoop, Flume.
Operating Systems: Windows, Ubuntu, RedHat Linux, Unix
Programming Languages: C, C++, Java, Scala
Scripting Languages: Shell Scripting, JavaScript
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, SQL, PL/SQL, Teradata
NoSQL Databases: HBase, Cassandra, and MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Build Tools: Ant, Maven
Version Control: SVN, Git, GitHub
Development IDEs: NetBeans, Eclipse IDE
Web Servers: WebLogic, WebSphere, Apache Tomcat 6
Cloud: AWS, Azure
Packages: Microsoft Office, PuTTY, MS Visual Studio
Sr. Hadoop Developer
Confidential, CHICAGO, IL
- Extracted the data from RDBMS into HDFS using Sqoop.
- Used Flume to collect, aggregate and store web log data from different sources like web servers, mobile and network devices, and pushed it into HDFS.
- Implemented MapReduce programs to transform log data into a structured form and extract user information.
- Wrote extensive Pig scripts to transform raw data from several data sources into baseline data.
- Developed UDF functions for Hive and wrote complex queries in Hive for data analysis.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Extensively used ETL processes to load data from flat files into the target database by applying business logic on transformation mapping for inserting and updating records when loaded.
- Worked on data serialization formats for converting complex objects into sequences of bits using the Avro, RC and ORC file formats.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed complex Pig scripts to transform raw data from the staging area.
- Designed and developed Hive tables to store staging and historical data.
- Created Hive internal and external tables as per requirements, defined with appropriate static and dynamic partitions for efficiency.
- Processed large data sets on the Hadoop cluster. Data stored on HDFS was preprocessed and validated using Pig, and the processed data was stored in the Hive warehouse, enabling business analysts to get the required data from Hive.
- Used the ORC file format with Snappy compression for optimized storage of Hive tables.
- Resolved performance issues in Hive and Pig scripts by analyzing joins, grouping and aggregation.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and Data Frames API to load structured and semi-structured data into Spark clusters.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Used Apache Kafka for importing real time network log data into HDFS.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Spark Streaming, Kafka, Linux.
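The Spark Streaming work above centered on dividing a live feed into micro-batches for the batch engine. A minimal plain-Java sketch of that micro-batching idea (illustrative only; the project used Spark's streaming API, and the timestamped event shape here is hypothetical):

```java
import java.util.*;

public class MicroBatcher {
    /**
     * Groups (timestampMs, payload) events into fixed-interval batches,
     * keyed by the start of each interval — the core idea behind
     * Spark Streaming's micro-batch model.
     */
    static Map<Long, List<String>> toBatches(List<Map.Entry<Long, String>> events,
                                             long intervalMs) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Map.Entry<Long, String> e : events) {
            long batchStart = (e.getKey() / intervalMs) * intervalMs; // floor to interval
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(e.getValue());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> events = List.of(
                Map.entry(1000L, "a"), Map.entry(1900L, "b"),
                Map.entry(2100L, "c"), Map.entry(3500L, "d"));
        // Events at 1000ms and 1900ms land in the same 1-second batch.
        System.out.println(toBatches(events, 1000L));
    }
}
```

Each batch is then handed to the engine as one unit of batch processing, which is what lets streaming and batch code share the same execution path.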
Confidential, Dublin, OH
- Involved in loading data from UNIX file system to HDFS.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Worked on Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and troubleshooting, Racks, Disk Topology, Manage and review data backups, Manage and review Hadoop log files.
- Responsible for setting up the Hadoop cluster for the project and working on the project using HQL and SQL.
- Used D3.js to visualize the JSON data generated from Hive queries related to tumors.
- Used Spark to perform variant-calling techniques in big data genomics.
- Installed and configured MapReduce, HIVE and the HDFS; implemented CDH5 Hadoop cluster on RHEL. Assisted with performance tuning, monitoring and troubleshooting.
- Created MapReduce programs for refined queries on big data.
- Involved in the development of Pig UDF'S to analyze by pre-processing the data.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Created reports for the BI team by using Sqoop to export data from HDFS and Hive.
- Managing and scheduling jobs on a Hadoop cluster using Oozie.
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
- Involved in developing complex MapReduce jobs for data cleaning.
- Implemented Partitioning and bucketing in Hive.
- Mentored the analyst and test teams in writing Hive queries.
- Involved in setting up of HBase to use HDFS.
- Extensively used Pig for data cleansing.
- Configured Flume to extract the data from the web server output files to load into HDFS.
- Used Spark Streaming to fetch Twitter data with ASU hashtags to perform sentiment analysis.
- Used Hive partitioning and bucketing for performance optimization of Hive tables and created around 20,000 partitions.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Created RDDs in Spark and extracted data from the data warehouse onto the Spark RDDs.
- Used Spark with Scala.
- Created topics on the Desktop portal using Spark Streaming with Kafka and Zookeeper.
- Involved in recovering lost data using the DAG (lineage) recomputation process.
- Used DataStax JARs for this project.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Environment: MapReduce, HDFS, Hive, Java, Pig, Linux, XML, HBase, Zookeeper, Kafka, Sqoop, Flume, Oozie.
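The Hive bucketing used above spreads rows across a fixed number of bucket files by hashing the bucket column, so equal keys always land in the same bucket. A rough sketch of that assignment, assuming a Java-style string hash (the exact hash Hive applies depends on the column type and version):

```java
public class HiveBucketing {
    /**
     * Hive assigns a row to a bucket roughly as hash(bucketColumn) mod numBuckets.
     * The mask keeps the result non-negative for negative hash codes.
     */
    static int bucketFor(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Same key -> same bucket, every time; different keys spread across buckets.
        for (String user : new String[] {"user1", "user2", "user1"}) {
            System.out.println(user + " -> bucket " + bucketFor(user, 8));
        }
    }
}
```

This deterministic placement is what makes bucketed map-side joins and efficient sampling possible: both sides of a join can be read bucket-by-bucket without a full shuffle.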
Confidential, Fort Mill, SC
- Involved in designing and defining different data analytics approaches using Hadoop.
- Involved in implementing ad-click data analytics (from social networking sites) for particular keywords.
- Crawled the public posts from Facebook and Twitter.
- Implemented the MapReduce programs, Hive/Impala queries to analyze sales pattern and customer satisfaction index as per the requirement defined by the Data Science team.
- Involved in sizing & scaling the infrastructure requirements by analyzing the daily data stored in the Hadoop system.
- Involved in moving data between RDBMS and HDFS, in both directions, using the Sqoop tool.
- Performed performance analysis and optimization of the MapReduce jobs by adopting appropriate design patterns and analyzing I/O latency, reduce time, etc.
- Well versed with the agile methodology.
- Stored data in HDFS via Flume from various sources like social sites, mobile applications and different websites.
- Created automated workflows using Oozie to extract data from data sources like MySQL and push the result sets to HDFS.
Environment: Hadoop, HBase, HDFS, MapReduce, Pig, Sqoop, Java, Cloudera Manager.
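The ad-click keyword analytics above follows the classic MapReduce counting pattern: the map phase emits (keyword, 1) per click record, the shuffle groups by keyword, and the reduce phase sums the counts. A single-process Java stand-in for that pattern (illustrative only; the real job ran as MapReduce on the cluster):

```java
import java.util.*;

public class KeywordCount {
    /**
     * Map phase: emit (keyword, 1) per ad-click record.
     * Reduce phase: sum the counts per keyword (merge does the summing here).
     */
    static Map<String, Integer> countKeywords(List<String> clickKeywords) {
        Map<String, Integer> counts = new HashMap<>();
        for (String kw : clickKeywords) {
            counts.merge(kw, 1, Integer::sum); // reduce: running sum per key
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countKeywords(List.of("shoes", "phone", "shoes", "laptop", "shoes"));
        System.out.println(counts.get("shoes")); // 3
    }
}
```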
Confidential, Newark, NJ
- Implemented CDH3 Hadoop cluster on RHEL.
- Designed and developed a daily process to do incremental import of raw data from Oracle into Hive tables using Sqoop.
- Launched and set up the Hadoop cluster, including configuring its different components.
- Hands on experience in loading data from UNIX file system to HDFS.
- Provided cluster coordination services through Zookeeper.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Involved in creating Hive tables, loading data and running Hive queries on that data.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
- Working knowledge of writing Pig Load and Store functions.
Environment: Apache Hadoop, MapReduce, HDFS, RHEL, Zookeeper, Sqoop, Hive, Pig, Oozie, Java, Eclipse, JSP, Servlets, Oracle.
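The daily incremental import above relies on a watermark: Sqoop's `--incremental append` mode with a `--check-column` and `--last-value` pulls only rows beyond the last recorded value, then advances the watermark. A hedged Java sketch of that watermark logic (the `Row` shape is hypothetical; the actual transfer was performed by Sqoop itself):

```java
import java.util.*;

public class IncrementalLoad {
    /** Hypothetical source row: id is the check column, data the payload. */
    record Row(long id, String data) {}

    /**
     * Mimics `sqoop import --incremental append --check-column id --last-value N`:
     * returns only rows whose check column exceeds the stored watermark.
     */
    static List<Row> pullNewRows(List<Row> source, long lastValue) {
        List<Row> fresh = new ArrayList<>();
        for (Row r : source) {
            if (r.id() > lastValue) fresh.add(r);
        }
        return fresh;
    }

    public static void main(String[] args) {
        List<Row> source = List.of(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"));
        List<Row> delta = pullNewRows(source, 1);             // rows 2 and 3 only
        long newWatermark = delta.get(delta.size() - 1).id(); // persist for the next run
        System.out.println(delta.size() + " new rows, watermark=" + newWatermark);
    }
}
```

Persisting the new watermark between runs is what keeps each daily job from re-importing rows it has already loaded into Hive.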
- Core Java coding and development using Multithreading and Design Patterns.
- Designed dynamic user interfaces using AJAX and jQuery to retrieve data without reloading the page and to send asynchronous requests.
- Developed Servlets and JSP based on MVC pattern using Struts framework.
- Created Action Classes, Form Beans, and Model Objects for the application using Model View Controller (MVC) approach.
- Involved in the integration of Spring for implementing Dependency Injection.
- Created connections to database using Hibernate Session Factory, using Hibernate APIs to retrieve and store data to the database with Hibernate transaction control.
- Worked on a banking application to create a JSON web service by calling a SOAP service.
- Hands on experience with Web Services including REST and SOAP.
- Optimized SQL queries used in batch processing.
- Wrote extensive unit test cases using the JUnit framework.
- Used the JIRA tool for tracking story progress, following Agile methodology.
- Developed the application using NetBeans as the IDE and used its features for editing, debugging, compiling, formatting, build automation and SVN.
- Used the Gradle tool for building and deploying the web applications on JBoss.
- For bulk order processing, implemented functionality to read input data from MS Excel files using Java and the JXL API.
Environment: Core Java, Multithreading, JDK, JDBC, Servlets, JSP, Struts, Hibernate, Spring, Web Services, jQuery, JSON, AJAX, HTML, CSS, JavaScript, Log4j, SQL Server, JUnit, Gradle, JBoss Server, Git, NetBeans, DOJO, UNIX, Waterfall.
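The multithreading work above included classic producer/consumer coordination. A self-contained example using `BlockingQueue` (a generic illustration of the pattern, not code from the project):

```java
import java.util.concurrent.*;

public class ProducerConsumer {
    /**
     * Producer/consumer over a bounded BlockingQueue: the producer blocks when
     * the queue is full, the consumer blocks when it is empty, and -1 is a
     * sentinel marking end-of-stream.
     */
    static int produceAndConsume(int n) throws Exception {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.submit(() -> {
                for (int i = 1; i <= n; i++) queue.put(i);
                queue.put(-1); // sentinel: no more items
                return null;
            });
            Future<Integer> sum = pool.submit(() -> {
                int total = 0, v;
                while ((v = queue.take()) != -1) total += v;
                return total;
            });
            return sum.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(produceAndConsume(5)); // 1+2+3+4+5 = 15
    }
}
```

The bounded queue gives back-pressure for free: a slow consumer naturally throttles the producer without any explicit locking in application code.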
- Understood the requirements from the business/functional perspective.
- Proposed and implemented design/architectural enhancements.
- Identified and implemented solutions for application performance bottlenecks.
- Performed regular Quality Review processes to assure the quality of the deliverables.
- Guided the team members on analysis and Solution approaches for requirements.
- Created stored procedures using PL/SQL for data modification (using DML insert, update, delete) in Oracle.
- Involved in Requirement Analysis, Development and Documentation.
- Used MVC architecture (Jakarta Struts framework) for the web tier.
- Participated in developing the form beans and action mappings required for the Struts implementation and validation.
- Developed front-end screens with JSP using Eclipse.
- Involved in development of the Medical Records module; responsible for developing the functionality using Struts and EJB components.
- Coded DAO objects using JDBC (DAO pattern).
- Used XML and XSDs to define data formats.
- Implemented J2EE design patterns (Value Object, Singleton, DAO) for the presentation, business and integration tiers of the project.
- Involved in Bug fixing and functionality enhancements.
- Designed and developed a logging mechanism for each order process using Log4j.
- Involved in writing SQL Queries.
- Involved in Check-in and Checkout process using CVS.
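The DAO pattern used above isolates persistence behind an interface so the JDBC-backed implementation can be swapped without touching callers. A minimal sketch (the names are hypothetical, and an in-memory store stands in for the project's JDBC implementation):

```java
import java.util.*;

/** Callers depend only on this interface, never on the storage mechanism. */
interface RecordDao {
    void save(int id, String value);
    Optional<String> findById(int id);
}

/** In-memory stand-in; the real project backed this with JDBC. */
class InMemoryRecordDao implements RecordDao {
    private final Map<Integer, String> store = new HashMap<>();
    public void save(int id, String value) { store.put(id, value); }
    public Optional<String> findById(int id) { return Optional.ofNullable(store.get(id)); }
}

public class DaoDemo {
    public static void main(String[] args) {
        RecordDao dao = new InMemoryRecordDao(); // swap implementations here
        dao.save(1, "medical-record");
        System.out.println(dao.findById(1).orElse("missing"));
    }
}
```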
- Created SAP Business Objects Reports.
- Developed additional functionality in the software as per business requirements.
- Involved in requirement analysis and complete development of client-side code.
- Followed Sun coding and documentation standards.
- Participated in project planning with business analysts and team members to analyze the Business requirements and translated business requirements into working software.
- Developed software application modules using disciplined software development process.