Hadoop Developer Resume
Fremont, CA
SUMMARY
- Hadoop Developer with around 8 years of experience in Information Technology and the Hadoop ecosystem.
- Expertise in Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Kafka, and Samza for data analytics.
- Good knowledge of Apache Spark and Spark SQL.
- Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka; experience streaming data with Apache Flume.
- Worked with key-value pair RDD transformations and actions for sorting, filtering, and analyzing big data in PySpark (see the sketch at the end of this summary).
- Experience in designing and developing HBase tables and storing aggregated data from Hive tables. Good knowledge of NoSQL databases: Cassandra, MongoDB, and HBase.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud, including exporting and importing data to S3 and Redshift.
- Knowledge of the Scala programming language for developing Spark applications.
- Worked with a variety of file formats such as Avro, SequenceFile, Parquet, and plain text for both importing into and exporting from HDFS.
- Deep knowledge of the core concepts of the MapReduce framework and the Hadoop ecosystem.
- Hands-on experience cleansing semi-structured and unstructured data using Pig Latin scripts.
- Experience working with BI and visualization tools such as Tableau, QlikView, and Informatica.
- Worked with predictive modeling techniques such as neural networks, decision trees, and regression analysis.
- Experience handling multiple relational databases: MySQL, SQL Server, PostgreSQL, and Oracle.
- Extensive experience developing web applications with the MVC (Model-View-Controller) architecture using Spring MVC and Struts, along with Java/J2EE technologies such as Servlets, JSP, JDBC, JSTL, and Hibernate.
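A minimal, hypothetical sketch of the key-value RDD work mentioned above, shown with Spark's Java API (the PySpark version uses the same transformations and actions). The input path, record layout, class name, and app name are illustrative assumptions, not project code.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class KeyValueRddSketch {
    public static void main(String[] args) {
        // Local master for a self-contained run; on a cluster this comes from spark-submit
        SparkConf conf = new SparkConf().setAppName("key-value-rdd-sketch").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Assumed input: "userId,amount" records, one per line (hypothetical path)
        JavaRDD<String> lines = sc.textFile("hdfs:///data/transactions.csv");

        // Transformations: filter malformed lines, then build (userId, amount) pairs
        JavaPairRDD<String, Double> pairs = lines
                .filter(line -> line.split(",").length == 2)
                .mapToPair(line -> {
                    String[] parts = line.split(",");
                    return new Tuple2<>(parts[0], Double.parseDouble(parts[1]));
                });

        // More transformations: aggregate per key and sort by key
        JavaPairRDD<String, Double> totalsByUser = pairs.reduceByKey(Double::sum).sortByKey();

        // Action: materialize the result on the driver
        totalsByUser.collect().forEach(t -> System.out.println(t._1() + " -> " + t._2()));

        sc.stop();
    }
}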
TECHNICAL SKILLS
Big Data Technologies: Hadoop Architecture, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, ZooKeeper, Flume, Kafka, Samza, Apache Spark, Spark Streaming, Spark SQL, Spark MLlib
Databases: MySQL, SQL Server, PL/SQL, Cassandra, Teradata
Hadoop Distributions: Cloudera, Hortonworks, MapR
BI Tools: Tableau, Informatica
PROFESSIONAL EXPERIENCE
Confidential, Fremont, CA
Hadoop Developer
Responsibilities:
- Developed efficient MapReduce programs in Java for filtering unstructured data (a map-only filter job of this kind is sketched at the end of this section).
- Imported data from various relational data stores to HDFS using Sqoop
- Exported business-required information to the RDBMS using Sqoop so that the BI team could generate reports from the data
- Responsible for installing and configuring Hadoop MapReduce and HDFS; also developed various MapReduce jobs for data cleaning
- Installed and configured Hive to create tables for the unstructured data in HDFS
- Expertise in major Hadoop ecosystem components including Hive, Pig, HBase, HBase-Hive integration, Sqoop, and Flume.
- Involved in loading data from the UNIX file system to HDFS
- Responsible for managing and scheduling jobs on the Hadoop cluster
- Responsible for importing and exporting data into HDFS and Hive using Sqoop
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data
- Experienced in managing Hadoop log files
- Worked on managing data coming from different sources
- Wrote HiveQL queries to create tables and loaded data from HDFS to give it structure
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Worked extensively with Hive to transform files from various analytical formats into plain text (.txt) so the data could be viewed for further analysis
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs
- Wrote and modified stored procedures to load and modify data as per project requirements
- Responsible for developing Pig Latin scripts to extract data from web server output files and load it into HDFS
- Used Flume extensively to collect log files from the web servers and ingest them into HDFS
- Responsible for implementing schedulers on the JobTracker, enabling MapReduce jobs to make effective use of the resources available in the cluster
- Continuously tuned Hive and Pig queries to make data processing and retrieval more efficient
- Supported MapReduce programs running on the cluster
- Created external tables in Hive and loaded the data into these tables
- Hands-on experience in database performance tuning and data modeling
- Monitored cluster coordination using ZooKeeper
Environment: Hadoop v1.2.1, HDFS, MapReduce, Hive, Sqoop, Pig, DB2, Oracle, XML, CDH4.x
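A minimal sketch of the kind of map-only MapReduce filter job referenced in this section (the "filtering unstructured data" bullet). The delimiter, expected field count, and class names are illustrative assumptions rather than the original project code.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordFilterJob {

    // Map-only job: keep lines that have the expected number of fields and drop the rest
    public static class FilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private static final int EXPECTED_FIELDS = 5; // assumption: pipe-delimited, 5 fields

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length == EXPECTED_FIELDS && !fields[0].isEmpty()) {
                context.write(value, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-filter");
        job.setJarByClass(RecordFilterJob.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0);               // map-only: filtered records go straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}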
Confidential, Phoenix, AZ
Hadoop Developer
Responsibilities:
- Imported large data sets from DB2 into Hive tables using Sqoop.
- Created Hive managed and external tables as per the requirements.
- Designed and developed tables in HBase for storing aggregated data from Hive.
- Developed Hive scripts for data aggregation and processing as per the use case.
- Wrote custom Java UDFs for processing data in Hive (see the sketch at the end of this section).
- Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS into Hive.
- Defined the Hive tables as managed or external, as required, with appropriate static and dynamic partitions for efficiency.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Optimized Hive queries for better performance.
- Worked with the team on fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, tuple stores, NoSQL stores, Hadoop, Pig, MySQL, and Oracle.
- Experience using Avro, Parquet, RCFile, and JSON file formats; developed UDFs for Hive and Pig.
- Installed the Oozie workflow engine to run several MapReduce jobs.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
- Worked with different file formats such as XML, SequenceFile, JSON, CSV, and MapFile using MapReduce programs.
- Continuously monitored and managed Hadoop cluster using Cloudera Manager.
- Performed POCs using newer technologies such as Spark, Kafka, and Scala.
- Worked on the conversion of existing MapReduce batch applications to Spark for better performance.
Environment: Hadoop v2.4.0, HDFS, MapReduce, Core Java, Oozie, Hive, Sqoop, CDH 4.x
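A minimal sketch of a custom Java Hive UDF of the kind referenced above. The function's behavior (trim and lower-case a string) and the class name are illustrative assumptions. Packaged in a JAR, a UDF like this is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: trims and lower-cases a string column, returning null for null input
public final class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}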
Confidential, Tampa, Florida
Senior Hadoop Developer
Responsibilities:
- As a ground-up project, the entire application was developed from scratch; I worked mainly on writing the Kafka producer and Kafka consumer code as per our requirements (see the sketch at the end of this section).
- After the data is successfully persisted to the Kafka brokers, it is written to a flat file from which we load it into a Hive table.
- Defined and created the structure of the Hive table on one side and the HBase table on the other.
- Developed a Spark pipeline to transfer data from the data lake to Cassandra in the cloud, making the data available to the decision engine for publishing customized offers in real time.
- Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods technologies.
- Performed complex mathematical, statistical, and machine learning analysis using Spark MLlib, Spark Streaming, and GraphX. Worked with the Amazon Web Services EC2 console.
- Developed data pipelines using Flume, Sqoop, Pig, Java MapReduce, and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Used Storm to consume events coming through Kafka, generate sessions, and publish them back to Kafka.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
Environment: Hadoop v2.6.0, HDFS, CDH 5.3.x, MapReduce, HBase, Sqoop, Core Java, Hive, Oozie DB, Spark Streaming, and Apache Kafka
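A minimal sketch of the Kafka producer and consumer pattern described in this section, written against a recent kafka-clients API. The topic name, consumer group, and broker address are illustrative assumptions; the flat-file and Hive loading steps are omitted.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickstreamKafkaSketch {

    // Producer side: publish one event to a hypothetical "clickstream-events" topic
    public static void produce(String event) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clickstream-events", event));
        }
    }

    // Consumer side: poll the topic and hand each record to downstream processing
    public static void consume() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "clickstream-loader");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("clickstream-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}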
Confidential, Dublin, Ohio
Java Developer
Responsibilities:
- Involved in the development, testing, and maintenance of the application.
- Used the Spring MVC framework to implement the MVC architecture (see the sketch at the end of this section).
- Developed Stored Procedures, Triggers and Functions in Oracle.
- Developed Spring services and DAOs, and performed object-relational mapping using Hibernate.
- Involved in understanding the business processes and defining the requirements.
- Built test cases and performed unit testing.
- Implemented logging using Log4j.
- Used CVS for version control.
Environment: Java 7, IntelliJ, Maven, Spring Framework, JavaScript, Oracle SQL Developer
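A minimal sketch of a Spring MVC controller illustrating the MVC usage described in this section. The request mapping, parameter, and view name are illustrative assumptions; the view resolver and JSP are assumed to be configured elsewhere.

import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

// Hypothetical controller: the "C" in MVC, mapping a GET request to a JSP view
@Controller
public class GreetingController {

    @RequestMapping(value = "/greeting", method = RequestMethod.GET)
    public String greeting(@RequestParam(value = "name", required = false) String name, Model model) {
        // Populate the model consumed by the view layer
        model.addAttribute("name", name != null ? name : "guest");
        // Logical view name, resolved by the configured view resolver to a JSP page
        return "greeting";
    }
}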
Confidential, Cleveland, Ohio
Java Developer
Responsibilities:
- Participated in implementation efforts such as coding and unit testing.
- Implemented a web-based application using Servlets and JSP (see the sketch at the end of this section).
- Developed custom tags to display dynamic content and avoid large amounts of Java code in JSP pages.
- Developed exception-handling code to handle error conditions.
- Wrote PL/SQL queries, stored procedures, and triggers to perform back-end database operations.
- Prepared test case document and performed unit testing and system testing.
- Followed the algorithms provided by senior database programmers while developing tables and database queries.
Environment: Java/J2EE, Spring, Hibernate, Maven, Jenkins, Excel, Eclipse IDE, Windows
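A minimal sketch of the Servlet/JSP request flow described in this section. The URL pattern, request parameter, and JSP path are illustrative assumptions, and the annotation-based mapping is shown for brevity (the original project may have used web.xml mappings).

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet: reads a request parameter, sets a request attribute, and forwards to a JSP view
@WebServlet("/orders")
public class OrderServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String orderId = request.getParameter("orderId");
        request.setAttribute("orderId", orderId != null ? orderId : "unknown");
        // Forward to a JSP under WEB-INF so it is only reachable through the servlet
        request.getRequestDispatcher("/WEB-INF/views/order.jsp").forward(request, response);
    }
}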