Hadoop Developer Resume
Hutchinson, KS
PROFESSIONAL SUMMARY:
- Qualified IT professional with around 6 years of experience as a Hadoop consultant.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Proficient in installing, configuring, migrating, and upgrading Hadoop ecosystem components, including MapReduce, Hive, HDFS, HBase, Sqoop, Oozie, Pig, ZooKeeper, Flume, and Cassandra.
- Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
- Good exposure to the Apache Spark ecosystem, including Spark Core and Spark Streaming, using Scala and Python.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java and Python.
- Experience leveraging Hadoop ecosystem components, including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling, and HBase as a NoSQL data store.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed application development, and HDFS.
- Experience with the NoSQL databases MongoDB and Cassandra.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experienced in deploying Hadoop clusters using Puppet.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Proficient in adding ZooKeeper, Cassandra, and Flume to existing Hadoop clusters.
- In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience with Hadoop security requirements and integrating clusters with Kerberos authentication and authorization infrastructure.
- Experience in big data analysis using Pig and Hive, with a working understanding of Sqoop and Puppet.
- Good understanding of HDFS design, daemons, federation, and high availability (HA).
- Experienced in developing MapReduce programs using Apache Hadoop for working with big data.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality (a representative sketch follows this summary).
- Good experience in implementing and setting up standards and processes for Hadoop-based application design and implementation.
- Experience with middleware architectures built on Sun Java technologies such as J2EE, JSP, and Servlets, and application servers such as WebSphere and WebLogic.
- Familiarity with popular frameworks such as Struts, Hibernate, Spring MVC, and AJAX.
- Experience in object-oriented programming with Core Java.
- Experience in creating web-based applications using JSP and Servlets.
- Experience in managing Hadoop clusters using Cloudera Manager.
- Very good experience with the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, and similar tools.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
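The custom UDF bullet above references a sketch; here is a minimal, hypothetical example of a Hive UDF in Java. The package, class name, and normalization rule are illustrative assumptions rather than actual project code, and it uses the classic org.apache.hadoop.hive.ql.exec.UDF API:

```java
package com.example.hive.udf;  // hypothetical package name

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Trims and lower-cases a string column so ad-hoc values compare
// consistently in GROUP BY and JOIN clauses.
public final class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;  // pass SQL NULLs through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Packaged into a JAR, such a function is registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in HiveQL function.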
TECHNICAL SKILLS:
Hadoop Technologies: HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, ZooKeeper, Ambari, Hue, Spark, Storm, Talend, Ganglia
Operating System: Windows, Linux
Languages: Java, J2EE, SQL, PL/SQL, Shell Script
Project Management / Tools: MS Project, MS Office, TFS, HP Quality Center Tool
Front End: HTML, JSTL, DHTML, JavaScript, CSS, XML, XSL, XSLT
Databases: MySQL, Oracle 11g/10g/9i, SQL Server
NoSQL Databases: HBase, Cassandra
File System: HDFS
Reporting Tools: Jasper Reports, Tableau
IDE Tools: Eclipse, NetBeans
Application Server: Apache Tomcat, WebLogic
WORK EXPERIENCE:
Confidential, Hutchinson, KS
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis (a representative sketch follows this role).
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Worked with HiveQL on large volumes of log data to perform trend analysis of user behavior across various online modules.
- Passionate about working with cutting-edge big data technologies.
- Developed Pig scripts for analyzing large data sets in HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented a plan for a POC on Impala.
- Involved in migrating HiveQL to Impala to minimize query response time.
- Knowledge of handling Hive queries through Spark SQL, which integrates with the Spark environment.
- Responsible for creating Hive tables, loading the structured output of MapReduce jobs into them, and writing Hive queries to further analyze the logs for issues and behavioral patterns.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance and storage improvements.
- Imported data from mainframe datasets to HDFS using Sqoop; also handled importing data from various sources (Oracle, DB2, Cassandra, and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.
- Implemented daily jobs that automate parallel data loads into HDFS using Oozie coordinators.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases and compare it with historical data.
- Involved in loading data from a Teradata database into HDFS using Sqoop.
- Involved in submitting and tracking MapReduce jobs using JobTracker.
- Involved in creating Oozie workflow and coordinator jobs to kick off jobs on schedule as data became available.
- Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
- Exported data to Tableau and to Excel with Power View for presentation and refinement.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Implemented Hive generic UDFs to encapsulate business logic.
- Implemented test scripts to support test-driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Apache Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Maven, Teradata, ZooKeeper, Tableau.
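As referenced in the responsibilities above, a minimal sketch of a Java MapReduce analysis job of the kind described in this role. The tab-delimited log layout (userId, module, timestamp) and all class names are hypothetical assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Counts hits per user from tab-delimited web logs
// (userId <TAB> module <TAB> timestamp).
public class UserHitCount {

    public static class HitMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text userId = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 3) {
                return;  // skip malformed log lines
            }
            userId.set(fields[0]);
            context.write(userId, ONE);
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "user hit count");
        job.setJarByClass(UserHitCount.class);
        job.setMapperClass(HitMapper.class);
        job.setCombinerClass(SumReducer.class);  // pre-aggregate map output
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Reusing the reducer as a combiner pre-aggregates counts on the map side, which cuts shuffle volume for this kind of per-key counting job.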
Confidential, Austin, TX
Hadoop Developer
Responsibilities:
- Installed and configured Cloudera Hadoop on a 100-node cluster.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning and processing (a representative sketch follows this role).
- Developed a data pipeline using Sqoop, Hive, Pig, and Java MapReduce to ingest claim and policy histories into HDFS for analysis.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Applied Java MapReduce jobs for data processing on the installed Hadoop and HDFS environment.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Responsible for architecting Hadoop clusters with CDH3.
- Imported and exported data into HDFS and Hive using Sqoop.
- Worked on NoSQL databases including HBase and Elasticsearch.
- Performed cluster coordination through ZooKeeper.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Installed and configured Hive and wrote Hive UDFs.
- Performed data analysis in Hive by writing queries against the loaded tables.
- Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
- Developed shell scripts to pull data from third-party systems into the Hadoop file system.
- Supported setting up the QA environment and updating configurations for implementing Pig scripts.
- Loaded log data into HDFS using Flume and worked extensively on creating MapReduce jobs to power data for search and aggregation.
Environment: Hadoop, MapReduce, HDFS, Flume, Cassandra, Sqoop, Pig, HBase, Hive, ZooKeeper, Cloudera, Oozie, Elasticsearch, NoSQL, UNIX/Linux.
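A minimal sketch of the map-only data-cleaning job referenced in this role's responsibilities; the comma-delimited layout, expected column count, and class names are hypothetical assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only job that drops blank or malformed claim records before
// downstream analysis; valid lines are written through unchanged.
public class CleanClaims {

    public static class CleanMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 8;  // assumed record layout

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()
                    || line.split(",", -1).length != EXPECTED_FIELDS) {
                context.getCounter("clean", "dropped").increment(1);
                return;  // discard the record, but count it for auditing
            }
            context.write(NullWritable.get(), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean claims");
        job.setJarByClass(CleanClaims.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);  // map-only: no shuffle or reduce phase
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Setting the reduce count to zero writes mapper output straight to HDFS, which is the cheapest shape for record-level filtering.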
Confidential, Oklahoma City, OK
Hadoop Developer/Admin
Responsibilities:
- Obtained requirement specifications from SMEs and business analysts in the BR and SR meetings for the corporate workplace project, and interacted with business users to build sample report layouts.
- Involved in writing the HLDs, along with RTMs tracing back to the corresponding BRs and SRs, and reviewed them with the business.
- Loaded log data into HDFS using Flume and created MapReduce jobs to power data for search and aggregation.
- Installed and configured Apache Hadoop and the Hive and Pig ecosystems.
- Installed and configured Cloudera Hadoop CDH4 via Cloudera Manager in pseudo-distributed and cluster modes as a proof of concept.
- Created MapReduce jobs using Hive and Pig queries.
- Extensively used Pig for data cleansing.
- Developed Pig UDFs to pre-process the data for analysis (a representative sketch follows this role).
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig and HiveQL.
- Created Hive tables, loaded them with data, and wrote Hive queries that execute internally as MapReduce jobs.
- Involved in configuring Sqoop to map SQL types to appropriate Java classes.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Provided cluster coordination services through ZooKeeper.
Environment: Hadoop, Cloudera Hadoop CDH4, HiveQL, Pig Latin, MapReduce, HDFS, HBase, ZooKeeper, Oozie, Oracle, PL/SQL, SQL*Plus, Windows, UNIX, Shell Scripting.
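A minimal sketch of a Pig pre-processing UDF of the kind referenced in this role; the package, class name, and cleansing rule are hypothetical assumptions, using Pig's EvalFunc API:

```java
package com.example.pig.udf;  // hypothetical package name

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Strips non-digit characters from a phone-number field so downstream
// joins and GROUP BYs compare on a canonical form.
public class CleanPhone extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;  // propagate nulls rather than failing the task
        }
        return input.get(0).toString().replaceAll("[^0-9]", "");
    }
}
```

In Pig Latin, such a function would be loaded with REGISTER and invoked inside a FOREACH ... GENERATE, like any other EvalFunc.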