Hadoop Developer Resume

Austin, TX

SUMMARY

  • 7+ years of IT Experience in systems development, databases & analytics, with 2+ years of strong experience as Hadoop Developer.
  • Expertise in Big Data, Hadoop, NoSQL and components such as HDFS, MR2, YARN, Spark, Pig, Hive, Sqoop, HBase, Cloudera Manager, ZooKeeper, Oozie, Kafka, Hue, CDH5 and HDP 2.x. Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive and Pig.
  • Working experience with the Cloudera and Hortonworks Hadoop distributions.
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Experience in writing UDFs in Hive and Pig.
  • Experience in Hive Partitioning and Bucketing.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Good exposure to the NoSQL database HBase.
  • Experience in AWS - S3, EC2, Redshift.
  • Experience in different Hadoop distributions such as Cloudera, Hortonworks (HDP) and Elastic MapReduce (EMR).
  • Used HiveQL to analyze data and identify correlations.
  • Strong experience in J2EE, JSP, Servlets, Struts, Spring, Hibernate, JDBC, SOAP, WSDL, JSON, jQuery, JavaScript, CSS and HTML, Java multithreading and exception handling.
  • Developed applications using the Spring Framework and implemented Spring modules such as the core container module, application context module, aspect-oriented module (AOP module), JDBC module, ORM module and web module.
  • Experience in developing custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
  • Developed MapReduce programs in Python with the Hadoop streaming API.
  • Good Knowledge of Data Profiling using Informatica Data Explorer.
  • Extensive experience in ETL Design and Development.
  • Good knowledge of project management knowledge areas and process groups.
  • Experience working in an iterative, agile software lifecycle (SDLC) with strong ability to estimate/scope the development of projects.
  • Well versed in OLTP Data Modeling and Strong knowledge of Entity-Relationship concepts.
  • Experience in data cleaning and data preprocessing using Python scripting.
  • Strong in RDBMS Databases, PL/SQL programming.
  • Strong experience in Oracle SQL queries, PL/SQL Stored Procedures, Functions, Packages, Triggers and Cursors with Query optimizations as part of ETL Development process.
  • Knowledge of running Hive queries through Spark SQL, integrated with the Spark environment.
  • Hands-on experience with UNIX and shell scripting to automate tasks.
  • Good experience with the FileZilla and WinSCP tools for transferring files to UNIX environments.

TECHNICAL SKILLS

Big Data Technologies: Hadoop 2.7.x/2.5.x/2.4.x/1.x.y, HDFS, MapReduce, Sqoop 1.4.x, Oozie, Pig 0.15/0.14/0.11, Hive 1.2.1/0.14/0.13/0.10, ZooKeeper, Impala, Hue, Flume, HBase, Spark.

Programming: Python 3.x/2.x, Java 1.7/1.6, C and PL/SQL

ETL/BI Tools: Informatica Power Center 9.x/8.6, OBIEE

Script/Markup: JavaScript, XML, HTML, JSON and Unix Scripting

IDE: Eclipse, Rational Web Application Developer, NetBeans

App/Web Servers: Apache Tomcat Server, Apache / IBM HTTP Server, WebSphere Application Server 6.1/7.0

Messaging & Web Services: SOAP, REST, WSDL, UDDI, JMS and XML

Databases: Teradata 15/14/13, Oracle 9i/10g, MySQL 5.0, MS SQL Server

Methodologies: Agile, Waterfall, Spiral model, Full lifecycle SDLC

Operating Systems: Windows, Linux and Unix

PROFESSIONAL EXPERIENCE

Confidential, Austin, TX

Hadoop Developer

Responsibilities:

  • Configured Apache Hadoop clusters for application development, along with the Hadoop tools Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Developed shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HBase/HDFS for further analysis.
  • Collected log data from web servers and integrated it with HBase using Flume.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used Sqoop to transfer data between HDFS and RDBMS in both directions.
  • Created Hive tables, loaded data and developed Hive UDFs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Gained knowledge on building Apache Spark applications using Scala
  • Worked on importing and exporting data between Oracle and DB2 and HDFS and Hive using Sqoop.
  • Created Hive External tables on the existing HDFS file systems.
  • Developed shell scripts to roll and automate day-to-day processes.
  • Developed a POC for Apache Kafka.
  • Automated workflows using shell scripts that pull data from various databases into Hadoop.
  • Developed scalable, Hadoop-based data processing algorithms using MapReduce, Pig, Hive, HBase and the Hadoop ecosystem
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
  • Set up the QA environment and updated configurations for running scripts with Pig, Hive and Sqoop.
  • Transformed massive amounts of raw data into actionable analytics.
  • Developed scripts to automate the process and generate reports.
  • Installed, optimized and configured new servers and application upgrades in existing network environment to meet the requirements.
  • Provided User training and support.

Environment: Hadoop, MapReduce, Spark, Java, Hive, HDFS, Pig, Sqoop, Kafka, Oozie, Flume, HBase, ZooKeeper, CDH4 & CDH5, Oracle, Perl, PL/SQL, Python, Linux.

Confidential, Dayton, OH

Hadoop Developer

Responsibilities:

  • Developed POCs for Hadoop implementation.
  • Configured Hadoop clusters on AWS.
  • Developed shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Used Sqoop to import data from Teradata to HDFS.
  • Used HDFS commands to move data from local system to HDFS.
  • Developed MapReduce Programs for parsing the raw data and populating staging tables using Java.
  • Used Pig & Python scripting for preprocessing the data.
  • Created staging tables for data transfer from Hive.
  • Developed and executed Hive queries for denormalizing the data.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Created Hive External tables on the existing HDFS file systems.
  • Installed and configured Hive and wrote Hive UDFs and queries.
  • Created Hive queries to compare the raw data with EDW reference tables and perform aggregations.
  • Created Partitions and Buckets on Hive tables.
  • Used Python for pattern matching in build logs to format errors and warnings.
  • Managed and reviewed Hadoop log files.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig Latin Scripts.
  • Performed joins, group by and other operations in MapReduce.
  • Developed shell scripts to roll and automate day-to-day processes.
  • Automated workflows using shell scripts that pull data from various databases into Hadoop.

Environment: AWS - EC2 & S3, Redshift, Hortonworks, Teradata, Informatica Power Center 9.1, Power Center Big Data Edition, Python, HDFS, Hive, Pig, Sqoop, Oozie, Impala, ZooKeeper, Maven, Whirr, XML, Linux.

Confidential

Java Developer

Responsibilities:

  • Developed the UI and presentation layer using the JSF Framework, HTML5, jQuery, JavaScript, Ext JS and CSS
  • Used JDBC to communicate with Oracle 10g database
  • Extensively used Hibernate in developing data access layer. Developed SQL queries, views and stored procedures using PL/SQL
  • Implemented Service Oriented Architecture by developing Java web services using WSDL, UDDI and SOAP
  • Performance tuned Sybase Database, created DB tables, stored procedures and indexes
  • Used crontab (scheduler) to run the batch jobs
  • Used ClearCase as the code repository and for concurrent development within the team
  • Led daily work transfer between onshore and offshore teams
  • Worked closely with the demanding client base to ensure that the solutions meet the requirements
  • Created dynamic HTML pages, used jQuery for client-side validations, and AJAX to create interactive frontend GUI
  • Developed reporting application using Core Java, JSP, Servlets, Spring Framework, SOAP, XML, JavaScript and Tomcat
  • Used Maven as build tool for managing a project's build, reporting and documentation from a central piece of information
  • Used SVN version control to track and maintain the different versions of the project
  • Participated in the requirements gathering, analysis, design, development, testing and maintenance phases of R&D Redesign

Environment: JSP, Struts, Hibernate, Sybase ASE 12.5, Oracle 9i, Oracle 10g, PL/SQL, Crontab, MongoDB, JUnit, ASP, Eclipse, JavaScript, XML, HTML, WSDL, SOAP.

Confidential

Java Developer

Responsibilities:

  • Analyzed and designed the application.
  • Prepared the detailed design document to meet the requirements.
  • Developed the application using J2EE architecture.
  • Developed JSP forms.
  • Designed and developed web pages using HTML and JSP.
  • Designed and developed Servlets to communicate between presentation and business layer.
  • Used EJB as a middleware in developing a three-tier distributed application.
  • Developed Session Beans and Entity Beans to handle business logic and data processing.
  • Used JMS in the project for sending and receiving the messages on the queue.
  • Developed the Servlets for processing the data on the server.
  • Transferred the processed data to the database through Entity Beans.
  • Used JDBC for database connectivity with MySQL Server.
  • Used CVS for version control.
  • Performed unit testing using JUnit.

Environment: Core Java, J2EE, JSP, Servlets, XML, XSLT, EJB, JDBC, JavaScript, JMS, HTML, CSS, MySQL Server, CVS, Windows 2000