Hadoop Developer Resume
Austin, TX
SUMMARY
- 7+ years of IT experience in systems development, databases & analytics, with 2+ years of strong experience as a Hadoop Developer.
- Expertise in Big Data, Hadoop, NoSQL and various components such as HDFS, MR2, YARN, Spark, Pig, Hive, Sqoop, HBase, Cloudera Manager, ZooKeeper, Oozie, Kafka, Hue, CDH5, & HDP 2.x. Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive, & Pig.
- Working experience with the Cloudera and Hortonworks Hadoop distributions.
- Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
- Experience in writing UDFs in Hive and Pig.
- Experience in Hive Partitioning and Bucketing.
- Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Good exposure to NoSQL databases, specifically HBase.
- Experience in AWS - S3, EC2, Redshift.
- Experience in different Hadoop distributions like Cloudera, Hortonworks Data Platform (HDP) and Amazon Elastic MapReduce (EMR).
- Used HiveQL to analyze data and identify correlations.
- Strong experience in J2EE, JSP, Servlets, Struts, Spring, Hibernate, JDBC, SOAP, WSDL, JSON, jQuery, JavaScript, CSS and HTML, Java Multithreading, Exception Handling.
- Developed applications using Spring Framework and implemented Spring modules like the core container module, application context module, aspect-oriented module (AOP Module), JDBC Module, ORM Module and web module.
- Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python into Pig Latin and HiveQL.
- Developed MapReduce programs in Python with the Hadoop streaming API.
- Good Knowledge of Data Profiling using Informatica Data Explorer.
- Extensive experience in ETL Design and Development.
- Good knowledge of project management knowledge areas and process groups.
- Experience working in an iterative, agile software lifecycle (SDLC) with strong ability to estimate/scope the development of projects.
- Well versed in OLTP data modeling, with strong knowledge of Entity-Relationship concepts.
- Experience in data cleaning and data preprocessing using Python scripting.
- Strong in RDBMS and PL/SQL programming.
- Strong experience in Oracle SQL queries, PL/SQL Stored Procedures, Functions, Packages, Triggers and Cursors with Query optimizations as part of ETL Development process.
- Knowledge of running Hive queries through Spark SQL, which integrates with the Spark environment.
- Hands-on experience with UNIX and shell scripting to automate tasks.
- Good experience with FileZilla and WinSCP tools for transferring files to UNIX environments.
TECHNICAL SKILLS
Big Data Technologies: Hadoop 2.7.x/2.5.x/2.4.x/1.x.y, HDFS, MapReduce, Sqoop 1.4.x, Oozie, Pig 0.15/0.14/0.11, Hive 1.2.1/0.14/0.13/0.10, ZooKeeper, Impala, Hue, Flume, HBase, Spark.
Programming: Python 3.x/2.x, Java 1.7/1.6, C and PL/SQL
ETL/BI Tools: Informatica PowerCenter 9.x/8.6, OBIEE
Script/Markup: JavaScript, XML, HTML, JSON and Unix Scripting
IDE: Eclipse, Rational Web Application Developer, NetBeans
App/Web Servers: Apache Tomcat Server, Apache / IBM HTTP Server, WebSphere Application Server 6.1/7.0
Messaging & Web Services: SOAP, REST, WSDL, UDDI, JMS and XML
Databases: Teradata 15/14/13, Oracle 9i/10g, MySQL 5.0, MS SQL Server
Methodologies: Agile, Waterfall, Spiral model, Full lifecycle SDLC
Operating Systems: Windows, Linux and Unix
PROFESSIONAL EXPERIENCE
Confidential, Austin, TX
Hadoop Developer
Responsibilities:
- Configured Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
- Developed shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HBase/HDFS for further analysis.
- Collected log data from web servers and integrated it with HBase using Flume.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import data from RDBMS into HDFS and export it back.
- Created Hive tables, loaded data and developed Hive UDFs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Gained knowledge of building Apache Spark applications using Scala.
- Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Created Hive External tables on the existing HDFS file systems.
- Developed shell scripts to automate rolling day-to-day processes.
- Developed a POC for Apache Kafka.
- Automated workflows using shell scripts that pull data from various databases into Hadoop.
- Developed scalable, Hadoop-based data processing algorithms using MapReduce, Pig, Hive, HBase and the Hadoop ecosystem.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
- Set up the QA environment and updated configurations for implementing scripts with Pig, Hive and Sqoop.
- Transformed massive amounts of raw data into actionable analytics.
- Developed scripts to automate the process and generate reports.
- Installed, optimized and configured new servers and application upgrades in existing network environment to meet the requirements.
- Provided User training and support.
Environment: Hadoop, MapReduce, Spark, Java, Hive, HDFS, Pig, Sqoop, Kafka, Oozie, Flume, HBase, ZooKeeper, CDH4 & CDH5, Oracle, Perl, PL/SQL, Python, Linux.
Confidential, Dayton, OH
Hadoop Developer
Responsibilities:
- Developed POCs for Hadoop implementation.
- Configured Hadoop clusters on AWS.
- Developed shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Used Sqoop to import data from Teradata to HDFS.
- Used HDFS commands to move data from local system to HDFS.
- Developed MapReduce Programs for parsing the raw data and populating staging tables using Java.
- Used Pig & Python scripting for preprocessing the data.
- Created staging tables for data transfer from Hive.
- Developed and executed Hive queries for denormalizing the data.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Created Hive External tables on the existing HDFS file systems.
- Installed and configured Hive, and wrote Hive UDFs and queries.
- Created Hive queries to compare the raw data with EDW reference tables and perform aggregations.
- Created Partitions and Buckets on Hive tables.
- Used Python for pattern matching in build logs to format errors and warnings.
- Managed and reviewed Hadoop log files.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig Latin Scripts.
- Performed joins, group by and other operations in MapReduce.
- Developed shell scripts to automate rolling day-to-day processes.
- Automated workflows using shell scripts that pull data from various databases into Hadoop.
Environment: AWS - EC2 & S3, Redshift, Hortonworks, Teradata, Informatica PowerCenter 9.1, PowerCenter Big Data Edition, Python, HDFS, Hive, Pig, Sqoop, Oozie, Impala, ZooKeeper, Maven, Whirr, XML, Linux.
Confidential
Java Developer
Responsibilities:
- Developed the UI and presentation layer using JSF Framework, HTML5, jQuery, JavaScript, Ext JS and CSS.
- Used JDBC to communicate with the Oracle 10g database.
- Extensively used Hibernate in developing the data access layer. Developed SQL queries, views and stored procedures using PL/SQL.
- Implemented Service Oriented Architecture by developing Java web services using WSDL, UDDI and SOAP.
- Performance tuned the Sybase database; created DB tables, stored procedures and indexes.
- Used Crontab (scheduler) to run the batch jobs.
- Used ClearCase for concurrent development in the team and as the code repository.
- Led daily work transfer between onshore and offshore teams.
- Worked closely with the demanding client base to ensure that the solutions met the requirements.
- Created dynamic HTML pages, used jQuery for client-side validations, and AJAX to create an interactive frontend GUI.
- Developed a reporting application using Core Java, JSP, Servlets, Spring Framework, SOAP, XML, JavaScript and Tomcat.
- Used Maven as the build tool for managing the project's build, reporting and documentation from a central piece of information.
- Used SVN version control to track and maintain the different versions of the project.
- Participated in the requirements gathering, analysis, design, development, testing and maintenance phases of R&D Redesign.
Environment: JSP, Struts, Hibernate, Sybase ASE 12.5, Oracle 9i, Oracle 10g, PL/SQL, Crontab, MongoDB, JUnit, ASP, Eclipse, JavaScript, XML, HTML, WSDL, SOAP.
Confidential
Java Developer
Responsibilities:
- Performed analysis and design of the application.
- Prepared the detailed design document to meet the requirements.
- Developed the application using J2EE architecture.
- Developed JSP forms.
- Designed and developed web pages using HTML and JSP.
- Designed and developed Servlets to communicate between presentation and business layer.
- Used EJB as a middleware in developing a three-tier distributed application.
- Developed Session Beans and Entity Beans to handle business logic and data processing.
- Used JMS in the project for sending and receiving the messages on the queue.
- Developed the Servlets for processing the data on the server.
- Transferred the processed data to the database through Entity Beans.
- Used JDBC for database connectivity with MySQL Server.
- Used CVS for version control.
- Performed unit testing using JUnit.
Environment: Core Java, J2EE, JSP, Servlets, XML, XSLT, EJB, JDBC, JavaScript, JMS, HTML, CSS, MySQL Server, CVS, Windows 2000
