Hadoop Developer Resume

Overland Park, KS

SUMMARY:

  • IT professional with over 9 years of experience in systems development, databases, and analytics, including 3+ years of comprehensive experience as a Hadoop Developer.
  • Expertise in Big Data, Hadoop, SQL, PL/SQL, ETL, and various frameworks in the Hadoop ecosystem such as Spark, Pig, Hive, Sqoop, Flume, ZooKeeper, Oozie, Hue, Kafka, HBase, HDFS, MR2, YARN, Cloudera Manager, CDH5, and HDP 2.x.
  • Expertise in the Hadoop ecosystem (writing Hadoop jobs for analyzing data using MapReduce, Hive, and Pig).
  • Strong working experience with the Cloudera Hadoop distribution.
  • Extensive working experience with Hadoop ecosystem components like Hive, Pig, Sqoop, Flume, and Oozie.
  • Experience in design, development, and implementation of Big Data applications.
  • Extensive experience in developing Pig Latin Scripts and using Hive Query Language for data analytics.
  • Transformed date-related data into application-compatible formats by developing Apache Pig UDFs.
  • Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  • Successfully migrated a legacy application to a Big Data application using Hive, Pig, and HBase at the production level.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experience in importing and exporting data using Sqoop to HDFS from Relational Database Systems.
  • Experience in integrating Hive queries into Spark environment using Spark SQL.
  • Hands on Experience in installing, configuring and maintaining the Hadoop clusters.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Good understanding of NoSQL databases like HBase.
  • Experience in AWS - S3, EC2, Redshift.
  • Used HiveQL to analyze data and identify correlations.
  • Experience in developing custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
  • Wrote MapReduce programs in Python with the Hadoop Streaming API (see the sketch after this list).
  • Good Knowledge of Data Profiling using Informatica Data Explorer.
  • Extensive experience in building ETL Design and Development.
  • Good understanding of Project Management Knowledge Areas and Process groups.
  • Well versed in OLTP Data Modeling and Strong knowledge of Entity-Relationship concepts.
  • Experience in Data Cleaning and Data Preprocessing using Python Scripting.
  • Good experience in all the phases of Software Development Life Cycle (Analysis of requirements, Design, Development, Verification and Validation, Deployment).
  • Strong database knowledge of SQL, PL/SQL programming, and RDBMS concepts.
  • Extensively involved in creating Oracle SQL queries, PL/SQL Stored Procedures, Functions, Packages, Triggers and Cursors with Query optimizations as part of ETL Development process.
  • Knowledge of running Hive queries through Spark SQL within the Spark environment.
  • Hands-on experience with UNIX and shell scripting for automation.
  • Ability to work effectively and efficiently in a team and individually with excellent interpersonal, technical and communication skills.
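
A minimal sketch of the Python Hadoop Streaming pattern referenced above; the pipe-delimited record layout, field positions, and paths are assumptions for illustration only.

    #!/usr/bin/env python
    # mapper.py -- clean a raw pipe-delimited record and emit customer_id<TAB>1
    import sys

    def mapper():
        for line in sys.stdin:
            fields = line.rstrip("\n").split("|")      # assumed pipe-delimited input
            if len(fields) < 3:
                # Hadoop Streaming custom counter for malformed rows
                sys.stderr.write("reporter:counter:DataQuality,MalformedRows,1\n")
                continue
            print("%s\t1" % fields[0].strip())

    if __name__ == "__main__":
        mapper()

    #!/usr/bin/env python
    # reducer.py -- sum counts per key; Streaming delivers input sorted by key
    import sys

    def reducer():
        current, total = None, 0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t", 1)
            if current is not None and key != current:
                print("%s\t%d" % (current, total))
                total = 0
            current = key
            total += int(value)
        if current is not None:
            print("%s\t%d" % (current, total))

    if __name__ == "__main__":
        reducer()

    # Submitted roughly as:
    # hadoop jar hadoop-streaming.jar -input /data/raw -output /data/counts \
    #   -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py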

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, ZooKeeper, Impala, Hue, Kafka, Flume, HBase, Spark.

Databases: Oracle SQL, PL/SQL, Teradata, MySQL 5.0, MS SQL Server

ETL/BI Tools: MSBI, Talend, Informatica Power Center 9.x/8.6, OBIEE

Programming: Python, Core Java 1.7/1.6, C

Script/Markup: JavaScript, XML, HTML, JSON and Unix Scripting

IDE: Eclipse, Rational Web Application Developer, NetBeans, TextPad

App/Web Servers: Apache Tomcat Server, Apache / IBM HTTP Server, WebSphere Application Server 6.1/7.0

Messaging & Web Services: SOAP, REST, WSDL, UDDI, JMS and XML

Methodologies: Agile, Waterfall model, Spiral model, SDLC

Operating Systems: Windows, Linux and Unix

PROFESSIONAL EXPERIENCE:

Hadoop Developer

Confidential, Overland Park, KS

Responsibilities:

  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and HBase.
  • Involved in creating a data lake by extracting customers' data from various data sources into HDFS, including CSV files, relational databases, and server log data.
  • Loaded data into Hive using Sqoop and used HiveQL to analyze the partitioned and bucketed data; executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic (see the Spark SQL sketch after this list).
  • Developed Spark applications in Scala and Python and implemented Apache Spark for processing data from various streaming sources.
  • Imported data from SQL Server to HDFS using Python on top of the Sqoop framework (see the wrapper sketch after this list).
  • Exported data from HDFS to MySQL using Python on top of the HAWQ framework.
  • Developed a Java application that parses the mainframe report into CSV files, and another application that compares the data from SQL Server against the mainframe report and generates a rip file.
  • Documented the technical design as well as the production support document.
  • Involved in creating workflows for Tidal (the workflow coordinator for Confidential & Confidential).
  • Created Hive external tables with partitions and bucketing to load incremental data coming from SQL Server.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Responsible for performing extensive data validation using Hive.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
  • Used the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Involved in designing and developing non-trivial ETL processes within Hadoop using tools like Pig, Sqoop, Flume, and Oozie.
  • Used DML statements to perform different operations on Hive tables.
  • Developed Hive queries for creating foundation tables from stage data.
  • Used Pig as an ETL tool to do transformations, event joins, filtering, and some pre-aggregations.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Developed custom Java UDFs for Hive for further transformations on data.
  • Performed Hive performance tuning at all phases of the project.
  • Involved in modifying the existing Sqoop and HAWQ frameworks to read a JSON property file and perform load/unload operations between HDFS and MySQL.
  • Developed Pig scripts to validate record counts between the Sqoop and HAWQ loads.
  • Developed custom Java MapReduce counters to track the records processed by each MapReduce job.
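
A Spark SQL sketch of the kind of query described above against a partitioned, bucketed Hive table stored as Parquet; the database, table, and column names and the partition value are assumptions, and the Spark 2.x SparkSession API is used for brevity.

    # hive_parquet_analysis.py
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("incremental-hive-analysis")
             .enableHiveSupport()      # read table metadata from the Hive metastore
             .getOrCreate())

    # Partition pruning on load_date restricts the scan to the latest increment.
    daily = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_amount
        FROM   sales.transactions_parquet
        WHERE  load_date = '2016-01-31'
        GROUP  BY customer_id
    """)

    daily.write.mode("overwrite").saveAsTable("sales.daily_customer_totals")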
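
A minimal sketch of the kind of Python-driven, JSON-configured Sqoop import referenced above; the property names and values are assumed for illustration.

    # sqoop_import.py
    import json
    import subprocess

    def run_import(property_file):
        with open(property_file) as f:
            props = json.load(f)

        cmd = [
            "sqoop", "import",
            "--connect", props["jdbc_url"],        # e.g. a SQL Server JDBC URL
            "--username", props["user"],
            "--password-file", props["password_file"],
            "--table", props["source_table"],
            "--target-dir", props["target_dir"],
            "--num-mappers", str(props.get("mappers", 4)),
        ]
        if props.get("check_column"):              # optional incremental import
            cmd += ["--incremental", "lastmodified",
                    "--check-column", props["check_column"],
                    "--last-value", props["last_value"]]

        subprocess.check_call(cmd)

    if __name__ == "__main__":
        run_import("import_properties.json")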

Environment: Pivotal HD 3.0, HAWQ, MySQL 8.4, SQL Server 2014/2012, Java, Sqoop, Hadoop 2.6, Python, Tidal, Spring XD, Pig, JSON.

Hadoop Developer

Confidential, Hoffman Estates, IL

Responsibilities:

  • Configured and implemented Flume for efficiently collecting, aggregating and moving large amounts of data to HDFS.
  • Developed Flume interceptors for preprocessing of application logs before they are loaded into HDFS.
  • Developed multiple MapReduce programs for cleaning and preprocessing the data for downstream analytics.
  • Performed ETL using Pig, Hive and MapReduce to transform transactional data to de-normalized form.
  • Configured periodic incremental imports of data from DB2 into HDFS using Sqoop.
  • Worked extensively with importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive.
  • Used compression techniques (Snappy, Gzip) to optimize MapReduce jobs to use HDFS efficiently.
  • Developed Hive queries for data analysis, extending Hive's features by writing custom UDFs and SerDes.
  • Created partitioned, bucketed Hive tables, loaded data into respective partitions at runtime, for quick downstream access.
  • Involved in configuring the Solr index pipelines to enable real time indexing for our recommendation engine.
  • Bundled multiple independent jobs into runnable Oozie workflow, wrapping them together as one triggerable process.
  • Implemented a POC to load data into Cassandra and access data using the Java API (see the sketch after this list).
  • Used the Cassandra storage APIs to access, analyze, and store data from/to the Cassandra data store.
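
The POC above used the Java API; a functionally equivalent sketch with the DataStax Python driver is shown below, with the keyspace, table, and column names assumed for illustration.

    # cassandra_poc.py
    from datetime import datetime
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])               # contact point(s) for the cluster
    session = cluster.connect("recommendations")   # assumed keyspace

    # Write one event, then read it back by partition key.
    session.execute(
        "INSERT INTO user_events (user_id, event_time, item_id) VALUES (%s, %s, %s)",
        ("u123", datetime.utcnow(), "sku-42"),
    )
    rows = session.execute(
        "SELECT item_id FROM user_events WHERE user_id = %s", ("u123",)
    )
    for row in rows:
        print(row.item_id)

    cluster.shutdown()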

Environment: Hadoop 2.0, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, Ubuntu, Java, Eclipse, XML, JSON, SerDes, custom UDFs, MRUnit, Cassandra.

Hadoop Developer

Confidential

Responsibilities:

  • Developed different MapReduce applications on Hadoop.
  • Mined the locations of users on social media sites in a semi-supervised setting on the Hadoop cluster using MapReduce.
  • Implemented single-source shortest path on the Hadoop cluster (see the sketch after this list).
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Evaluated the suitability of Hadoop and its ecosystem for the project and implemented various proof-of-concept (POC) applications to support adoption under the Big Data Hadoop initiative.
  • Estimated software and hardware requirements for the NameNode and DataNodes and planned the cluster.
  • Participated in requirement gathering from the experts and business partners and converted the requirements into technical specifications.
  • Extracted the needed data from the server into HDFS and Bulk Loaded the cleaned data into HBase.
  • Wrote MapReduce programs and Hive UDFs in Java where the required functionality was too complex.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Involved in loading data from LINUX file system to HDFS.
  • Prepared design documents and functional documents.
  • Added extra nodes to the cluster, based on requirements, to keep it scalable.
  • Developed Hive queries for the analysis, to categorize different items.
  • Assisted application teams in installing Hadoop updates, operating system patches, and version upgrades when required.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
  • Delivered a POC of Flume to handle real-time log processing for attribution reports.
  • Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
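
A sketch of one iteration of the single-source shortest path job mentioned above, written as a Hadoop Streaming program in Python; the tab-separated record layout (node, distance, comma-separated adjacency list) and unit edge weights are assumptions for illustration.

    # sssp_step.py -- run as: -mapper "python sssp_step.py map" -reducer "python sssp_step.py reduce"
    import sys

    INF = float("inf")

    def mapper():
        for line in sys.stdin:
            node, dist, adj = line.rstrip("\n").split("\t")
            dist = float(dist)
            # Re-emit the node's own record so its adjacency list survives the shuffle.
            print("%s\t%s\t%s" % (node, dist, adj))
            # Propose a tentative distance to each neighbour (unit edge weights).
            if dist != INF and adj:
                for neighbour in adj.split(","):
                    print("%s\t%s\t" % (neighbour, dist + 1))

    def reducer():
        current, best, adj = None, INF, ""
        for line in sys.stdin:
            node, dist, edges = line.rstrip("\n").split("\t")
            if current is not None and node != current:
                print("%s\t%s\t%s" % (current, best, adj))
                best, adj = INF, ""
            current = node
            best = min(best, float(dist))
            if edges:
                adj = edges
        if current is not None:
            print("%s\t%s\t%s" % (current, best, adj))

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "map" else reducer()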

Environment: Java, Apache Hadoop, HDFS, MapReduce, Pig, Hive, Linux, Sqoop, Flume, Oozie, Cassandra.

Database and ETL Developer

Confidential

Responsibilities:

  • Designed and developed the interface program to import the item master using SQL*Loader and a PL/SQL package, run through a concurrent program (see the sketch after this list).
  • Designed and customized data models for a data warehouse supporting data from multiple sources in real time.
  • Designed/modified/implemented stored procedures, triggers in Oracle 8.0.6 using PL/SQL.
  • Increased the performance, speed, and error handling of the process by 60%.
  • Analyzed and used the Constellar Hub to ETL source data for data warehousing.
  • Designed, developed, coded, tested, documented, and implemented data modeling features using TOAD and other third party tools.
  • Wrote SQL scripts and PL/SQL scripts to extract data from the database and for testing purposes.
  • Created primary database storage structures (tablespaces, segments, extents, data files, data blocks) and objects (tables, views, and indexes).
  • Oversaw troubleshooting UNIX macros, SQL scripts, and SQR reports.
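
An illustrative Python sketch of the two-step item master interface flow described above; the control file, connection details, and package/procedure names are hypothetical, not the actual interface program.

    # item_master_import.py
    import subprocess
    import cx_Oracle

    # 1. Stage the flat file into the interface table with SQL*Loader.
    subprocess.check_call([
        "sqlldr",
        "userid=apps/apps@PROD",
        "control=item_master.ctl",     # control file mapping flat-file columns
        "log=item_master.log",
    ])

    # 2. Call the PL/SQL import package that validates rows and loads them onward.
    conn = cx_Oracle.connect("apps", "apps", "PROD")
    cur = conn.cursor()
    rejected = cur.var(cx_Oracle.NUMBER)
    cur.callproc("xx_item_import_pkg.process_batch", [rejected])
    print("rows rejected:", rejected.getvalue())
    conn.commit()
    conn.close()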

Environment: Oracle 8i/9i/10g, SQL Server 2000, PL/SQL, SQL*Loader, TOAD, UNIX shell scripting, MSBI, Windows NT 4.0.

Database and ETL Developer

Confidential

Responsibilities:

  • Designed ETL jobs using the MSBI tool to load data from the mainframe to an Oracle database.
  • Developed parallel jobs to load the data into the target schema.
  • Created sequences for running ETLs using the new job sequencer, job activity, nested condition, notification activity, and sequencer stages.
  • Used job sequences to call various parallel jobs and to send messages in case of process failures through the execution command.
  • Prepared the technical documents for the ETLs.
  • Tested the jobs and prepared the unit test cases.

Environment: DataStage 7.5, Oracle 9i, Windows XP.

Database Developer

Confidential

Responsibilities:

  • Developed database applications in MS Access and SQL Server that accelerated insurance claims processing.
  • Implemented complex SQL/Access queries and performed schema design, normalization, and performance tuning.
  • Developed Processes for ETL.
  • Created technical documentation and documented processes on several projects.
  • Performed Database Backup, Recovery and Disaster Recovery procedures.
  • Performed data migration using DTS services across different databases, including MS Access, Excel and flat files.
  • Analyzed and designed the database as well as business logic modules.
  • Documented the project and its processes.
  • Installed and configured relevant components to ensure database access.

Environment: SQL Server, MSBI (SSIS), MS Access, Excel, and Windows 2000 Professional.
