
Hadoop Developer Resume

Tampa, FL

SUMMARY

  • A dynamic, skilled professional with over 9 years of experience in the data warehousing domain.
  • Have 5 years of working experience on Teradata.
  • Have 3 years of comprehensive experience in Big Data Analytics.
  • Good knowledge of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Experience in using Hive, Sqoop and Cloudera Manager.
  • Experience in importing and exporting data with Sqoop between HDFS and relational database systems, in both directions (see the sketch after this list).
  • Extending Hive functionality by writing custom UDFs.
  • Experience in analyzing data using HiveQL, Pig Latin and Map Reduce.
  • Knowledge in job work-flow scheduling and monitoring tools like Oozie.
  • Expertise in writing SQL queries using Teradata.
  • Worked extensively with Teradata utilities such as MLOAD, TPUMP, FASTLOAD, and FASTEXPORT, writing SQL queries and loading data into data warehouses/data marts.
  • Secondary skill set includes DataStage.
  • Extensive experience in ETL Analysis, Design, Development, Testing, Implementation, Maintaining Standards, Quality Audits, Performance Tuning, Automation of jobs and Maintenance and support of various applications.
  • Have good skills in UNIX shell scripting and PL/SQL.
  • Hands-on experience with RDBMS and Linux shell scripting.
  • Have good understanding of Mainframes concepts.
  • Strong understanding of the Data Warehousing Techniques.
  • Skilled in unit testing, quality assurance testing, system integration testing, regression testing, and reconciliations.
  • Prepared detailed HLD, LLD, and run book documents.
  • Excellent skills in a wide variety of technologies and a proven ability to quickly learn new programs and tools.
  • Very good communication skills and quick adaptability to new technologies and new working environment.
  • Excellent organizational skills and ability to prioritize workload.
  • Delivered training sessions on DW basics and DataStage to entry-level trainees.
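
A minimal sketch of the Sqoop import/export flow mentioned above is shown below; the JDBC URL, credentials, table names, and HDFS paths are placeholders for illustration, not actual project values.

# Pull a relational table into HDFS (placeholder connection, credentials, and table).
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMER_TXN \
  --target-dir /data/staging/customer_txn \
  --num-mappers 4 \
  --fields-terminated-by '\t'

# Reverse direction: export refined results back to the relational database.
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMER_TXN_SUMMARY \
  --export-dir /data/refined/customer_txn_summary \
  --input-fields-terminated-by '\t'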

TECHNICAL SKILLS

Business Areas: Banking and Financial services, Telecom services

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Oozie, Flume

Data warehousing: Mainframes, DataStage

Operating System: MS-DOS, Windows (9X, 2000, XP), Unix

Databases: Teradata, Oracle, SQL Server

Tools: Autosys, JIRA, PL/SQL Developer

Languages: UNIX Shell Scripts, PL/SQL

PROFESSIONAL EXPERIENCE

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

  • Developed Map Reduce programs using combiners and custom partitioners to parse raw data, populate staging tables, and load refined data into partitioned tables for all domains.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Tested raw data and executed performance scripts.
  • Wrote Hive queries for analysis and reporting across different business streams in the company (see the sketch after this list).
  • Supported code/design analysis, strategy development and project planning.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Developed SQL statements to improve back end communications.
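
The Hive reporting queries noted above could look roughly like the following; the database, table, column names, and output directory are assumptions for illustration only.

# Run a reporting query and write the result set to an HDFS directory,
# ready for a downstream Sqoop export to the relational database.
hive -e "
  INSERT OVERWRITE DIRECTORY '/data/refined/daily_txn_summary'
  SELECT txn_date, region, COUNT(*) AS txn_count, SUM(amount) AS total_amount
  FROM staging.transactions
  GROUP BY txn_date, region;
"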

Environment: Hadoop, Map Reduce, Hive, Sqoop, Pig and UNIX Shell Scripting.

Confidential, Richmond, VA

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in data extraction from distributed RDBMS like Teradata.
  • Involved in loading data from UNIX file system to HDFS.
  • Used Map Reduce JUnit for unit testing.
  • Troubleshot the cluster by managing and reviewing Hadoop log files.
  • Installed and configured Hive for ETL jobs.
  • Used Oozie to manage the Hadoop jobs.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Used CDH3 and CDH4 distributions for development and deployment.
  • Imported data using Sqoop from Teradata using Teradata connector.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive (see the sketch after this list).
  • Involved in maintaining various Unix Shell scripts.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Automated all jobs, from pulling data from sources such as MySQL to pushing the result sets into the Hadoop Distributed File System using Sqoop.
  • Used SVN for version control.
  • Maintained system integrity of all subcomponents (primarily HDFS, MapReduce, HBase, and Flume).
  • Monitored system health and logs and responded to any warning or failure conditions.
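
A sketch of the Hive partitioning and bucketing pattern referenced above; the database, table, and column names are hypothetical, and the storage format is only one reasonable choice for that era of Hive.

# Enable dynamic partitioning and bucketed inserts, create a partitioned,
# bucketed table, and load it from a raw staging table.
hive -e "
  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;
  SET hive.enforce.bucketing=true;

  CREATE TABLE IF NOT EXISTS analytics.events_part (
      event_id   STRING,
      event_type STRING,
      payload    STRING
  )
  PARTITIONED BY (event_date STRING)
  CLUSTERED BY (event_id) INTO 32 BUCKETS
  STORED AS RCFILE;

  INSERT OVERWRITE TABLE analytics.events_part PARTITION (event_date)
  SELECT event_id, event_type, payload, event_date
  FROM staging.events_raw;
"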

Environment: Hadoop, Map Reduce, Hive, Sqoop, HBase (NoSQL database), Java 1.6, and UNIX Shell Scripting.

Confidential

Responsibilities:

  • Work on analyzing and writing Map Reduce jobs based on the requirements.
  • Involve in creating Hive tables, loading them with data, and writing Hive queries that run internally as Map Reduce jobs.
  • Work on Flume to stream Twitter data in to HDFS to analyze Black Friday promotions.
  • Work on creating Hive scripts for data analysts based on the Ad-hoc requirements.
  • Involve in writing UDF for Hive.
  • Schedule the workflow using the Oozie workflow scheduler (see the sketch after this list).
  • Extract the data from external data sources into HDFS using Sqoop.
  • Develop Map Reduce programs using combiner and custom partition.
  • Create Sqoop job to populate Hive external tables.
  • Involve in configuring Flume to import Sofi and DGW logs to HDFS.
  • Use Cloudera Manager for monitoring purposes.
  • Involve in loading data from the Linux file system to HDFS.
  • Involve in all the phases of the SDLC using Agile Scrum methodology.
  • Work in an Agile environment with 3-week sprints, participating in grooming, sprint planning, retrospective, and daily stand-up meetings.
  • Use the VersionOne tool for tracking agile activities such as stories, tasks, estimated story points, and actual burn hours.
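
The Oozie scheduling bullet above could be realized along these lines; the NameNode/JobTracker hosts, Oozie server URL, and workflow application path are placeholders.

# Minimal job.properties pointing at a deployed workflow application.
cat > job.properties <<'EOF'
nameNode=hdfs://namenode:8020
jobTracker=jobtracker:8021
queueName=default
oozie.wf.application.path=${nameNode}/user/etl/apps/daily-ingest
EOF

# Submit and start the workflow; the CLI prints "job: <id>", so capture the id
# and use it to poll the workflow status.
JOB_ID=$(oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run | awk '{print $2}')
oozie job -oozie http://oozie-host:11000/oozie -info "$JOB_ID"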

Environment: Hadoop, Map Reduce, HDFS, Hive, Sqoop, Oozie, Cloudera Manager, Flume

Confidential

Responsibilities:

  • The objective of this project is to increase customer data security in line with PCI compliance.
  • Work on remediation of Plastic number column.
  • Analyze shell scripts and make changes to remediate the plastic number.
  • Perform dual validation to make sure test data exactly matches with production data except plastic number column.
  • Identify queries consuming excessive CPU and tune them.
  • Prepare Unit Test Cases and Unit Test Results.
  • Utilize UNIX and Teradata technologies to work on shield remediation project.
  • Work with process owners and business users to understand each script's goal.
  • Work with testing team to understand the validation requirements.
  • Promote code into production and support it in case of failure.
  • Work on creating DDLs for the new structures, identifying the proper indexes.
  • Use the BTEQ export utility to generate Excel reports for users (see the sketch below).
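
A minimal sketch of the BTEQ export step mentioned above; the logon string, database, table, and output file are placeholders, and the flat report file is what users would pull into Excel.

bteq <<'EOF'
.LOGON tdpid/etl_user,password;
.EXPORT REPORT FILE=/reports/remediation_summary.txt;
SELECT account_id, remediation_status, updated_ts
FROM   shield_db.remediation_audit
ORDER  BY updated_ts;
.EXPORT RESET;
.LOGOFF;
.QUIT;
EOF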

Confidential, Charlotte, NC

Hadoop/Teradata Developer

Responsibilities:

  • Gather business requirements from the Business Partners.
  • Loading files to HDFS and writing HIVE queries to process required data.
  • Worked on setting up Hadoop over multiple nodes, designing and developing MapReduce.
  • Involved in installing Hadoop Ecosystem components.
  • Support Map Reduce programs running on the cluster.
  • Involved in HDFS maintenance and loading structured and unstructured data.
  • Worked as Team Member for Statement Module.
  • Involved in integration and unit testing.
  • Interacted with clients to gather requirements.
  • Import data from Teradata to HDFS on a regular basis using Sqoop (see the sketch after this list).
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked using Hive QL.
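
The recurring Teradata-to-HDFS pull could be set up as a saved Sqoop job along these lines; this sketch uses the generic Teradata JDBC driver form rather than a specific connector package, and the host, database, table, and password-file path are assumptions.

# Create a saved, incremental import job from Teradata into HDFS.
sqoop job --create daily_stmt_import -- import \
  --connect jdbc:teradata://td-host/DATABASE=EDW \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user \
  --password-file /user/etl/.td_password \
  --table STATEMENT_DETAIL \
  --target-dir /data/raw/statement_detail \
  --incremental append \
  --check-column stmt_id \
  --last-value 0

# Execute the saved job on a schedule (e.g., from cron) so new rows land in HDFS regularly.
sqoop job --exec daily_stmt_import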

Confidential

Senior Teradata Developer

Responsibilities:

  • Part of the design team and the production support team for the migration project.
  • Part of the design team for the STAR schema design of the data warehouse project.
  • Interacted with end users/customers to create mapping documents.
  • Created mapping documents for the migration project.
  • Performed extensive business analysis of the source system and talked to business groups to understand the reporting requirements.
  • Designed the mapping documents between source databases and target databases.
  • Worked on critical Occurs and Redefines in the complex flat file structures.
  • Performed data analysis, quality analysis, and data loading.
  • Developed processes for extracting, cleansing, transforming, integrating, and loading data into databases.
  • Used SQL Server as the source to load data into the target database in Teradata.
  • Created extract processes, analyzed the data, and wrote DB2 code to pull the required data.
  • Developed many Datastage server jobs for data processing and loading of data.
  • Used TOAD tool for the analysis part.
  • Used Autosys for scheduling the jobs (see the sketch below).
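
An illustrative Autosys job definition submitted through the jil command, in the spirit of the scheduling noted above; the job name, machine, script path, and schedule are hypothetical.

jil <<'EOF'
insert_job: edw_daily_load   job_type: c
command: /opt/etl/scripts/run_daily_load.sh
machine: etlhost01
owner: etl_user
start_times: "02:00"
std_out_file: /var/log/etl/edw_daily_load.out
std_err_file: /var/log/etl/edw_daily_load.err
alarm_if_fail: 1
EOF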

Confidential

Responsibilities:

  • Worked with the Business analysts and the DBAs for requirements gathering, analysis, testing, and metrics and project coordination.
  • Successfully handled the slowly changing dimensions.
  • Involved in the Dimensional modeling of the Data warehouse.
  • Developed documents like Source to Target mapping for developing the ETL jobs.
  • Worked with DataStage server stages like OCI, ODBC, Transformer, Hash file, Sequential file, Aggregator, Sort, Merge, and other stages.
  • Imported the required Metadata from heterogeneous sources at the project level.
  • Involved in designing various jobs using PX.
  • Worked extensively with job export, job import, and multi-job compilation.
  • Developed Parallel jobs using Parallel stages like: Merge, Join, Lookup, Transformer (Parallel), Teradata Enterprise Stage, Funnel, Dataset.
  • Performed debugging on these jobs using Peek stage by outputting the data to Job Log or a stage.
  • Used Remove Duplicates stage to remove the duplicates in the data.
  • Involved in the migration of DataStage jobs from Development to Production environment.
  • Worked on implementing job performance tuning techniques.
  • Designed and implemented several shell-script wrappers to execute DataStage jobs and create job reports from the job execution results (see the sketch after this list).
  • Designed and implemented wrappers to execute the DataStage jobs from remote servers.
  • Worked on database connections, SQL joins, views, aggregate conditions, parsing of objects and hierarchies.
  • Tuned SQL queries for better performance for processing business logic in the database.
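
A hedged sketch of a wrapper around the DataStage dsjob command line, in the spirit of the wrappers described above; the project name, job name, and log paths are placeholders.

#!/bin/sh
# Run a DataStage job by name, wait for completion, and write a simple report.
PROJECT="DW_PROJECT"                                   # placeholder project name
JOB="$1"                                               # job name passed as the first argument
LOG="/var/log/datastage/${JOB}.$(date +%Y%m%d_%H%M%S).log"

# Run the job and wait for it to finish.
dsjob -run -wait "$PROJECT" "$JOB"
RC=$?

# Capture job details and a log summary as a simple job report.
dsjob -jobinfo "$PROJECT" "$JOB" > "$LOG"
dsjob -logsum "$PROJECT" "$JOB" >> "$LOG"

# Treat a non-zero exit code as failure; a fuller wrapper would parse -jobinfo output.
if [ "$RC" -ne 0 ]; then
    echo "DataStage job $JOB failed (rc=$RC); see $LOG" >&2
    exit 1
fi
exit 0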

Confidential

Responsibilities:

  • Leading the projects from Onsite/Offshore which includes a daily interaction with clients and Business Analysts for requirements gathering, clarifications and suggestions in data integration.
  • Designing and developing SQL queries per business requirements.
  • Created RTAS (Rules, Trends, Alerts System), a generic, control-table-driven data validation framework that became the standard for data validation across the project.
  • Preparing Unit Test Cases and Unit Test Results.
  • Coordinating with testing team for fixing QA and UAT Defects.
  • Scheduling the jobs in Autosys and monitoring its daily run in Production.
  • Preparing all the necessary documents like Run book, Low Level Design, Deployment Sheets, Delivery packages.
  • Creating/Updating eProducts data model using Sybase Power designer.

Confidential

Responsibilities:

  • Helped V-MIS prepare the usage inventory document.
  • Worked on Informatica Power Center tool - Source Analyzer, Warehouse designer, Mapping and Mapplet Designer, Transformations, Informatica Repository Manager and Informatica Server Manager.
  • Created the Informatica metadata repository using the Repository Manager as a hub for interaction between the various tools; security, user management, and repository backups were also handled with the same tool.
  • Informatica Designer tools were used to design the source definition, target definition and transformations to build mappings.
  • Created the mappings using transformations such as the Source qualifier, Aggregator, Expression, Lookup, Router, Filter, and Update Strategy.
  • Used Server Manager to create and maintain sessions, and to monitor, edit, schedule, copy, abort, and delete them.
  • Applied efficient mapping and transformation techniques in the ETL processes.
