Big Data Application Hadoop Developer Resume
PROFESSIONAL SUMMARY:
- Twelve years of IT experience in System Analysis, Design, Development, Testing, Implementation and Production support of Data warehousing applications.
- Expertise in Data Modeling, Data Analysis, Data Profiling, Data Extraction, Data Transformation and Data Loading.
- Nine-plus years of experience in ETL & Business Intelligence using IBM InfoSphere DataStage 7.x (Parallel Extender and Server), Informatica Power Center 9.x/8.x, IDQ and IBM Cognos Reports.
- Three years of experience in Big Data platform using Apache Hadoop and its ecosystem.
- Expertise in ingestion, storage, querying, processing and analysis of Big data.
- Experienced in using Pig, Hive, Sqoop, Oozie, Flume, HBase and HCatalog.
- Good experience with Hive query optimization and performance tuning.
- Hands-on experience in writing Pig Latin scripts and custom implementations using UDFs.
- Good experience with Sqoop for importing data from different RDBMS systems into HDFS and exporting data back to RDBMS systems for ad-hoc reporting.
- Experienced in batch job workflow scheduling and monitoring tools like Oozie.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experienced in analyzing data using HiveQL, Pig Latin, and custom Map Reduce programs in Java.
- Strong experience in building Dimension and Fact tables for Star Schema on various databases: Oracle 11g/10g/9i/8i, Teradata V2R6/V2R12, IBM DB2 UDB 9.1/9.7, MS SQL Server 8.0/9.0.
- Hands-on experience in managing and supporting large Data Warehouse applications, and in developing and tuning PL/SQL scripts, complex Data Stage ETL routines, and BTEQ, FASTLOAD and MLOAD scripts.
- Experienced with various scheduling tools like Autosys, Control-M and Maestro Job Scheduler.
- Experience in designing both logical and physical data models for large-scale data warehouse implementations using Data Modeling Tool - Erwin.
- Established ETL standards, designed ETL frameworks and reusable routines, and validated data.
- Expertise in all phases of Software development Life cycle (SDLC) - Project Requirement Analysis, Design, Development, Unit Testing, User Acceptance Testing, Implementation, Post implementation Support and Maintenance.
- Strong functional knowledge of Master Data Management (MDM), Data Quality, Data Profiling and Metadata Management life cycles.
- Experience with UNIX/Linux shell scripting to automate jobs and ETL routines.
- Developed ad-hoc Reports using IBM Cognos Report Author and Siebel Analytics.
- Good working knowledge of writing Python scripts.
- Worked as technical team lead/offshore coordinator, leading multiple projects to timely deliverables.
TECHNICAL SKILLS
Hadoop: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, HCatalog, Flume and HBase
Data warehouse Tools: Data Stage 9.1/7.5 (Parallel and Server editions), Informatica Power Center 9.x/8.6.3/8.6.0/7.1.3/6.2/6.1, IDQ (Informatica Developer), IDE (Informatica Analyst), IBM Data Stage 7.3.1, IBM Cognos 8.4 BI (Report Author, Framework Manager), Informatica Metadata Manager (Administrator, Analyzer), SAP Analytics Web 7.8 (OBIEE)
Operating Environment: HP-UNIX, IBM AIX 5.2, IBM Mainframes, MS Windows 95/98/NT/2000/XP, Sun Solaris.
RDBMS Tools: TOAD 7.x/8.5, PL/SQL Developer
Databases: Oracle 11g/10g/9i/8i/7.x, Teradata V2R5, DB2, SQL Server 2000
Languages: SQL, PL/SQL, BTEQ, C, C++, SAS 8.0, ABAP 4.3C, Java
Scripting: Unix, Linux shell scripting
Scheduling Tools: Autosys, Tivoli (Maestro), Control-M
Data Modeling: Erwin 4.0 and Visio 2007
Tools/Utilities: SQL*Plus, TOAD, Teradata SQL Assistant 6.1, Multiload, Fastload, BTEQ Win, SQL*Loader
Other Tools: HP Quality Center, PVCS Serena Manager, Visio 2007
XML: XML, HTML, DTD, XML Schema
Methodologies: Agile, UML, Waterfall
PROFESSIONAL EXPERIENCE
Confidential
Big Data Application Hadoop Developer
Role and Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in requirement analysis, design, coding and implementation.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Implemented business logic by writing Pig UDFs in Java (a minimal sketch follows this list) and used various UDFs from Piggybank and other sources.
- Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
- Used Pig as an ETL tool to do transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed Pig routines for data cleansing and preprocessing.
- Used the MultipleOutputs class in MapReduce jobs to name the output files (see the second sketch after this list).
- Created SequenceFiles to store data in binary format using MapReduce programs.
- Used different file formats such as text files, SequenceFiles, JSON and Avro.
- Worked on shell scripting.
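A minimal sketch of the kind of Java Pig UDF described above; the package and class names are illustrative, not taken from an actual project:

```java
package com.example.pig.udf;  // illustrative package name

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Simple EvalFunc that normalizes a chararray field to upper case.
 * Packaged into a jar and invoked from Pig Latin like a built-in function.
 */
public class ToUpper extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        // Skip null/empty tuples instead of failing the whole job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

Once the jar is built, the class would be REGISTERed in the Pig script and called inside a FOREACH ... GENERATE, the same way Piggybank functions are used.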
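A compact sketch of the MultipleOutputs/SequenceFile pattern mentioned above, written against the Hadoop 2.x mapreduce API; the job name, class names, named outputs and routing rule are all illustrative assumptions, not the actual project code:

```java
package com.example.mr;  // illustrative package name

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class NamedOutputCounts {

    /** Emits (first whitespace-delimited field of the record, 1). */
    public static class TagMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\\s+", 2);
            if (parts.length > 0 && !parts[0].isEmpty()) {
                ctx.write(new Text(parts[0]), ONE);
            }
        }
    }

    /** Sums counts per key and routes them to named, binary SequenceFile outputs. */
    public static class SplitReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private MultipleOutputs<Text, LongWritable> mos;

        @Override
        protected void setup(Context ctx) {
            mos = new MultipleOutputs<Text, LongWritable>(ctx);
        }

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            // Illustrative routing rule: keys tagged "ERR" go to the "errors" files,
            // everything else to the "clean" files.
            String output = key.toString().startsWith("ERR") ? "errors" : "clean";
            mos.write(output, key, new LongWritable(sum));
        }

        @Override
        protected void cleanup(Context ctx) throws IOException, InterruptedException {
            mos.close();  // flush the named-output writers
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "named-output-counts");
        job.setJarByClass(NamedOutputCounts.class);
        job.setMapperClass(TagMapper.class);
        job.setReducerClass(SplitReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Store the output as binary SequenceFiles under the two named outputs.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        MultipleOutputs.addNamedOutput(job, "errors",
                SequenceFileOutputFormat.class, Text.class, LongWritable.class);
        MultipleOutputs.addNamedOutput(job, "clean",
                SequenceFileOutputFormat.class, Text.class, LongWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Each reducer then produces named SequenceFiles such as errors-r-00000 and clean-r-00000 alongside the default part files.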
Environment: Hadoop, Pig, Hive, Sqoop, Flume, Oozie, Java (JDK 1.6), Eclipse, HBase, Linux, UNIX Shell Scripting.
Confidential
ETL Lead/Data Stage Developer
Role and Responsibilities:
- Involved in designing the ODS data model as per the ACORD model and the DW architecture using Star Schema; identified the Fact, Dimension, Junk Dimension and Bridge tables.
- Involved in designing logical and physical models using Erwin.
- Analyzed functional requirements. Designed and implemented ETL framework using Data Stage.
- Designed and developed the Data Stage jobs, job sequences and shared containers to implement the business requirements and ETL data flow diagrams.
- Implemented CDL (Change Detection Logic) and IL (Incremental Load) patterns by writing complex SQL scripts to balance the load between the Oracle and Data Stage servers.
- Led a team of 5 onsite and offshore developers by providing technical solutions and managing offshore developers' tasks on a daily basis.
- Created reusable objects such as containers and routines to promote code reusability.
- Re-designed the ETL load flow by eliminating intermediate steps in Data Stage to achieve significant performance gains.
- Involved in setting up ETL standards and helped the client adopt ETL best practices.
- Designed Exception error logging and Audit logging framework using Data Stage routines.
- Wrote the ETL technical specification document and developed common ETL project templates.
Environment: Data Stage 9.1/7.5.1, Windows, Mainframe, Web Focus, Erwin, PL/SQL developer, Oracle 11g.
Confidential
Sr. Informatica Developer
Role and Responsibilities:
- Worked closely with Business users and Project Manager to get Business Requirements.
- Created and documented all project deliverables as per GE policies.
- Created TDD (Technical Design document) and established best practices.
- Designed, developed ETL routines to extract flat file data and load into Proficy Scheduler.
- Performed Unit, Integration, System Test and UAT.
- Developed Error handling routines to handle file transfer failure and data recovery from failure.
- Responsible for completing code migration, production readiness review to ensure complete and accurate migration to various environments.
Environment: Informatica Power Center 8.1.1, HPUX 11.31, Proficy Scheduler, ERP.
Confidential
Sr. Informatica Developer
Role and Responsibility:
- Involved in all Phases of SDLC including Requirement, Analysis, Design, Development and implementation phases.
- Participated in design, code reviews and implementation of best ETL methodologies.
- Analyzed Business Requirements and translated them into Technical Specifications and ETL data mapping documents.
- Facilitated meetings with business users and Business Analysts to resolve the functional gaps.
- Designed and developed the ETL framework for loading the Data Warehouse.
- Estimated data size and data growth and determined the space requirements for the DW.
- Designed aggregation, indexing and partitioning strategies for the warehouse.
- Involved in data quality and data profiling, and recommended data quality measures.
- Developed ETL routines, UNIX shell wrappers to perform FTP and run ETL batch jobs.
- Facilitated design sessions with ETL governance team for implementing best practices.
- Participated actively in end-to-end testing, including System Integration Testing, User Acceptance Testing, and pre-production and post-production activities.
Environment: Informatica Power Center 8.6.0, Oracle 10.3.0.2, HPUX 11.31, MS Office, HP Quality Center, PVCS Serena, Harvest CM Workbench 7.1.123, Control-M, CMFast, SAP-BO.
Confidential, Cincinnati, OH
Sr. Informatica Developer
Role and Responsibility:
- Extensively worked on the migration strategy, execution, testing and implementation of the project.
- Identified dependent Unix/Maestro scripts and their migration process.
- Prepared migration plan for Unit Test, System Test, Performance Test and UAT.
- Prepared inventory of Maestro jobs, converted and loaded into TIDAL scheduler.
- Identified and converted FTP scripts to SFTP scripts.
- Mentored a 3-member team of ETL developers and testers and conducted code reviews.
- Interacted closely with the EES application support team to resolve conflicts and technical gaps.
- Prepared and validated Test Scenarios and Test cases with the EES Support team.
- Analyzed the current state of all EES applications in the AS-IS environment and planned for migrating/upgrading the current environment to Linux with minimal changes.
- Validated Development, QA and Production Environments as per functional requirements.
Environment: Informatica Power Center 7.1.3, SAP, Oracle 9i, SQL, Microsoft Visio, Unix-HP, Windows XP, HP Quality Center, PVCS Serena, Documentum, Tivoli job scheduling tool.
Confidential, Cincinnati, OH
Sr. Informatica Developer
Role and Responsibility:
- Analyzed Business/Functional specifications and translated them into technical specifications.
- Interacted closely with business users to complete data mapping specs between source and target.
- Established best practices for data movement, exception/error handling, and data recovery between source and target databases.
- Identified numerous gaps between the AS-IS and TO-BE systems and helped users to resolve them.
- Designed, developed and implemented complex ETL routines using Informatica and Shell scripts.
- Participated in testing procedures, test strategy, test plans for System and User acceptance testing.
- Interacted with the QA team to resolve issues in a timely manner and meet deliverables.
- Wrote UNIX shell scripts to automate batch load jobs in Maestro.
Environment: Informatica Power Center 7.1.1, SAP, Oracle 9i, Microsoft Visio, Unix-HP, Windows XP, HP Quality Center, PVCS Serena, Documentum, Tivoli job scheduling tool.
Confidential, CA
Datawarehouse Informatica Developer
Role and Responsibility:
- Automated manually generated Ad-hoc Reports using Informatica Mappings.
- Performed Administration tasks like Creating Repositories, Users, and Assigning privileges.
- Created technical design documents, as required, to define ETL mappings.
- Designed, Developed and Tested Workflows/Worklets according to Business Process Flow.
- Converted existing Stored Procedures to ETL Mappings.
- Worked on Siebel Answers 7.7 to generate various ad-hoc reports.
Environment: Informatica 7.1.1, Siebel Analytics 7.7.2, SQL Server 2000, SQL, DTS, Windows XP.
Confidential, Santa Clara, CA
Encover Informatica Developer
Role and Responsibility:
- Analyzed Functional specs and prepared Technical design docs.
- Developed interface Mappings and Mapplets as per business logic.
- Wrote ABAP programs in the Source Qualifier to retrieve data from the SAP system in real time.
- Developed UNIX shell scripts to automate the ETL batch process using Maestro.
Environment: Informatica 6.1, PowerConnect, Oracle 9i, UNIX, Windows XP, SAP R/3, ABAP, Maestro.
Confidential, Akron, OH
Datawarehouse Informatica Developer
Role and Responsibility:
- Designed and developed ETL routines, Mapplets, reusable transformations and Mappings.
- Designed, developed, implemented, tested and validated complex Mappings/Workflows.
- Extensively used ETL to load data from Oracle 9i, SAP R/3 and flat files to Teradata.
- Developed, enhanced and validated BTEQ scripts to load the data into the Teradata EDW.
- Identified bottlenecks, and tuned and optimized ETL processes for better performance.
Environment: Informatica 6.2, Teradata 6.0, BTEQ, Oracle 9i, VSAM, UNIX, Windows XP, Autosys 4.0.