Big Data Application Hadoop Developer Resume
PROFESSIONAL SUMMARY:
- Twelve years of IT experience in System Analysis, Design, Development, Testing, Implementation and Production support of Data warehousing applications.
- Expertise in Data Modeling, Data Analysis, Data Profiling, Data Extraction, Data Transformation and Data Loading.
- Nine-plus years of experience in ETL & Business Intelligence using IBM InfoSphere DataStage 7.x (Parallel Extender and Server), Informatica Power Center 9.x/8.x, IDQ and IBM Cognos Reports.
- Three years of experience in Big Data platform using Apache Hadoop and its ecosystem.
- Expertise in ingestion, storage, querying, processing and analysis of Big data.
- Experienced in using Pig, Hive, Sqoop, Oozie, Flume, HBase and HCatalog.
- Good experience with Hive query optimization and performance tuning.
- Hands-on experience in writing Pig Latin scripts and custom implementations using UDFs.
- Good experience with Sqoop for importing data from different RDBMS systems into HDFS and exporting data back to RDBMS systems for ad-hoc reporting.
- Experienced in batch job workflow scheduling and monitoring tools like Oozie.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experienced in analyzing data using HiveQL, Pig Latin, and custom Map Reduce programs in Java.
- Strong experience in building Dimension and Fact tables for Star Schema on various databases: Oracle 11g/10g/9i/8i, Teradata V2R6/V2R12, IBM DB2 UDB 9.1/9.7, MS SQL Server 8.0/9.0.
- Hands-on experience in managing and supporting large Data Warehouse applications, and in developing and tuning PL/SQL scripts, complex Data Stage ETL routines, and BTEQ, FASTLOAD and MLOAD scripts.
- Experienced with various scheduling tools like Autosys, Control-M and Maestro Job Scheduler.
- Experience in designing both logical and physical data models for large-scale data warehouse implementations using Data Modeling Tool - Erwin.
- Established ETL standards, designed ETL frameworks and reusable routines, and validated data.
- Expertise in all phases of Software development Life cycle (SDLC) - Project Requirement Analysis, Design, Development, Unit Testing, User Acceptance Testing, Implementation, Post implementation Support and Maintenance.
- Strong functional knowledge of Master Data Management (MDM), Data Quality, Data Profiling and Metadata Management life cycles.
- Experience with UNIX/Linux shell scripting to automate jobs and ETL routines.
- Developed ad-hoc Reports using IBM Cognos Report Author and Siebel Analytics.
- Good working knowledge of writing Python scripts.
- Worked as technical team lead/offshore coordinator, leading multiple projects to timely deliverables.
TECHNICAL SKILLS
Hadoop: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, HCatalog, Flume and HBase
Data warehouse Tools: Data Stage 9.1/7.5 (Parallel and Server editions), Informatica Power Center 9.x/8.6.3/8.6.0/7.1.3/6.2/6.1, IDQ (Informatica Developer), IDE (Informatica Analyst), IBM Data Stage 7.3.1, IBM Cognos 8.4 BI (Report Author, Framework Manager), Informatica Metadata Manager (Administrator, Analyzer), SAP Analytics Web 7.8 (OBIEE)
Operating Environment: HP-UNIX, IBM AIX 5.2, IBM Mainframes, MS Windows 95/98/NT/2000/XP, Sun Solaris.
RDBMS Tools: TOAD 7.x/8.5, PL/SQL Developer
Databases: Oracle 11g/10g/9i/8i/7.x, Teradata V2R5, DB2, SQL Server 2000
Languages: SQL, PL/SQL, BTEQ, C, C++, SAS 8.0, ABAP 4.3C, Java
Scripting: Unix, Linux shell scripting
Scheduling Tools: Autosys, Tivoli (Maestro), Control-M
Data Modeling: Erwin 4.0 and Visio 2007
Tools/Utilities: SQL*Plus, TOAD, Teradata SQL Assistant 6.1, Multiload, Fastload, BTEQ Win, SQL*Loader
Other Tools: HP Quality Center, PVCS Serena Manager, Visio 2007
XML: XML, HTML, DTD, XML Schema
Methodologies: Agile, UML, Waterfall
PROFESSIONAL EXPERIENCE
Confidential
Big Data Application Hadoop Developer
Role and Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in requirement analysis, design, coding and implementation.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Implemented business logic by writing Pig UDFs in Java (a minimal sketch follows this list) and used various UDFs from Piggybank and other sources.
- Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
- Used Pig as an ETL tool to do transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed Pig routines for data cleansing and preprocessing.
- Used the MultipleOutputs class in MapReduce jobs to name the output files (see the second sketch after this list).
- Created SequenceFiles to store data in binary format using MapReduce programs.
- Used different file formats such as text files, SequenceFiles, JSON and Avro.
- Worked on shell scripting.
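A minimal sketch of the kind of Java Pig UDF described above; the package and class names are illustrative, not taken from an actual project:

```java
package com.example.pig.udf;  // illustrative package name

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Simple EvalFunc that normalizes a chararray field to upper case.
 * Packaged into a jar and invoked from Pig Latin like a built-in function.
 */
public class ToUpper extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        // Skip null/empty tuples instead of failing the whole job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

Once the jar is built, the class would be REGISTERed in the Pig script and called inside a FOREACH ... GENERATE, the same way Piggybank functions are used.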
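A compact sketch of the MultipleOutputs/SequenceFile pattern mentioned above, written against the Hadoop 2.x mapreduce API; the job name, class names, named outputs and routing rule are all illustrative assumptions, not the actual project code:

```java
package com.example.mr;  // illustrative package name

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class NamedOutputCounts {

    /** Emits (first whitespace-delimited field of the record, 1). */
    public static class TagMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\\s+", 2);
            if (parts.length > 0 && !parts[0].isEmpty()) {
                ctx.write(new Text(parts[0]), ONE);
            }
        }
    }

    /** Sums counts per key and routes them to named, binary SequenceFile outputs. */
    public static class SplitReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private MultipleOutputs<Text, LongWritable> mos;

        @Override
        protected void setup(Context ctx) {
            mos = new MultipleOutputs<Text, LongWritable>(ctx);
        }

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            // Illustrative routing rule: keys tagged "ERR" go to the "errors" files,
            // everything else to the "clean" files.
            String output = key.toString().startsWith("ERR") ? "errors" : "clean";
            mos.write(output, key, new LongWritable(sum));
        }

        @Override
        protected void cleanup(Context ctx) throws IOException, InterruptedException {
            mos.close();  // flush the named-output writers
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "named-output-counts");
        job.setJarByClass(NamedOutputCounts.class);
        job.setMapperClass(TagMapper.class);
        job.setReducerClass(SplitReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Store the output as binary SequenceFiles under the two named outputs.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        MultipleOutputs.addNamedOutput(job, "errors",
                SequenceFileOutputFormat.class, Text.class, LongWritable.class);
        MultipleOutputs.addNamedOutput(job, "clean",
                SequenceFileOutputFormat.class, Text.class, LongWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Each reducer then produces named SequenceFiles such as errors-r-00000 and clean-r-00000 alongside the default part files.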
Environment: Hadoop, Pig, Hive, Sqoop, Flume, Oozie, Java (JDK 1.6), Eclipse, HBase, Linux, UNIX Shell Scripting.
Confidential
ETL Lead/Data Stage Developer
Role and Responsibilities:
- Involved in designing the ODS data model as per the ACORD model and the DW architecture using Star Schema; identified the Fact, Dimension, Junk Dimension and Bridge tables.
- Involved in designing logical and physical models using Erwin.
- Analyzed functional requirements. Designed and implemented ETL framework using Data Stage.
- Designed and developed the Data Stage jobs, job sequences and shared containers to implement the business requirements and ETL data flow diagrams.
- Implemented CDL (Change Detection Logic) and IL (Incremental Load) patterns by writing complex SQL scripts to balance the load between the Oracle and Data Stage servers.
- Led a team of 5 onsite and offshore developers by providing technical solutions and managing offshore developers' tasks on a daily basis.
- Created reusable objects such as containers and routines to promote code reusability.
- Re-designed the ETL load flow by eliminating intermediate steps in Data Stage to achieve significant performance gains.
- Involved in setting up ETL standards and helped the client adopt ETL best practices.
- Designed Exception error logging and Audit logging framework using Data Stage routines.
- Wrote the ETL technical specification document and developed common ETL project templates.
Environment: Data Stage 9.1/7.5.1, Windows, Mainframe, Web Focus, Erwin, PL/SQL developer, Oracle 11g.
Confidential
Sr. Informatica Developer
Role and Responsibilities:
- Worked closely with Business users and Project Manager to get Business Requirements.
- Created and documented all project deliverables as per GE policies.
- Created TDD (Technical Design document) and established best practices.
- Designed, developed ETL routines to extract flat file data and load into Proficy Scheduler.
- Performed Unit, Integration, System Test and UAT.
- Developed Error handling routines to handle file transfer failure and data recovery from failure.
- Responsible for completing code migration, production readiness review to ensure complete and accurate migration to various environments.
Environment: Informatica Power Center 8.1.1, HPUX 11.31, Proficy Scheduler, ERP.
Confidential
Sr. Informatica Developer
Role and Responsibility:
- Involved in all Phases of SDLC including Requirement, Analysis, Design, Development and implementation phases.
- Participated in design, code reviews and implementation of best ETL methodologies.
- Analyzed Business Requirements and translated them into Technical Specifications and ETL data mapping documents.
- Facilitated meetings with business users and Business Analysts to resolve the functional gaps.
- Designed and developed the ETL framework for loading the Data Warehouse.
- Estimated data size and data growth and determined the space requirements for the DW.
- Designed aggregation, indexing and partitioning strategies for the warehouse.
- Involved in data quality and data profiling, and recommended data quality measures.
- Developed ETL routines, UNIX shell wrappers to perform FTP and run ETL batch jobs.
- Facilitated design sessions with ETL governance team for implementing best practices.
- Participated actively in end-to-end testing, including System Integration Testing, User Acceptance Testing, and pre-production and post-production activities.
Environment: Informatica Power Center 8.6.0, Oracle 10.3.0.2, HPUX 11.31, MS Office, HP Quality Center, PVCS Serena, Harvest CM Workbench 7.1.123, Control-M, CMFast, SAP-BO.
Confidential, Cincinnati, OH
Sr. Informatica Developer
Role and Responsibility:
- Extensively worked on the migration strategy, execution, testing and implementation of the project.
- Identified dependent Unix/Maestro scripts and their migration process.
- Prepared migration plan for Unit Test, System Test, Performance Test and UAT.
- Prepared inventory of Maestro jobs, converted and loaded into TIDAL scheduler.
- Identified and converted FTP scripts to SFTP scripts.
- Mentored a 3-member team of ETL developers and testers and conducted code reviews.
- Interacted closely with the EES application support team to resolve conflicts and technical gaps.
- Prepared and validated Test Scenarios and Test cases with the EES Support team.
- Analyzed the current state of all EES applications in the AS-IS environment and planned for migrating/upgrading the current environment to Linux with minimal changes.
- Validated Development, QA and Production Environments as per functional requirements.
Environment: Informatica Power Center 7.1.3, SAP, Oracle 9i, SQL, Microsoft Visio, Unix-HP, Windows XP, HP Quality Center, PVCS Serena, Documentum, Tivoli job scheduling tool.
Confidential, Cincinnati, OH
Sr. Informatica Developer
Role and Responsibility:
- Analyzed Business/Functional specifications and translated them into technical specifications.
- Interacted closely with business users to complete data mapping specs between source and target.
- Established best practices for data movement, exception/error handling, and data recovery between source and target databases.
- Identified numerous gaps between the AS-IS and TO-BE systems and helped users to resolve them.
- Designed, developed and implemented complex ETL routines using Informatica and Shell scripts.
- Participated in testing procedures, test strategy, test plans for System and User acceptance testing.
- Interacted with the QA team to resolve issues in a timely manner and meet deliverables.
- Wrote UNIX shell scripts to automate batch load jobs in Maestro.
Environment: Informatica Power Center 7.1.1, SAP, Oracle 9i, Microsoft Visio, Unix-HP, Windows XP, HP Quality Center, PVCS Serena, Documentum, Tivoli job scheduling tool.
Confidential, CA
Datawarehouse Informatica Developer
Role and Responsibility:
- Automated manually generated Ad-hoc Reports using Informatica Mappings.
- Performed Administration tasks like Creating Repositories, Users, and Assigning privileges.
- Created technical design documents, as required, to define ETL mappings.
- Designed, Developed and Tested Workflows/Worklets according to Business Process Flow.
- Converted existing Stored Procedures to ETL Mappings.
- Worked on Siebel Answers 7.7 to generate various ad-hoc reports.
Environment: Informatica 7.1.1, Siebel Analytics 7.7.2, SQL Server 2000, SQL, DTS, Windows XP.
Confidential, Santa Clara, CA
Encover Informatica Developer
Role and Responsibility:
- Analyzed Functional specs and prepared Technical design docs.
- Developed interface Mappings and Mapplets as per business logic.
- Wrote ABAP programs in the Source Qualifier to retrieve data from the SAP system in real time.
- Developed UNIX shell scripts to automate the ETL batch process using Maestro.
Environment: Informatica 6.1, PowerConnect, Oracle 9i, UNIX, Windows XP, SAP R/3, ABAP, Maestro.
Confidential, Akron, OH
Datawarehouse Informatica Developer
Role and Responsibility:
- Designed and developed ETL routines, Mapplets, reusable transformations and Mappings.
- Designed, developed, implemented, tested and validated complex Mappings/Workflows.
- Extensively used ETL to load data from Oracle 9i, SAP R/3 and flat files to Teradata.
- Developed, enhanced and validated BTEQ scripts to load the data into the Teradata EDW.
- Identified bottlenecks, and tuned and optimized ETL processes for better performance.
Environment: Informatica 6.2, Teradata 6.0, BTEQ, Oracle 9i, VSAM, UNIX, Windows XP, Autosys 4.0.