ETL-Big Data Developer Resume
Rochester, MN
PROFESSIONAL SUMMARY:
- Over 8 years of IT experience in Analysis, Design, Development, Implementation and Support of Relational Databases (OLTP), Data Warehousing Systems (OLAP) and Data Marts in the Banking, Retail, Healthcare and Pharmaceutical domains.
- Includes 5+ years of hands-on Data Warehousing experience in Extraction, Transformation and Loading (ETL) using Informatica Power Center 9.x/8.x/7.x, HBase, Pig, Hive, Hadoop, HDFS and other Big Data tools.
- Experienced in working with Big Data and the Hadoop Distributed File System (HDFS).
- Strong knowledge and practical use of dimensional modeling concepts, including Star and Snowflake schemas and the Ralph Kimball and Bill Inmon methodologies.
- Hands-on experience working with Hadoop ecosystem tools such as Hive, Pig, Sqoop, MapReduce, Flume and Oozie.
- Domain knowledge of Informatica Big Data Edition (BDE) guidelines and best practices.
- Strong knowledge of Hadoop and Hive's analytical functions.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Efficient in building Hive, Pig and MapReduce scripts.
- Loaded datasets into Hive for ETL (Extract, Transform & Load) operations.
- Extensively worked on PowerCenter Transformations such as Connected & Unconnected Lookups, Filter, Expression, Router, Normalizer, Joiner, Update Strategy, Sorter, Source Qualifier, etc.
- Extensively worked with large Databases in Production environments.
- Proficient in using Informatica Workflow Manager and Workflow Monitor to create, schedule and control workflows, tasks and sessions in UNIX and Windows environments.
- Proven experience in Data Integration, Data Migration, Code Migration, Version Control and Conversion projects.
- Managed end-to-end data warehouse delivery in business intelligence projects.
- Extensively used SQL & PL/SQL to write Stored Procedures, Functions, Packages and Triggers.
- Data modeling experience covering RDBMS concepts, normalization, surrogate keys and Change Data Capture (CDC).
- Extensive experience integrating heterogeneous data source definitions such as SQL Server, Oracle, Teradata, flat files, Excel and XML files, and loading the data into data warehouses and data marts using Power Center.
- Well versed in developing various database objects like tables, indexes, constraints & views in Oracle 11g/10g.
- Worked on performance tuning, indexing and session partitioning techniques on sources, targets and mappings, with the ability to determine performance bottlenecks.
- Experience in installing and configuring Informatica Data Quality (IDQ), and in designing and running IDQ scripts.
- Expertise in creating IDQ components such as Data Objects, Quick Profiles, Dashboards and Scorecards.
- Exposure to SSIS, creating SSIS packages to migrate slowly changing dimensions.
- Extensively involved in project coordination activities, keeping track of project milestones and organizing weekly project reviews throughout the SDLC, using both Waterfall and Agile methodologies.
- Profound knowledge of coding BTEQ scripts and using MultiLoad, FastLoad and TPump to load data into Teradata.
- Proven experience translating business problems into actionable data quality initiatives.
- Team Player with excellent communication, analytical, writing, interpersonal and presentation skills.
TECHNICAL SKILLS:
Databases/RDBMS: Oracle 8.x/9.x/10g/11g, SQL Server 2005/2008/2012, Teradata, MySQL 5.0/4.1, MS Access; Editors: SQL Navigator, TOAD
Database Tools: Oracle Enterprise Manager, Quest TOAD, SQL*Plus, SQL*Loader, SQL Navigator, Export/Import
ETL Tools: Informatica Power Center 9.5/9.1/8.6.0, Informatica Power Exchange 9.1, Informatica Power Analyzer, Informatica Data Quality (IDQ), IDE
Data Modeling: ERWIN 4.0, Star and Snowflake Schema Modeling, Dimensional Data modeling, Fact and Dimensional Tables, Entities & Attributes.
Big Data/Hadoop: HDFS, MapReduce, Pig, Hive, HBase, Oozie, Sqoop, Python
Package: MS Office (MS Access, MS Excel, MS PowerPoint, MS Word, MS Project), Visual Studio 6.0.
Environment: Windows 2000/XP/7, Unix, Windows Server 2003
Programming Skills: SQL, PL/SQL, C, C++, Unix, XML
WORK EXPERIENCE:
Confidential, Rochester, MN
ETL-Big Data Developer
Responsibilities:
- Designed the high-level ETL architecture for overall data transfer from OLTP to OLAP systems.
- Designed and maintained complex ETL mappings, mapplets, and workflows.
- Development of queries, stored procedures, functions, triggers, views and materialized views.
- Extensively worked with Big Data and the Hadoop Distributed File System (HDFS).
- Involved in installing, configuring and using Hadoop ecosystem components.
- Responsible for importing and exporting data between existing databases and HDFS/Hive using Sqoop (a representative sketch follows this list).
- Built Hive, Pig and MapReduce scripts.
- Ran Hadoop MapReduce jobs to process terabytes of XML-format data.
- Developed multiple MapReduce jobs in Java for data cleansing, and wrote Hive and Pig UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries.
- Worked extensively with Sqoop for importing data from databases, as well as with Hive queries and Pig scripting.
- Automated all the jobs using Oozie workflows and supported in running jobs on the cluster.
- Created various Documents such as Source-To-Target Data mapping Document, Unit Test Cases and Data Migration Document.
- Created mappings using the transformations like Source Qualifier, Aggregator, Expression, Lookup, Router, Normalizer, Filter, Update Strategy and Joiner transformations.
- Created reusable transformations and Mapplets and used them in complex mappings.
- Used debugger in identifying bugs in existing mappings by analyzing data flow, evaluating transformations.
- Designed and developed UNIX shell scripts to schedule jobs.
- Also wrote pre-session and post-session shell scripts.
- Wrote Korn shell scripts to execute Power Center jobs using the Control-M scheduling tool.
- Unit tested and tuned SQL and ETL code for better performance.
- Worked with flat files, Oracle, Teradata and DB2 as sources.
- Worked with various system interfaces to gather requirements for migration and implementation.
- Coded Teradata BTEQ SQL and wrote UNIX scripts to validate, format and execute the SQL in the UNIX environment.
- Created Stored Procedures to transform the data. Worked extensively on SQL, PL/SQL for various needs of the transformations.
- Actively involved in building the system test environment and migrated mappings from Development to System Test environment and executed code in QA environment.
- Created packages in Harvest to migrate code across multiple environments through a standard transmittal process.
- Participated in Review of Test Plan, Test Cases and Test Scripts prepared by system integration testing team.
- Monitored the performance and identified performance bottlenecks in ETL code.
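A minimal sketch of the kind of Sqoop-to-Hive load referenced above, written as a Korn shell script; the JDBC URL, credentials, schema and table names are hypothetical placeholders rather than actual project values:

```sh
#!/bin/ksh
# Sketch: land an Oracle table in Hive with Sqoop, then run a sample HiveQL
# aggregation over it. Connection details and table names are hypothetical.

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user --password-file /user/etl/.oracle_pwd \
  --table ORDERS \
  --split-by ORDER_ID \
  --num-mappers 4 \
  --hive-import \
  --hive-database staging \
  --hive-table orders \
  --hive-overwrite

# Example downstream Hive query against the imported table
hive -e "SELECT order_status, COUNT(*) AS order_cnt
         FROM staging.orders
         GROUP BY order_status;"
```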
Environment: Informatica Power Center 9.5, Oracle 11g, DB2, Erwin 4.0, Unix Shell Scripting, Hadoop 2.0, HDFS, Sqoop, Control M, MS PowerPoint, Teradata, TOAD, SQL, PL/SQL, Win NT 4.0.
Confidential, Phoenix, AZ
Informatica Developer
Responsibilities:
- Played a major role in understanding the business requirements and in designing and loading the data warehouse (ETL).
- Created new mappings and updated old mappings according to changes in business logic.
- Performed extraction, transformation and loading of data from RDBMS tables and Flat File sources into Oracle RDBMS in accordance with requirements and specifications.
- Extensively used various transformations Lookup, Update Strategy, Expression, Aggregator, Filter, Stored Procedures and Joiner.
- Performed Unit Testing and tuned the mappings for better performance.
- Created reusable transformations and mapplets.
- Wrote PL/SQL procedures (e.g., days-of-supply calculations) for processing business logic in the database.
- Worked on SQL tools like TOAD to run SQL queries to validate the data.
- Worked on database connections and SQL joins at the database level.
- Extensively used SQL to load data from flat files into Oracle database tables.
- Used Workflow manager for session management, database connection management and scheduling of jobs to be run in the batch process.
- Used UNIX shell scripting to schedule Informatica workflows (see the pmcmd sketch after this list).
- Worked with Informatica Data Quality (IDQ) 9.1.
- Knowledge of Informatica Data Quality standards, guidelines and best practices.
- Experience in defining and deploying data quality programs and tools on enterprise data quality projects.
- Experience in data profiling & data quality rules development using Informatica Data Quality tools.
- Experience in Data Cleansing techniques.
- Familiarity with current trends in Data Quality technologies.
- Used calculations, variables, sorting, drill-down and slice-and-dice features to create the Stock Status report.
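A minimal sketch of the kind of shell wrapper used to schedule Informatica workflows; the domain, integration service, folder, workflow name and credentials are hypothetical placeholders:

```sh
#!/bin/ksh
# Sketch: start an Informatica workflow from a shell script so it can be
# driven by cron or an enterprise scheduler. All names are hypothetical.

pmcmd startworkflow -sv INT_SVC -d DOM_DEV \
      -u etl_user -p etl_password \
      -f FIN_DW -wait wf_load_stock_status
rc=$?

if [ $rc -ne 0 ]; then
    echo "wf_load_stock_status failed with return code $rc" >&2
    exit $rc
fi
echo "wf_load_stock_status completed successfully"
```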
Environment: Informatica Power Center 9.1, Oracle 10g, Windows NT, Flat files, TOAD, SQL, PL/SQL, SQL Workbench, Putty, Unix Shell Scripting.
Confidential, Charlotte NC
ETL Developer
Responsibilities:
- Extracted data from various sources and applied business logic to load it into the Data Warehouse.
- Involved in the design, development and implementation of the ETL process in Power Center.
- Extracted data from Oracle, SQL Server, Flat files and DB2 source systems.
- Used Informatica Designer to create complex mappings using different transformations to move data from the source to the Data Warehouse.
- Developed the transformation/business logic to load data into data warehouse.
- Used Source Analyzer and Warehouse designer to import the source and target database schemas, and the Mapping Designer to map the sources to the target.
- Used Transformation Developer to create Filter, Lookup, Stored Procedure, Joiner, Update Strategy, Expression and Aggregator transformations.
- Developed reusable mappings using Mapplets, parameters and Variables.
- Developed FastExport scripts to extract source data to flat files.
- Used the Teradata EXPLAIN facility, which describes to end users how the database system will execute a request.
- Populated and refreshed Teradata tables using the FastLoad, MultiLoad and FastExport utilities for user acceptance testing and for loading history data into Teradata (a FastLoad sketch follows this list).
- Involved in dimensional modeling (star schema) of the data warehouse, using Erwin to design the business process, dimensions and measured facts.
- Extensively worked on Connected & Unconnected Lookups, Router, Expressions, Joiner, Source Qualifier, Aggregator, Filter, Sorter, Update Strategy transformations.
- Designed Complex mappings for Slowly Changing Dimensions using Lookup (connected and unconnected), Update strategy and filter transformations for retaining consistent historical data.
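A minimal sketch of a FastLoad refresh of a Teradata staging table of the kind referenced above; the TDPID, credentials, database, table and file names are hypothetical placeholders:

```sh
#!/bin/ksh
# Sketch: load a pipe-delimited flat file into an empty Teradata staging
# table with FastLoad. All identifiers and paths are hypothetical.

fastload <<'EOF'
LOGON tdprod/etl_user,etl_password;

DELETE FROM stage_db.customer_stg;   /* FastLoad requires an empty target */

SET RECORD VARTEXT "|";
DEFINE cust_id   (VARCHAR(18)),
       cust_name (VARCHAR(60)),
       cust_city (VARCHAR(30))
FILE = /data/in/customer.dat;

BEGIN LOADING stage_db.customer_stg
      ERRORFILES stage_db.customer_stg_e1, stage_db.customer_stg_e2;
INSERT INTO stage_db.customer_stg (cust_id, cust_name, cust_city)
VALUES (:cust_id, :cust_name, :cust_city);
END LOADING;
LOGOFF;
EOF
```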
Environment: Informatica Power Center 8.6.0, Oracle 10g, Putty, TOAD 10.1 for Oracle, MS SQL Server, Flat Files, PL/SQL, ERWIN 4.0 Data Modeling tool, Windows 2000, UNIX SHELL scripting, CONTROL-M.
Confidential, Bentonville, AR
Informatica Developer
Responsibilities:
- Used Informatica Power Center 8.6.1 for migrating data from various OLTP databases and other applications to the Radar Store Data Mart.
- Created complex Informatica mappings with extensive use of Aggregator, Union, Filter, Router, Normalizer, Joiner and Sequence generator transformations.
- Created and used parameter files to perform different load processes using the same logic (see the parameter-file sketch after this list).
- Extensively used PL/SQL for creation of stored procedures and worked with XML Targets.
- Involved in modifying the XSDs so that the data was written to the appropriate XML files.
- Performed performance tuning at the source, target, mapping and session levels.
- Defined Target Load Order Plan and Constraint based loading for loading data appropriately into multiple Target Tables.
- Used different Tasks (Session, Assignment, Command, Decision, Email, Event-Raise, Event-Wait and Control) in the creation of workflows.
- Involved in modifying already existing UNIX scripts and used them to automate the scheduling process.
- Extracted data from DB2 on the mainframe to XML files using Power Exchange.
- Used Power Exchange to read the mainframe files; within the mappings, performed lookups on DB2 tables and loaded/updated data in the target DB2 tables.
- Filtered changed data using Power Exchange CDC and loaded it to the target.
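A minimal sketch of how a parameter file can drive the same workflow logic for different load processes; the folder, workflow, session, connection and parameter names are hypothetical placeholders:

```sh
#!/bin/ksh
# Sketch: build an Informatica parameter file for one load process (here,
# per region) and start the workflow against it. All names are hypothetical.

REGION=${1:-WEST}                      # load-process selector, e.g. WEST/EAST
PARAM_FILE=/infa/param/wf_store_load_${REGION}.par

cat > "$PARAM_FILE" <<EOF
[RADAR_STORE.WF:wf_store_load.ST:s_m_store_load]
\$\$REGION_CODE=$REGION
\$\$LOAD_DATE=$(date +%Y-%m-%d)
\$DBConnection_Src=ORA_OLTP_$REGION
EOF

# Start the workflow with the generated parameter file
pmcmd startworkflow -sv INT_SVC -d DOM_PROD -u etl_user -p etl_password \
      -f RADAR_STORE -paramfile "$PARAM_FILE" -wait wf_store_load
```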
Environment: Informatica Power Center 8.6.1, Informatica Power Exchange, Oracle 9i, XML, UNIX, PL/SQL, Windows 2000/XP, Autosys.
Confidential
ETL Developer
Responsibilities:
- Conducted user meetings and gathered requirements.
- Wrote technical requirements and specifications for the modifications to assess the feasibility of the project after interacting with customers/end users to obtain the requirements.
- Worked with Business Analysts and Data Architects in gathering the requirements and designed the Source to Target Mapping Specification Documents.
- Prepared technical requirements documents which include both macro-level and micro-level design documents.
- Used Erwin Data Modeler to design the data marts and generate the necessary DDL scripts for object creation for DBA review.
- Used technical transformation document to design and build the extraction, transformation, and loading (ETL) modules.
- Involved in preparing and documenting test cases and test procedures for unit, integration and system tests.
- Designed and developed Informatica mappings to capture and load incremental data from the source tables into the target tables.
- Implemented Slowly Changing Dimension Type I and Type II mappings as per the requirements.
- Performed data transformations using various Informatica transformations such as Union, Joiner, Expression, Lookup, Aggregator, Filter, Router, Normalizer, Update Strategy, etc.
- Involved in performance tuning of Informatica mappings and reusable transformations, and analyzed the target-based commit interval for optimum session performance.
- Wrote pre-session and post-session shell scripts for dropping and creating table indexes, email tasks and various other purposes (a pre/post-session sketch follows this list).
- Used unconnected Lookups where different expressions and multiple targets shared the same lookup logic, which executed once and returned a single value.
- Used Sequence Generator to create Dimension Keys and Update Strategy to insert records into the target table in staging and Data Mart.
- Used the Debugger in debugging some critical mappings to check the data flow from instance to instance.
- Created an error log table to capture error messages and session load times.
- Optimized Query performance and DTM buffer size, Buffer Block Size to tune session performance.
- Performance tuned the workflow by identifying the bottlenecks in sources, targets, mappings and sessions.
- Identified read/write errors using workflow and session logs.
- Used parameter files to initialize workflow variables, mapping parameters and mapping variables, and used system variables in mappings to filter records.
- Developed all the mappings according to the design document and mapping specs provided and performed unit testing.
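A minimal sketch of the pre-session/post-session index handling referenced above; the connect string, table and index names are hypothetical placeholders:

```sh
#!/bin/ksh
# Sketch: pre-session script drops a target index before a bulk load; the
# post-session counterpart recreates it. All names are hypothetical.

CONNECT="etl_user/etl_password@DWPROD"

# Pre-session: drop the index so the load runs without index maintenance
sqlplus -s "$CONNECT" <<'EOF'
WHENEVER SQLERROR EXIT SQL.SQLCODE
DROP INDEX dw.idx_sales_fact_dt;
EXIT
EOF

# Post-session (separate script in practice): rebuild the index
sqlplus -s "$CONNECT" <<'EOF'
WHENEVER SQLERROR EXIT SQL.SQLCODE
CREATE INDEX dw.idx_sales_fact_dt ON dw.sales_fact (load_dt);
EXIT
EOF
```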
Environment: Informatica Power Center 8.1/8.6.0 Tools (Designer, Workflow Manager, Workflow Monitor), Informatica Metadata Manager, Oracle 9i/10g, SQL Server 2005, Toad 9.5, Unix, Erwin 7.x.
Confidential
Jr. ETL Developer
Responsibilities:
- Involved in full life cycle development including design, ETL strategy, troubleshooting, reporting, and identifying facts and dimensions.
- Developed Informatica mappings, reusable transformations. Developed and wrote procedures for getting the data from the Source systems to the Staging and to Data Warehouse system.
- Extensively used transformations such as Sequence Generator, Normalizer, Expression, Filter, Router, Rank, Aggregator, Lookup (target as well as source), Update Strategy, Source Qualifier and Joiner to implement the business logic, and designed complex mappings involving target load order and constraint-based loading.
- Created, built, ran and scheduled workflows and worklets using the Workflow Manager.
- Optimized and tuned mappings for better performance and efficiency; created and ran batches and sessions using the Workflow Manager; extensively used UNIX shell scripts for conditional execution of the workflows; and optimized the performance of mappings, workflows and sessions by identifying and eliminating bottlenecks.
- Performed unit testing at the development level, source code migration and documentation.
- Performance tuned the Informatica mappings by reviewing explain plans, cutting down query costs using Oracle hints and changing the mapping designs.
- Managed the Metadata associated with the ETL processes used to populate the data warehouse.
- Responsible for tuning ETL procedures and star schemas to optimize load and query performance.
- Extensively worked on performance tuning of programs, ETL procedures and processes; coded database triggers, functions and stored procedures and wrote many SQL queries; helped code shell scripts for various administration activities such as daily backups (a backup-script sketch follows this list).
- Managed users and roles for database security; maintained system security and controlled and monitored user access to the database.
- Assigned predefined profiles and roles to users to manage database security, CPU activity, idle time and tablespace quotas.
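A minimal sketch of the kind of nightly backup script used for routine administration; the connect string, schema owner and paths are hypothetical placeholders:

```sh
#!/bin/ksh
# Sketch: nightly export of the warehouse schema with Oracle exp, then prune
# old dump files. All identifiers and paths are hypothetical.

BACKUP_DIR=/backup/oracle
STAMP=$(date +%Y%m%d)

exp etl_user/etl_password@DWPROD \
    owner=DW_OWNER \
    file=$BACKUP_DIR/dw_owner_$STAMP.dmp \
    log=$BACKUP_DIR/dw_owner_$STAMP.log \
    consistent=y

# Keep only the last 7 days of dump files
find "$BACKUP_DIR" -name 'dw_owner_*.dmp' -mtime +7 -exec rm -f {} \;
```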
Environment: Informatica Power Center 7.1, Oracle 8i, PL/SQL, Erwin and Toad.