- Data Warehousing and ETL Specialist with approximately 8 years of industry experience, with in-depth knowledge of ETL tools such as Informatica PowerCenter (versions 9.5 and 8.6) and databases such as Oracle, Greenplum, and Teradata. Worked extensively on implementing large-scale Enterprise Data Warehouse and Risk Solutions projects and designing ETL flows that handle very large daily data volumes.
- Experience in building and performance-tuning Informatica mappings and SQL queries that handle very large data volumes (billions of records loaded into the warehouse in each daily batch cycle).
- Well versed in data warehousing methodologies, data modelling, the Agile framework, and the fundamentals of data extraction, transformation, and loading.
- Extensive experience in creating complex mappings using transformations such as Source Qualifier, Expression, Lookup, Aggregator, Update Strategy, Joiner, Sorter, Filter, Router, Sequence Generator, Normalizer, Rank, and XML to pipeline data to data warehouses/data marts.
- Experience in performance tuning of sources, targets, mappings, transformations, and sessions in Informatica by implementing various partitioning techniques, identifying performance bottlenecks, and implementing solutions.
- Experience in building complex SQL queries in Oracle/Greenplum/Teradata and analyzing query performance.
- Extensive experience in leading a team, gathering client requirements, and delivering the end product to business users by following the SDLC.
- To handle large data volumes and improve performance and data delivery, migrated legacy processes from traditional ETL and RDBMS systems into HDFS using the Hadoop ecosystem, primarily Sqoop, Impala, and Spark (Python), for distributed processing.
- Strong experience in creating Stored Procedures, Functions, Views, Triggers, Indexes and Partitions in Oracle and Greenplum.
- Skilled in data warehousing and scripting, with over 7 years' experience working with various data sources such as Oracle, Greenplum, Teradata, SQL Server, mainframe files, and flat files.
- Experience in building queries and generating reports using Tableau.
- Created UNIX shell scripts to handle and parse data sets, automate and streamline the execution of Informatica workflows, and control the overall ETL flow.
- Implemented Slowly Changing Dimension (SCD Type 1/2/3) methodology to preserve the full history of accounts and dimension tables, along with Change Data Capture (CDC).
- Very good exposure to and experience in providing L3 production support for business-critical Informatica processes/ETL flows: identifying performance, data, and code issues, debugging unusual errors, and resolving them with active tracking to ensure minimal or no business impact.
- Experience in migrating workflows between environments, creating connections (e.g., ODBC, Relational, MLoad, FastExport, FastLoad), and everything else related to setting up a new application in an environment. Also experienced in Informatica administration: the Admin Console, managing and maintaining domains/repositories, repository backups, and managing users, groups, folders, and associated privileges.
- Good exposure to Development, Testing, Debugging, Implementation, Documentation, End-user training and Production support.
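The SCD Type 2 approach mentioned above can be sketched in plain Python. This is only a minimal illustration of the logic; the record layout and field names (`effective_from`, `effective_to`, `is_current`) are hypothetical, and in practice the same pattern runs as an Informatica Update Strategy mapping or a SQL merge.

```python
from datetime import date

def apply_scd2(dimension, incoming, today=None):
    """Apply SCD Type 2 to an in-memory dimension: expire the current
    row when a tracked attribute changes, then insert a new current row.

    dimension: list of dicts with keys
        natural_key, attrs, effective_from, effective_to, is_current
    incoming:  dict with natural_key and attrs (today's source snapshot)
    """
    today = today or date.today()
    for row in dimension:
        if row["natural_key"] == incoming["natural_key"] and row["is_current"]:
            if row["attrs"] == incoming["attrs"]:
                return dimension  # no change: nothing to do
            # change detected: close out the current version
            row["effective_to"] = today
            row["is_current"] = False
            break
    # insert the new current version (also handles brand-new keys)
    dimension.append({
        "natural_key": incoming["natural_key"],
        "attrs": dict(incoming["attrs"]),
        "effective_from": today,
        "effective_to": None,
        "is_current": True,
    })
    return dimension
```

The same comparison step doubles as simple change data capture: an unchanged snapshot is a no-op, so only genuinely changed keys generate new rows.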
Operating Systems: Windows 7/NT/XP/2003, Red Hat Linux.
Languages: SQL, PL/SQL, UNIX Shell Scripting
Tools: Informatica PowerCenter 9.5.1/8.x (ETL), Git, SVN, HP Service Manager, JIRA, Synergy version management tool.
Reporting Tools: Tableau, Business Objects
RDBMS: Oracle, Greenplum, Teradata, SQL Server, DB2.
Scheduling Tools: Control-M, TWS (Tivoli Workload Scheduler), Informatica Scheduler.
Others: Hadoop Eco System, HDFS, Spark, Impala, Hue, Sqoop.
Informatica and Data Analyst-Team Lead
Confidential, Jersey City, NJ
The Asset Management Investment Risk project extracts data from various sources, primarily Oracle and flat files, transforms it per business rules, and loads it into Investment Risk data marts in Oracle and Greenplum. The flow involves ETL Informatica mappings, PL/SQL scripts, and UNIX scripts for transforming and formatting the data before the final load into the data mart. The data is then modelled and sent for risk analysis, and the reported risk analysis results are loaded back into the database using ETL logic. For better performance and data delivery, the Hadoop ecosystem, primarily Sqoop and Impala, has recently been leveraged in multiple processes. Tableau is used as the reporting tool for formatting and presenting this data to end users, facilitating business decisions based on various risk factors.
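The Sqoop-based ingestion mentioned above can be sketched as a small helper that assembles a `sqoop import` command line. This is a hedged illustration: the JDBC URL, table, and HDFS path are placeholders, and a real job would also supply credentials, a split column, and file-format options.

```python
def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble a `sqoop import` command that pulls an RDBMS table
    into HDFS. Returned as an argument list suitable for
    subprocess.run(); the flags shown are standard Sqoop import options."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--table", table,                   # source table to extract
        "--target-dir", target_dir,         # HDFS directory for the output
        "--num-mappers", str(num_mappers),  # parallel extract tasks
    ]

# Hypothetical source database and HDFS layout, for illustration only.
cmd = build_sqoop_import(
    "jdbc:oracle:thin:@//dbhost:1521/RISKDB",
    "POSITIONS",
    "/data/risk/positions",
)
```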
Informatica Data Warehouse Developer
Confidential, New York, NY
RESPONSIBILITIES AND TASKS HANDLED:
- Gather requirements from the client, understand the flow from the business and UI teams, and build the process data flow. Distribute responsibilities among team members, build and review code, and track the project to completion through the SDLC.
- Develop and maintain ETL (Extract, Transform, Load) mappings to extract data from multiple source systems such as Oracle, Greenplum, Teradata, SQL Server, and flat files and load it into Greenplum/Oracle/DB2/SQL Server data marts.
- Worked extensively on understanding and analyzing the business logic behind long-running or resource-intensive processes and gathering the details required to fine-tune them, enabling earlier data delivery and better system stability.
- Debug and analyze issues reported by business users at the L2/L3 level, identifying the root cause and the required fix in code or data.
- Migrated legacy processes from RDBMS systems into HDFS using the Hadoop ecosystem, primarily Sqoop and Impala. Experienced in HDFS, Hue, and Spark.
- Extensively work on developing Informatica mappings, sessions, and workflows; debugging mappings for errors, bug fixes, and data validation; and analyzing and interpreting session logs.
- Monitor the batch and data flows to ensure the required data and reports are available to the business.
- Create complex mappings using transformations such as Source Qualifier, Expression, Lookup, Aggregator, Update Strategy, Joiner, Sorter, Filter, Router, Sequence Generator, Normalizer, Rank, and XML to pipeline data to data warehouses/data marts.
- Extensively worked with various lookup cache techniques: connected and unconnected lookups, cached (named and unnamed) and uncached lookups, persistent lookups, etc.
- Perform SQL and data analysis; write advanced, complex SQL queries in Oracle/Greenplum/Teradata/SQL Server.
- Create reusable UNIX shell scripts to download files from mainframes or other sources, parse data as required, read control files, run the corresponding Informatica workflows and any post-load scripts, and control the entire ETL flow.
- Performance-tune Informatica mappings using components such as parameter files, variables, and caches.
- Build SQL queries to fetch data from Greenplum/Oracle/Teradata per business and user requirements and use them in Informatica mappings for further processing and data delivery.
- Fine-tune the performance of existing queries that consume high CPU, exhibit I/O skew, or use excessive spool space.
- Query the Informatica metadata repository to fetch metadata and build a utility containing session execution statistics (current and historical) and component whereabouts across all applications in the data warehouse, giving every application team easy reference and analysis.
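The metadata-repository utility described above can be sketched as a small query builder. The view and column names follow the standard PowerCenter repository views (e.g. `REP_SESS_LOG`) but can vary by version, so treat them as assumptions; the folder filter is passed as a bind parameter rather than concatenated.

```python
def session_stats_sql(folder=None, days=7):
    """Build a query against the PowerCenter repository view
    REP_SESS_LOG to summarize recent session run statistics.
    Returns (sql_text, bind_params)."""
    where = [f"actual_start >= SYSDATE - {int(days)}"]
    params = []
    if folder:
        where.append("subject_area = :folder")  # bind variable, not string concat
        params.append(folder)
    return (
        "SELECT subject_area, workflow_name, session_name,\n"
        "       successful_rows, failed_rows, actual_start\n"
        "FROM rep_sess_log\n"
        "WHERE " + " AND ".join(where) + "\n"
        "ORDER BY actual_start DESC"
    ), params
```

A caller would pass the returned SQL and parameters to any Oracle client (e.g. a `cx_Oracle` cursor) against the repository database.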
Technologies Summary: Informatica PowerCenter 9.6/9.0.1/8.6, Oracle, Greenplum, Spark, Scala, Impala, Hue, Tableau, Teradata (utilities: BTEQ, MLoad, FastLoad, FastExport), PL/SQL (SQL queries, functions, stored procedures, triggers, packages), UNIX scripting, Python, basic Java.
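The control-file checks and workflow triggering described in the responsibilities above can be sketched as follows. This is a hedged illustration: the pipe-delimited control-file layout and all names are hypothetical, and while `pmcmd startworkflow` is the standard PowerCenter CLI, a real script would supply credentials and a domain rather than hard-code anything.

```python
def validate_control_file(control_line, data_lines):
    """Check a pipe-delimited control record ('filename|expected_count')
    against the actual number of data records received."""
    filename, expected = control_line.strip().split("|")
    return len(data_lines) == int(expected), filename

def build_start_workflow(service, folder, workflow):
    """Assemble a pmcmd startworkflow command (PowerCenter CLI);
    credentials and domain options are omitted here for brevity."""
    return [
        "pmcmd", "startworkflow",
        "-sv", service,          # integration service
        "-f", folder,            # repository folder
        "-wait", workflow,       # block until the workflow completes
    ]

# Only trigger the load when the control-file count matches the data file.
ok, fname = validate_control_file("positions.dat|3", ["r1", "r2", "r3"])
cmd = build_start_workflow("IS_DEV", "RISK", "wf_load_positions") if ok else None
```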