- 9+ years of experience in the IT industry on data warehousing, data integration and data migration projects, covering development, enhancement and production support.
- Experience in analysis, design, development and integration using Big Data and Hadoop technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka, HBase, AWS, Cloudera, Hortonworks, Impala and Avro, and in data processing.
- Strong experience with Informatica Data Quality (IDQ) 9.6, Informatica Analyst and Informatica Power Center 9.x/10.x ETL tools, and good experience with Informatica PowerExchange Change Data Capture (CDC) and B2B MFT tools.
- Expertise in the Data Analysis, Design, Development, Implementation and Testing of Data Warehousing using Data Extraction, Data Transformation and Data Loading using DataStage.
- Extensively worked with various relational and non-relational source and target systems including Teradata, DB2, SQL Server, Oracle, PL/SQL, flat files, XML files and Salesforce.com (SFDC).
- Good experience developing data profiles, scorecards, mapplets, rules, mappings, workflows, data cleansing and data standardization processes, and data exception handling using Informatica Data Quality (IDQ) Developer and Informatica Analyst tools.
- Strong experience in loading and maintaining Data Warehouses and Data Marts using DataStage ETL processes.
- Expert in designing DataStage Parallel jobs using various stages like Join, Merge, Lookup, Remove duplicates, Dataset, Complex flat file, Aggregator, XML, ODBC Connector, Teradata Connector.
- Hands on experience in Big Data concepts, Hadoop, Hive, IBM DataStage Design, UNIX shell scripting, SAP Data Services, Business Object reporting, Python and Teradata.
- Extensively worked with the cloud-based Salesforce.com (SFDC) system as both source and target, using SFDC source and target transformations, Lookup transformations and UPSERT logic against the SFDC target.
- Used Teradata Utilities such as FLoad, MLoad, TPUMP, BTEQ scripts for loading the data from flat files to staging tables, Dimension tables and Fact tables.
- Good experience with PL/SQL scripting and UNIX/Linux shell scripting, and expertise in using the Pushdown Optimization (PDO) technique for performance optimization.
- Built logical and physical data models for Snowflake as per the changes required.
- Strong experience in data warehousing concepts such as data warehouses (DWH), data marts, star schemas, snowflake schemas, facts, factless facts, various dimensions, dimensional modeling and Slowly Changing Dimension (SCD) techniques.
- Experience developing Slowly Changing Dimension (SCD) Type 1, 2 and 3 mappings and handling various Change Data Capture (CDC) requirements for incremental data loads.
- Expertise in troubleshooting and analyzing issues/errors and providing resolutions accordingly.
- Experience with data modeling tools such as ERWIN and Microsoft VISIO and extensively worked with XML transformations like XML Generator and XML Parser to transform XML files.
- Participated in the development, improvement and maintenance of Snowflake database applications.
- Experienced using Control-M, Autosys and Tivoli Workload Scheduler (TWS), Maestro scheduling.
- Strong knowledge of Extraction, Transformation and Loading (ETL) processes using Ascential DataStage, UNIX shell scripting and SQL*Loader.
- Expertise in all phases of the Software Development Life Cycle (SDLC), from requirements gathering, analysis, development, unit testing and QA support to production deployment and knowledge transition to support teams, across various projects.
- Good experience with production support activities including incidents, service requests, problem tickets, change tickets, DR activities and maintenance activities, using ServiceNow and Remedy.
- Expertise in working with Business Analysts, Architects/Project members from different areas including technical and non-technical resources.
ETL Tools: IBM DataStage 11.x, 8.x, Informatica Power Center 10.x/9.x (Source Analyzer, Mapping Designer, Workflow Monitor, Workflow Manager, Power Connects for ERP and Mainframes, Power Plugs), Power Exchange, Informatica Data integrator, Power Connect, Data Junction (Map Designer, Process Designer, Meta Data Query).
OLAP/DSS Tools: Business Objects XI, Hyperion, Crystal Reports XI
Databases: Oracle 10g/11g/12c, Sybase, DB2, MS SQL Server … Teradata v2r6/v2r5, Netezza, HBase, MongoDB and Cassandra
Others: AWS Cloud, AWS Redshift, S3, TOAD, PL/SQL Developer, Tivoli, Cognos, Visual Basic, Perl, SQL-Navigator, Test Director, WinRunner
Database Skills: Stored Procedures, Database Triggers and Packages
Data Modeling Tools: Dimensional Data Modeling, using Star Join Schema Modeling, Snowflake Modeling, FACT and Dimensions Tables, Physical and Logical Data Modeling
Languages: UNIX shell scripts, XML, SQL, Python, T-SQL, Scala.
Operating Systems: Windows NT/ AIX, LINUX, UNIX
Sr. Informatica Developer
Confidential - Matthews NC
- Involved in data modeling sessions, developed technical design documents and used the DataStage Designer to develop processes for extracting, cleansing, transforming, integrating and loading data into the data warehouse database.
- Worked on a Big Data POC that loaded data into HDFS and created MapReduce jobs; created Hive external tables using a shared metastore, with partitioning, clustering and dynamic partitioning for faster data retrieval.
- Created mappings using pushdown optimization to achieve good performance when loading data into Netezza, and created the scheduling plan and job execution timings and shared them with the scheduling team (Control-M).
- Pulled data from different APIs using curl commands, ingested data from various sources, loaded the files into an AWS S3 bucket and copied them into AWS Redshift tables.
- Used UNIX shell scripting, Python, IBM DataStage 9.1/11.3/11.5 or Hive SQL to transform and load data into Teradata or Oracle.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 and ORC/Parquet/text files into AWS Redshift, and performed data extraction, aggregation and consolidation of Adobe data within AWS Glue using PySpark.
- Worked in ODI Designer to design interfaces and define data stores, interfaces and packages; modified the ODI Knowledge Modules (Reverse Engineering, Journalizing, Loading, Check, Integration, Service) to create interfaces that cleanse, load and transform data from source to target databases; created mappings and configured multiple agents as per project requirements.
- Migrated the existing Teradata BTEQ scripts to Netezza NZSQL, keeping the business logic the same and validating the results across both systems.
- Implemented data intelligence solutions around Snowflake Data Warehouse.
- Developed Parallel Jobs using various Development/Debug stages (Peek, Head, Tail, Row Generator, Column Generator, Sample) and Processing stages (Aggregator, Change Capture, Change Apply, Filter, Sort, Merge, Funnel, Remove Duplicates).
- Defined the process flow using DataStage job sequences, scheduled the job sequencers using Tivoli Workload Scheduler (TWS), and prepared the TWS job stream and job info files required for upload to the TWS database.
- Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
- Extracted transformed data from Hadoop to destination systems as a one-off job, batch process or Hadoop streaming process, and used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Processed streaming data from Kafka topics using the Spark Streaming API and loaded it into DataFrames and Datasets; also used Spark SQL to pull in the data and apply complex SQL logic for analytics.
- Executed and monitored jobs through TWS and created VAR tables in TWS.
- Established best practices for DataStage jobs to ensure optimal performance, reusability and restartability, and used DataStage Designer and Director, based on business requirements and business rules, to load data from source to target tables.
- Performed data quality issue analysis using SnowSQL by building analytical warehouses on Snowflake.
- Wrote Python MapReduce scripts for processing unstructured data, loaded log data into HDFS using Flume, and worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Proficient with PL/SQL, T-SQL, stored procedures, database triggers and SQL*Loader; resolved the defects raised by QA.
- Designed and developed ETL/ELT processes to handle data migration from multiple business units and sources including Oracle, Postgres, MSSQL, Access and others.
- Developed ETL pipelines into and out of the data warehouse using Snowflake's SnowSQL and wrote SQL queries against Snowflake.
- Reviewed the code developed by subordinates against naming standards and best practices, and worked with different stages to create jobs based on the business application.
- Worked with Sqoop to export analyzed data from HDFS environment into RDBMS for report generation and visualization purpose using Tableau.
- Extracted data from various source systems like Oracle, SQL Server and DB2 to load the data into Landing Zone and then by using Java copy command loaded into AWS-S3 Raw Bucket.
Environment: IBM WebSphere DataStage 11.3, Oracle 11g, TWS, DB2, QualityStage (Designer, Director, Administrator), DB2 UDB, Teradata V13, Hadoop, HDFS, Hive, Java, Flume, Kafka, Sqoop, Python, Snowflake, AWS S3, AWS Redshift, EC2, Linux, SQL, PL/SQL, Tableau, UNIX Shell Scripting, DataStage Version Control, MS SQL Server, MongoDB, Netezza 4.x, Mainframe, Autosys, Info Analyzer.
Sr. Informatica Developer
Confidential - Cleveland OH
- Interacted with Business Analysts and the Project Architect to understand and analyze requirements; prepared design documents (High-Level Design and Low-Level Design), security and other compliance standard documents, and the unit test plan to be executed in the development environment after code changes were completed.
- Led an offshore team and ensured that work was delivered on schedule for each phase.
- Used the Informatica Power Center ETL tool to develop mappings and mapplets and to create sessions, tasks and workflows for integrating data into large data warehouses such as Oracle and Teradata.
- Performed logical and physical data modeling in Erwin for the data warehouse database in a star schema.
- Created mappings using the Informatica Big Data Edition tool, with all jobs set to run on the Hive, Blaze or Spark engines.
- To improve performance, split large input files into a number of files balanced against the slice count, loaded them into the AWS S3 Refine bucket, and used the COPY command to micro-batch load them into Amazon Redshift.
- Configured Spark streaming to receive real time data from Kafka and store the stream data to HDFS using Scala and involved in migrating entire data warehouse data using AWS services and Apache SPARK and SQOOP applications.
- Responsible for designing, developing, and testing of the ETL (Extract, Transformation and Load) strategy to populate the data from various source systems (Flat files, Oracle, SQL SERVER) feeds using ODI.
- Built and configured the three layers in the OBIEE repository using the Administration tool as required, and consolidated data from different systems to load the Constellation Planning Data Warehouse using ODI interfaces and procedures.
- Created IDQ data profiles, scorecards, mapplets, rules, mappings, workflows, and data cleansing and data standardization processes using IDQ Developer and Informatica Analyst tools.
- Heavily involved in testing Snowflake to understand the best possible way to use cloud resources.
- Involved in extraction, transformation and loading of data: ingested data from different sources into Redshift and performed Redshift maintenance, including running VACUUM and ANALYZE on the Redshift tables.
- Created all JDBC, ODBC and Hive connections in the Informatica Developer client tool to import the Parquet files and the relational tables.
- Developed mappings for Change Data Capture (CDC) with PowerExchange, developed mappings and workflows for extracting CDC data, and scheduled them in the TWS scheduler.
- Scheduled different Snowflake jobs using NiFi.
- Developed mappings across multiple database schemas to load incremental data into dimensions.
- Used Informatica BDM to refine and cleanse data from the S3 Raw bucket with delta processing, loaded the refined data into the S3 Refine bucket in slices, and used the COPY command to load it into the Redshift database.
- Worked on streaming Near Real Time (NRT) data using Kafka-Flume integration, and integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Developed jobs to send and read data from AWS S3 buckets using components like tS3Connection, tS3BucketExist, tS3Get, tS3Put.
- Ingested a wide variety of structured, semi-structured and unstructured data into the data lake and Big Data ecosystem using batch processing, real-time streaming and SQL.
- Used the Informatica B2B Data Exchange MFT tool to transfer files from external vendors to internal servers, and used the Pushdown Optimization (PDO) technique for performance optimization.
- Involved in installing and configuring Informatica MDM Hub Console, Hub Store, Cleanse and Match Server, Address Doctor and Informatica Power Center applications, and worked with the Informatica Big Data tool to move data from Oracle, flat files and JSON files to Hive.
- Used Teradata Utilities such as FLoad, MLoad, BTEQ scripts for loading the data from flat files to staging tables, Dimension tables and Fact tables. These tables will be used for Cognos reporting.
- Worked with the Hadoop file formats Text Input Format and Key Value Text Input Format, designed data models on HBase and Hive, and created MapReduce jobs for ad hoc data requests.
- Developed mappings and wrote PL/SQL scripts for loading data into dimension tables (SCD Types 1, 2, 3 and 4) and fact tables, and wrote shell scripts to automate Informatica workflows in the scheduling tool.
- Involved in migrating objects from Teradata to Snowflake.
- Extracted data from the source systems (Oracle, PeopleSoft, DB2 and mainframe flat files), transformed it, and loaded it into the OBIEE analytics warehouse using Informatica.
- Worked closely with the client on planning and brainstorming the migration of the current RDBMS to Hadoop, and on migrating Informatica jobs to Hadoop Sqoop jobs that load into the Oracle database.
Environment: Informatica Data Quality IDQ 9X/10.2, Informatica Analyst, Informatica PowerExchange CDC (Change Data Capture), Informatica PowerCenter 10X/9X, ODI, B2B MFT Tool, Oracle, PL/SQL, Snowflake, Teradata, MongoDB, DB2, Salesforce.com (SFDC), Hadoop, Tableau, Hive, Java, Kafka, HBase, Python, Parquet Files, SQL Server 2016, SQL, Business Objects, Shell Scripting, UNIX, Impala, AWS S3, AWS Glue, EC2 and Redshift.
Sr. ETL Informatica Developer
Confidential - Grand Rapids, MI
- Analyzed the Business Requirement Documents (BRD) and laid out the steps for the data extraction, business logic implementation & loading into targets.
- Responsible for Impact Analysis, upstream/downstream impacts and created detailed technical specifications for Data Warehouse and ETL processes.
- Used Informatica as ETL tool, and stored procedures to pull data from source systems/ files, cleanse, transform and load data into the Teradata using Teradata Utilities and applied the concept of Change Data Capture (CDC) and imported the source from Legacy systems.
- Worked on Informatica- Source Analyzer, Warehouse Designer, Mapping Designer & Mapplet, and Transformation Developer and used most of the transformations such as the Source Qualifier, Expression, Aggregator, Filter, Connected and Unconnected Lookups, Joiner, update strategy and stored procedure.
- Extensively used Pre-SQL and Post-SQL scripts for loading the data into the targets according to the requirement.
- Leveraged the Informatica Big Data Manager (BDM) tool for ETL processing on Hive tables and HDFS files in a Hadoop Big Data environment. Executed mappings in both Native and Hadoop modes.
- Upgraded OBIEE from 10g to 11g and migrated the RPD and catalog from 10g to 11g using the utility bat file; extracted data from heterogeneous sources such as Oracle, Sybase, SFDC, flat files and COBOL (VSAM) using Informatica PowerCenter and loaded it into the DB2 target database.
- Extracted data from Hadoop, modified it according to business requirements, and loaded it back into Hadoop.
- Built a framework for inbound/outbound files entering the Facets system and used the application to set up group and subscriber data in the FACETS application.
- Developed mappings to load fact and dimension tables, SCD Type 1 and Type 2 dimensions and incremental loads, and unit tested the mappings.
- Successfully upgraded Informatica from 9.1 to 9.5 and validated objects in the new version; involved in initial, incremental and daily loads to ensure that data was loaded into the tables in a timely and appropriate manner.
- Extensively worked on performance tuning of Teradata SQL, ETL and other processes to optimize session performance, and loaded data into Teradata tables using the Teradata utilities BTEQ, FastLoad, MultiLoad, FastExport and TPT.
- Worked extensively with different caches such as Index cache, Data cache and Lookup cache (Static, Dynamic and Persistent) while developing mappings, and created reusable transformations, mapplets and worklets using Transformation Developer, Mapplet Designer and Worklet Designer.
- Responsible for creating complex mappings according to business requirements, which are scheduled through the ODI Scheduler.
- Involved in creation and maintenance of BMC Control-M jobs that submit Cognos cube build scripts that perform automatic publishing of Power Play operational reporting cubes providing version control.
- Designed Schemas using Fact, Dimensions, Physical, Logical, Alias and Extension tables in OBIEE Administrator tool.
- Implemented partitioning and bucketing techniques in Hive, developed scripts to create external Hive tables, and worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Responsible for unit testing and integration testing, helped with User Acceptance Testing, and scheduled Informatica jobs in Autosys, implementing dependencies where necessary.
- Tuned the performance of mappings by following Informatica best practices and also applied several methods to get best performance by decreasing the run time of workflows.
- Worked extensively on Informatica Partitioning when dealing with huge volumes of data and also partitioned the tables in Teradata for optimal performance.
Environment: Informatica Power Center 9.5.1, Oracle 11g, DB2, Teradata, Tableau, Flat Files, Erwin 4.1.2, SQL Assistant, Cognos, Facets, Hadoop, Hive, SQL, Netezza, HDFS, Sqoop, OBIEE, Shell Scripting, UNIX, Toad, WinSCP, PuTTY, Autosys, Agile.
Confidential - New York, NY
- Involved in full life cycle development including design, ETL strategy, troubleshooting, reporting, and identifying facts and dimensions.
- Understood the business requirements and designed the ETL flow in DataStage as per the mapping sheet, and carried out unit testing and review activities.
- Handled change requests: understood the application workflow in the existing DataStage jobs, applied the new changes to them, and carried out testing and review activities.
- Adhered to the process by creating and posting all required documents and deliverables, such as the UTC document and Review Checklist.
- Defined the process flow using DataStage job sequences and scheduled the job sequencers using Tivoli Workload Scheduler (TWS).
- Worked with business analysts to identify and develop business requirements, transformed them into technical requirements, and took responsibility for deliverables.
- Provided staging solutions for data validation and cleansing with DataStage ETL jobs.
- Used the DataStage Designer to develop processes for extracting, transforming, integrating, and loading data into Enterprise Data Warehouse.
- Used Parallel Extender for Parallel Processing for improving performance when extracting the data from the sources.
- Extensively worked with Job sequences using Job Activity, Email Notification, Sequencer, Wait for File activities to control and execute the DataStage Parallel jobs.
- Created re-usable components using Parallel Shared containers and used various Parallel Extender partitioning and collecting methods.
- Defined Stage variables for data validations and data filtering process and tuned DataStage jobs for better performance by creating DataStage Hashed files for staging the data and lookups.
- Used DataStage Director for running the Jobs.
- Wrote shell scripts extensively for different scenarios and wrote DataStage routines to implement business logic.
- Designed and implemented slowly changing dimensions and methodologies and implemented Debugging Methodologies with Break Point Options.
- Transferred data from various systems over FTP, developed UNIX shell scripts, updated the backup logs, and was involved in unit testing of the jobs and the data loaded into the target database.
- Wrote DataStage routines for data validation and batch job controls to automate the execution of DataStage jobs.
Environment: DataStage 8.x, Oracle 10g, DB2 UDB 9.0, Teradata, TWS, AIX 5.1, UNIX, XML, Mainframe system.
Confidential - New York, NY
- Reviewed the requirements with the business, followed up regularly and obtained sign-offs.
- Worked on different workflow tasks such as session, event raise, event wait, decision, e-mail, command, worklet, assignment and timer, and on workflow scheduling.
- Created sessions and configured workflows to extract data from various sources, transform it and load it into the data warehouse.
- Moved data from source systems to different schemas based on the dimension and fact tables, using Slowly Changing Dimensions (SCD) Type 1 and Type 2.
- Used various transformations like Filter, Expression, Sequence Generator, Source Qualifier, Lookup, Router, Rank, Update Strategy, Joiner, Stored Procedure and Union to develop robust mappings in the Informatica Designer.
- Performed analysis of Source, Requirements, existing OLTP system and identification of required dimensions and facts from the Database.
- Tuned Informatica mappings and sessions for optimum performance and developed various mappings using reusable transformations.
- Prepared the required application design documents based on functionality required.
- Designed the ETL processes using Informatica to load data from Oracle and flat files (fixed-width and delimited) to the staging database and from staging to the target warehouse database.
- Worked on database connections, SQL joins, cardinalities, loops, aliases, views, aggregate conditions, parsing of objects and hierarchies.
- Responsible for monitoring all running, scheduled, completed and failed sessions, and for debugging the mapping when a session failed.
- Involved in unit and integration testing of Informatica sessions and batches, and in fixing invalid mappings.
- Defined the program specifications for the data migration programs, as well as the necessary test plans used to ensure the successful execution of the data loading processes.
- Worked on dimensional data modeling using the Erwin data modeling tool, populated data marts and performed system testing of the application.
- Built the Informatica workflows to load tables as part of the data load, and wrote queries, procedures and functions used by different application modules.
- Implemented the best practices for the creation of mappings, sessions, workflows and performance optimization.
- Created Informatica Technical and mapping specification documents according to Business standards.
Environment: Informatica PowerCenter 8.6.1/8.1.1, Cognos 9, SQL Server 2008, IDQ 8.6.1, Oracle 11g, PL/SQL, TOAD, PuTTY, Autosys Scheduler, UNIX, Teradata 13, Erwin 7.5, ESP, WinSCP