- 9+ years of experience in Information Technology, including Data Warehouse/Data Mart development using ETL/Informatica PowerCenter and SQL Server Integration Services (SSIS), across industries such as Healthcare, Banking, Insurance, Pharmaceutical, and Finance.
- Excellent knowledge of and work experience with AWS Cloud services (S3, AWS Glue, RDS, DynamoDB, Redshift) and Snowflake.
- Experienced in Hadoop Big Data integration with ETL, performing data extraction, transformation, and loading for ERP data, and in Big Data technologies: Pig, Hive, Sqoop, Flume, Oozie, and NoSQL databases (Cassandra and HBase).
- Experienced in dimensional data modeling using ERwin, including physical and logical data modeling, the Ralph Kimball approach, star modeling, data marts, OLAP, and Fact and Dimension tables.
- Excellent experience with databases such as Oracle, Teradata, and Netezza, developing SQL and PL/SQL packages as per business needs; experienced in code review of ETL applications, SQL queries, UNIX shell scripts, and TWS commands built by colleagues.
- Excellent experience working with indexes, complex queries, stored procedures, views, triggers, user-defined functions, complex joins, loops, T-SQL, and DTS/SSIS on MS SQL Server; experienced in configuring SSIS packages with package logging, breakpoints, checkpoints, and event handlers to fix errors.
- Experience with Oozie Scheduler in setting up complex workflow jobs with Spark, Hive, Shell and Pig Actions.
- Experienced in OLTP/OLAP system study, analysis, and E-R modeling, developing database schemas such as Star schema and Snowflake schema for relational, dimensional, and multidimensional data modeling; strong working knowledge of RDBMS, ER diagrams, and normalization and denormalization concepts.
- Very good experience and understanding of Data warehousing, Data modeling and Business Intelligence concepts with emphasis on ETL and life cycle development using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager, Metadata Manager and Workflow Monitor), Informatica Data integrator.
- Excellent knowledge on data warehouse concepts using Ralph Kimball and Bill Inmon methodologies.
- Good knowledge of Hadoop (MapR) architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce programming paradigm.
- Excellent experience using Teradata SQL Assistant and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport on Mainframes and UNIX.
- Expertise in Extraction, Transformation, and Loading (ETL) of data using SSIS, creating mappings/workflows to extract data from SQL Server, Excel files, other databases, and flat file sources and load it into various business entities (Data Warehouse, Data Mart).
- Experienced in coding SQL and PL/SQL procedures/functions, triggers, and exceptions, with excellent exposure to relational database concepts and entity-relationship diagrams.
- Experience in creating various reports like drill down, sub reports, parameterized, multi-valued and various Ad hoc reports through Report model creation using SSRS and proficiency in Developing SSAS Cubes, Aggregation, KPIs, Measures, Partitioning Cube, Data Mining Models, and Deploying and Processing SSAS objects.
- Experienced in UNIX working environment, writing UNIX shell scripts for Informatica pre & post session operations.
- Very good Knowledge and experience of complete SDLC including Requirement Analysis, Requirement Gathering, Project Management, Design, Development, Implementation and Testing.
- Excellent experience in designing and developing complex mappings using transformations such as Unconnected and Connected Lookups, Source Qualifier, Expression, Router, Filter, Aggregator, Joiner, Update Strategy, Union, Sequence Generator, Rank, Sorter, Normalizer, Stored Procedure, Transaction Control, and External Procedure; extensive experience with Informatica Power Center 10.x, 9.x, and 8.x hosted on UNIX, Linux, and Windows platforms.
ETL Tools: Informatica Power Center 10.x/9.x (Source Analyzer, Mapping Designer, Workflow Monitor, Workflow Manager, Power Connects for ERP and Mainframes, Power Plugs), Power Exchange, Informatica Data integrator, Power Connect, Data Junction (Map Designer, Process Designer, Meta Data Query), DataStage, SQL Server Integration Services (SSIS).
OLAP/DSS Tools: Business Objects XI, Hyperion, Crystal Reports, Tableau and Power BI
Databases: Oracle 10g/11g/12c, Sybase, DB2, MS SQL Server … Teradata v2r6/v2r5, Netezza, HBase, MongoDB, Cassandra and Snowflake.
Others: AWS Cloud, AWS Redshift, TOAD, PL/SQL Developer, Tivoli, Cognos, Visual Basic, Perl, SQL-Navigator, Test Director, WinRunner
Database Skills: Stored Procedures, Database Triggers and Packages
Data Modeling Tools: Physical and Logical Data Modeling using ERWIN
Languages: SQL, UNIX shell scripts, Python
Operating Systems: Windows NT/ AIX, LINUX, UNIX
Confidential, Chicago IL
Sr. ETL Lead/Informatica Developer
- Gathered requirements and analyzed, designed, coded, and tested highly efficient, scalable integration solutions using Informatica, Oracle, SQL, and source systems; involved in the technical analysis of data profiling, mappings, formats, and data types, and in developing data movement programs using Power Exchange and Informatica.
- Developed ETL mapping document which includes implementing the data model, implementing the incremental/full load logic and the ETL methodology.
- Worked on Informatica BDE to retrieve data from Hadoop's HDFS file system and worked on a Cloudera Hadoop platform to implement Big Data solutions using Hive, MapReduce, shell scripting, and Java technologies.
- Designed and developed ETL processes in AWS Glue to migrate Campaign data from external sources such as S3 and ORC/Parquet/text files into AWS Redshift; performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
- Performed data profiling and analysis using Informatica Data Explorer (IDE) and Informatica Data Quality (IDQ), and designed, developed, tested, and implemented ETL processes using Informatica Cloud.
- Wrote ETL jobs in Java to read from web APIs using REST and HTTP calls and load the data into HDFS; extended existing Python scripts to load data from CMS files to the staging database and to the ODS.
- Used Spark Streaming to stream data from external sources through Kafka; responsible for migrating the code base from the Cloudera platform to Amazon EMR, and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
- Involved in writing Teradata SQL bulk programs and in performance tuning of Teradata SQL statements, using Teradata EXPLAIN and PMON to analyze and improve query performance.
- Developed various ETL processes for complete end-to-end data integration, including data integration between DB2 and Oracle.
- Designed and developed mappings, transformations, sessions, workflows, and ETL batch jobs to load data from source(s) to stage using Informatica, T-SQL, UNIX shell scripts, and Control-M scheduling.
- Developed jobs to send and read data from AWS S3 buckets using components like tS3Connection, tS3BucketExist, tS3Get, tS3Put and created the SFDC, Flat File and Oracle connections for AWS Cloud services.
- Imported Relational Data base data using Sqoop into Hive Dynamic partition tables using staging tables and imported data using Sqoop from Teradata using Teradata connector.
- Used Informatica Power Center 10.1.0 to extract, transform, and load data into the Netezza data warehouse from sources such as Oracle and flat files; created shell scripts to invoke Informatica workflows from the command line; and developed mappings to extract data from different types of source systems (flat files, XML files, relational files, etc.) into the data warehouse using Power Center.
- Developed multiple POCs using PySpark and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL/Teradata.
- Used AWS (Amazon Web Services) S3 components to download and upload data files to AWS as part of ETL.
- Migrated ETL jobs to Pig scripts to do Transformations, even joins and some pre-aggregations before storing the data to HDFS and involved in creating and running Sessions & Workflows using Informatica Workflow Manager and monitoring using Workflow Monitor.
- Used Teradata SQL Assistant, Teradata Administrator and PMON and data load/export utilities like BTEQ, FastLoad, Multi Load, Fast Export, Tpump, and TPT on UNIX/Windows environments and running the batch process for Teradata.
- Used Python and its libraries such as NumPy, pandas, and Matplotlib, along with Tableau for data visualization, and converted interfaces running on UNIX to Python scripts to boost performance.
- Involved in business analysis and technical design sessions with business and technical staff to develop entity-relationship/data models, requirements documents, and ETL specifications; dug deep into complex T-SQL queries and stored procedures to identify items that could be converted to Informatica Cloud ISD.
- Involved in designing, documenting and configuring Informatica Data Director for supporting management of MDM data.
- Developed automated data pipelines from various external data sources (web pages, API etc) to internal data warehouse (SQL server) then export to reporting tools by Python.
- Created Informatica mappings with PL/SQL stored procedures/functions to incorporate critical business functionality when loading data, and developed UNIX shell scripts for data extraction and for running the pre/post processes and PL/SQL procedures.
- Created data load process to load data from OLTP sources into Netezza and created external tables in NZLOAD process in Netezza
Environment: Informatica Power Center 10.1.0, AWS S3, AWS Redshift, Spark, AWS Glue, Python, Power Exchange, Oracle 12c, Netezza, UNIX, Teradata, SQL & PL/SQL, Informatica Cloud, SQL Server, Korn Shell Scripting, XML, T-SQL, Excel, MDM, Flat Files, Tivoli, Snowflake, MongoDB, Hadoop, Cassandra, PySpark, HDFS.
Confidential, Bloomington IN
Sr. ETL/ Informatica Developer
- Responsible for requirement definition and analysis in support of Data warehousing efforts and worked on ETL Tool Informatica to load data from Flat Files to landing tables in SQL server.
- Used Redshift to allow tables to be read while they are being incrementally loaded or modified; sourced data from RDS and AWS S3 buckets and populated it into a Teradata target; and mounted an S3 bucket in the local UNIX environment for data analysis.
- Imported data from an RDBMS environment into HDFS using Sqoop for report generation and visualization in Tableau, and converted Hive-based applications to the Spark framework using Spark RDDs/DataFrames/Datasets with Scala/Python.
- Extensively used Informatica client tools Source Analyzer, Warehouse designer, Mapping designer, Transformation Developer, Informatica Repository Manager and Informatica Workflow Manager and developed and tested all the Informatica mappings and update processes.
- Worked with Redshift, in which PostgreSQL features suited to smaller-scale OLTP processing, such as secondary indexes and efficient single-row data manipulation operations, have been omitted to improve performance.
- Finalized Informatica data integration processes for the client system; major responsibilities included Informatica defect fixes, unit testing, Informatica mapping and system performance tuning, and recreating Informatica workflows and combining them with UNIX shell scripts to automate ETL systems.
- Developed Pig UDFs to pre-process data for analysis, migrated ETL operations into the Hadoop system using Pig Latin and Python scripts, and used Pig as an ETL tool to do transformations, even joins and some pre-aggregations before storing data into HDFS.
- Worked with mappings from varied transformation logics like Unconnected and Connected Lookups, Router, Aggregator, Filter, Joiner, Update Strategy.
- Created data partitions on large data sets in S3 and DDL on partitioned data and converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
- Stored data in AWS S3 similar to HDFS. Also performed EMR programs on data stored in S3 and created Pre/Post Session/SQL commands in sessions and mappings on the target instance.
- Coded with Teradata analytical functions and Teradata BTEQ SQL; wrote UNIX scripts to validate, format, and execute the SQL in the UNIX environment; and coded Teradata SQL, Teradata stored procedures, macros, and triggers.
- Involved in loading data from the UNIX file system to HDFS, configuring Hive and writing Hive UDFs, loading log data into HDFS using Flume and Kafka, and performing ETL integrations; worked on predictive and what-if analysis using Python on HDFS data; and successfully loaded files from Teradata to HDFS and from HDFS to Hive.
- Fixing invalid Mappings, testing of Stored Procedures and Functions, Unit and Integration Testing of Informatica Sessions, Batches and Target Data.
- Extensively used transformations such as Router, Aggregator, Source Qualifier, Joiner, Expression, and Sequence Generator, and scheduled sessions and batches on the Informatica Server using Informatica Server Manager/Workflow Manager.
- Designed and Implemented ETL for data load from heterogeneous Sources to SQL Server and Oracle as target databases and for Fact and Slowly Changing Dimensions SCD-Type1 and SCD-Type2.
- Used NZSQL/NZLOAD utilities and developed Linux shell scripts to load data from flat files to the Netezza database, and developed data mapping, data governance, transformation, and cleansing rules for the Master Data Management architecture involving OLTP and ODS.
- Participated in reconciling data drawn from multiple systems across the company like Oracle 12c, flat files into Oracle data warehouse and Performed match/merge and ran match rules to check the effectiveness of MDM process on data.
- Extensively used Aginity Netezza work bench to perform various DML, DDL etc. operations on Netezza database and performed data cleaning and data manipulation activities using NZSQL utility and worked on Netezza database to implement data cleanup, performance-tuning techniques
- Scheduled Informatica workflows using the Control-M and Tivoli scheduling tools and troubleshot the Informatica workflows.
- Did extensive work with ETL testing including Data Completeness, Data Transformation & Data Quality for various data feeds coming from source and developed online view queries and complex SQL queries and improved the query performance for the same.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud and making the data available in Athena and Snowflake.
- Designed and developed data movements using SQL Server Integration Services, T-SQL, and stored procedures in SQL.
- Worked on Teradata table creation and index selection criteria, Teradata macros and procedures, and the Basic Teradata Query (BTEQ) language.
Environment: Informatica Power Center 9.6 (Informatica Designer, Workflow Manager, Workflow Monitor), Informatica Data integrator, Spark, Python, AWS Cloud, AWS Redshift, Hadoop, HDFS, HBase, Tivoli, MDM, Hive, NoSQL, Databricks Cloud, Oracle 12c, Flat files, SQL, XML, PL/SQL, Business Objects, Teradata, Teradata SQL Assistant, UNIX, Shell Scripts, Netezza.
Confidential, Boston, MA
Sr. ETL/SSIS Developer
- Created technical design specification documents for extraction, transformation, and loading based on the business requirements; worked on development, enhancement, and support of the Enterprise Data Warehouse (EDW); and developed MLOAD scripts to load data from load-ready files into the Teradata EDW.
- Analyze business requirements, technical specification, source repositories and physical data models for ETL mapping and process flow.
- Used SSIS as ETL tool, and stored procedures to pull data from source systems/ files, cleanse, transform and load data into databases and extensively used Pre-SQL and Post-SQL scripts for loading the data into the targets according to the requirement.
- Developed SQL Server Integration Services (SSIS) packages to transform data from SQL 2008 to MS SQL 2014, and created interface stored procedures used in SSIS to load/transform data into the database.
- Involved in the full HIPAA compliance lifecycle, from gap analysis through mapping, implementation, and testing, for processing of Medicaid claims; worked with the data mapping team on ICD-9 to ICD-10 forward mapping of diagnosis and procedure codes.
- Extracted Data from various Heterogeneous Databases such as Oracle and Access database, DB2, flat files to SQL server 2014 using SSIS.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift; used AWS (Amazon Web Services) S3 components to download and upload data files as part of ETL; and stored data from SQL Server databases in Hadoop clusters set up on AWS EMR.
- Developed database objects such as SSIS Packages, Tables, Triggers, and Indexes using T-SQL, SQL Analyzer and Enterprise Manager
- Worked with Teradata SQL Assistant and Teradata Studio and responsible for design and developing Teradata BTEQ scripts, MLOAD based on the given business rules and design documents.
- Worked with SSIS packages involving FTP tasks and Fuzzy Grouping, Merge, Merge Join, Pivot, and Unpivot transformations.
- Worked with Metadata Manager which uses SSIS workflows to extract metadata from metadata sources and load it into a centralized metadata warehouse.
- Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions and Incremental loading and unit tested the mappings.
- Designed and developed UNIX shell scripts for FTP, sending files to the source directory, and managing session files.
- Created databases, tables, clustered/non-clustered indexes, unique/check constraints, views, stored procedures, and triggers; optimized stored procedures and long-running SQL queries using indexing strategies and query optimization techniques; and migrated data from Excel sources to CRM using SSIS.
- Integrate complex Medicaid principles and policies into the Medicaid Management Information System (MMIS), requiring knowledge in areas of health systems and Medicaid information processing.
- Loaded data from various data sources and legacy systems into Teradata production and development warehouse using BTEQ, FASTEXPORT, MULTI LOAD, and FASTLOAD.
- Created Business Requirements Document (BRD) containing the glossary, Low Level Design Document, Technical design document, Tivoli scheduling flow document & Migration manual.
- Worked extensively with different caches such as index cache, data cache, lookup cache (static, dynamic, and persistent), and join cache while developing the mappings, using static, dynamic, and persistent caches in lookup transformations for better session throughput.
- Planned and designed the database changes necessary for data conversion, tool configuration, and data refresh on Netezza.
- Used Cognos Data Manager to support high-performance analysis of relational data by creating aggregate tables at multiple levels within and across hierarchies in the dimension tables.
- Gathered business requirements, analyzed data scenarios, and built, unit tested, and migrated Self Service Cognos Reports from DEV to QA.
- Created watches, probes and alerts to monitor performance and availability of services within Business Objects platform
- Designed Cubes with Star Schema using SQL Server Analysis Services 2012 (SSAS), Created several Dashboards and Scorecards with Key Performance Indicators (KPI) in SQL Server 2012 Analysis Services (SSAS).
- Involved in Creating Data warehouse based on Star schema and worked with SSIS packages to load the data into the database.
- Developed complex stored procedure using T-SQL to generate Ad hoc reports using SSRS and developed various reports using SSRS, which included writing complex stored procedures for datasets.
- Explored data in a variety of ways and across multiple visualizations using Power BI, applying strategic expertise in design of experiments, data collection, analysis, and visualization.
- Wrote complex T-SQL queries, subqueries, correlated subqueries, and dynamic SQL queries, and created on-demand (pull) and event-based (push) delivery of reports according to the requirements.
Environment: SSIS 2014, Oracle 11g, SQL Server 2014, SQL, Tivoli, Teradata, Business Objects, DB2, Netezza, Flat files, UNIX, Windows, Teradata SQL Assistant, Cognos, Shell scripting, SSAS, SSRS, T-SQL, XML, Excel, PL/SQL, Autosys.