Sr. ETL/Data Warehouse Developer Resume
Houston, TX
SUMMARY
- 10+ years of IT experience with expertise in the analysis, design, development, and implementation of data warehouses, data marts, and decision support systems involving relational and non-relational databases.
- Evaluated technology stacks for building cloud analytics solutions, researching the right strategies and tools for end-to-end analytics and helping to design technology roadmaps for data ingestion, data lakes, data processing, and visualization.
- Strong experience implementing data warehouse solutions in Amazon Redshift; worked on various projects to migrate data from on-premises databases to Amazon Redshift, RDS, and S3 (a minimal extract sketch follows this summary).
- Well versed in languages such as Python, PySpark, and SQL, with a good understanding of data extraction, transformation, and loading in Hive, Pig, and HBase, and experience transforming data from HDFS, Hive, Pig, HBase, and Oracle.
- Strong experience in extracting, transforming, and loading (ETL) data from various sources into data warehouses and data marts using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager, Workflow Monitor, Metadata Manager), Power Exchange, Power Connect, and SSIS as ETL tools on Oracle, DB2, and SQL Server databases.
- Experience using Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Sqoop, Hive, Impala, HBase, and Kafka, and developing ETL applications on large volumes of data using tools such as MapReduce, Spark-Scala, PySpark, Spark SQL, and Pig.
- Well versed with big data on AWS cloud services such as EC2, S3, Glue, Athena, DynamoDB, and Redshift, with experience in job/workflow scheduling and monitoring tools such as Oozie, AWS Data Pipeline, Autosys, and Airflow.
- Good understanding of Azure SQL DWH, Azure ADLS, and Azure Data Factory concepts relating to storage, distribution, DWU units, resource user groups, connection strings, etc.
- Extensively used Parallel Extender to load data into the data warehouse using techniques such as pipeline and partition parallelism in an MPP environment.
- Expertise in data warehouse/data mart, ODS, OLTP, and OLAP implementations, combined with project scoping, analysis, requirements gathering, data modeling, effort estimation, ETL design, development, system testing, implementation, and production support.
- Extensive experience in developing stored procedures, functions, views, triggers, and complex SQL queries using SQL Server T-SQL and Oracle PL/SQL.
- Extensively worked on Informatica B2B Data Exchange setup: endpoint creation, scheduler, partner setup, profile setup, event attribute creation, event status creation, etc.
- Extensively worked with XML files as sources and targets, used XML Generator and XML Parser transformations to transform XML files, and used the Oracle XMLTYPE data type to store XML files.
- Expertise in building Enterprise Data Warehouses (EDW), Operational Data Stores (ODS), data marts, and Decision Support Systems (DSS) using the data modeling tool Erwin and dimensional modeling techniques (Kimball and Inmon), with Star and Snowflake schemas addressing Slowly Changing Dimensions (SCDs).
- Expertise in dimensional and relational physical and logical data modeling using Erwin and ER/Studio. Extensively worked with Oracle PL/SQL stored procedures, functions, and triggers, and was involved in query optimization.
- Extensively involved in optimizing and tuning Informatica mappings and sessions by identifying and eliminating bottlenecks, managing memory, and parallelizing threads.
- Extensive experience in designing and developing complex mappings applying transformations such as Lookup, Source Qualifier, Update Strategy, Router, Sequence Generator, Aggregator, Rank, Stored Procedure, Filter, Joiner, and Sorter, as well as Mapplets.
- Exposure to Informatica Cloud Services. Experienced in identifying bottlenecks in ETL processes and improving the performance of production applications using database tuning, partitioning, index usage, aggregate tables, and normalization/denormalization strategies.
- Excellent technical and professional client interaction skills. Interacted with technical, functional, and business audiences across different phases of the project life cycle.
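For flavor, a minimal PySpark sketch of the on-premises-to-S3 extract pattern referenced above. The JDBC host, credentials, table, and bucket path are hypothetical placeholders, not production values.

```python
# Minimal PySpark sketch: extract an on-premises Oracle table over JDBC and
# land it as partitioned Parquet on S3. All names/paths below are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("oracle_to_s3_extract")
         .getOrCreate())

# Partitioning the JDBC read spreads the extract across executors.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//onprem-host:1521/ORCLPDB")
          .option("dbtable", "SALES.ORDERS")
          .option("user", "etl_user")
          .option("password", "****")
          .option("partitionColumn", "ORDER_ID")
          .option("lowerBound", "1")
          .option("upperBound", "10000000")
          .option("numPartitions", "16")
          .load())

# Land the extract on S3 as Parquet, ready for a Redshift COPY or Athena query.
(orders.write.mode("overwrite")
       .partitionBy("ORDER_DATE")
       .parquet("s3a://my-datalake/raw/sales/orders/"))
```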
TECHNICAL SKILLS
ETL Tools: Informatica PowerCenter 10.2/9.6/9.5/9.1, Informatica IDQ/Data Analyst, Power Exchange, Informatica B2B (DT/DX), and SSIS.
Reporting Tools: Tableau, OBIEE, Business Objects XI, Cognos, SSRS (SQL Server Reporting Services), QlikView.
Databases: Oracle 10g/11g/12c, Netezza, IBM DB2 UDB 8.0/7.0, PostgreSQL, Teradata, MS SQL Server 2016/2014/2012/2008, MS Access.
Data Modeling/Methodology: MS Visio, Erwin, Ralph Kimball methodology, Bill Inmon methodology, Star schema, Snowflake schema, Extended Star schema, physical and logical modeling.
Programming Skills: SQL, PL/SQL, T-SQL, Python 3.x/2.x, and PySpark.
Scripting Languages: UNIX shell scripting, Perl scripting, JavaScript, Bash scripting.
Scheduling Tools: Autosys, Control-M, Tidal.
Operating Systems: UNIX, Linux, Windows.
Utilities: SQL*Plus, SQL*Loader, TOAD, PVCS, Visio.
Big Data and Cloud Technologies: Hadoop framework, Hive, NoSQL databases (MongoDB, Cassandra), Sqoop, Spark, Kafka, HDFS, AWS S3, AWS Glue, AWS Redshift, Airflow, Azure DW, Azure SQL, and Data Factory.
PROFESSIONAL EXPERIENCE
Confidential, Houston TX
Sr. ETL/Data warehouse Developer
Responsibilities:
- Worked with business analysts on requirements gathering and business analysis, and translated business requirements into technical specifications for building the enterprise data warehouse. Designed and developed real-time streaming pipelines for sourcing data from IoT devices, defining strategies for data lakes, data flow, retention, aggregation, and summarization to optimize the performance of analytics products.
- Designed the Azure data warehouse and built data movement pipelines using Azure Data Factory.
- Extensively used Informatica client tools (Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, Repository Manager, and Workflow Manager) and developed various complex mappings using Mapping Designer, working with Aggregator, Lookup (connected and unconnected), Filter, Router, Joiner, Source Qualifier, Expression, Stored Procedure, Sorter, and Sequence Generator transformations.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift, and was involved in extracting, aggregating, and consolidating Adobe data within AWS Glue using PySpark (a Glue job sketch follows this list).
- Involved in developing the conceptual, logical, and physical data models of the star schema using Erwin, and worked with normalization and de-normalization concepts and design methodologies such as the Ralph Kimball and Bill Inmon data warehouse methodologies.
- Designed and developed various SSIS packages (ETL) to extract and transform data, and was involved in scheduling SSIS packages.
- Extracted and loaded CSV and JSON data from AWS S3 into the Snowflake cloud data warehouse, and created tables, views, secure views, and user-defined functions in Snowflake.
- Optimized and tuned SQL queries used in the source qualifiers of certain mappings to eliminate full table scans, and performed data validations and control checks to ensure data integrity and consistency.
- Assisted the architect in developing the STG/ODS/Hub dimensional warehouse in Azure SQL Data Warehouse.
- Implemented Spark using Python, utilizing DataFrames and the Spark SQL API for faster data processing, and worked on an extensible framework for building high-performance batch and interactive data processing applications on Hive. Created and configured workflows, worklets, and sessions to transport data to target warehouse tables using Informatica Workflow Manager, and created tasks such as Email, Event-Wait, Event-Raise, Timer, Scheduler, Control, Decision, and Session in the Workflow Manager.
- Involved in designing and developing ETL integration patterns using Python on Spark, developing a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs, and creating PySpark DataFrames to bring data from DB2 to Amazon S3.
- Developed automated data pipelines from various external data sources (web pages, APIs, etc.) to the internal data warehouse (SQL Server), then exported to reporting tools, using Python.
- Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
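A sketch of the AWS Glue PySpark pattern described in this section (catalogued S3 campaign data aggregated and written to Redshift). The catalog database, tables, connection name, and bucket are hypothetical.

```python
# Sketch of a Glue PySpark job: S3-catalogued campaign data -> aggregate -> Redshift.
# Database, table, connection, and bucket names are hypothetical placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue = GlueContext(SparkContext())
job = Job(glue)
job.init(args["JOB_NAME"], args)

# Read Parquet/ORC campaign files already crawled into the Glue Data Catalog.
raw = glue.create_dynamic_frame.from_catalog(
    database="campaign_db", table_name="raw_events").toDF()

# Consolidate: one row per campaign/day with click and impression totals.
daily = (raw.groupBy("campaign_id", "event_date")
            .agg(F.sum("clicks").alias("clicks"),
                 F.sum("impressions").alias("impressions")))

# Write to Redshift through a catalogued JDBC connection, staging via S3.
glue.write_dynamic_frame.from_jdbc_conf(
    frame=DynamicFrame.fromDF(daily, glue, "daily"),
    catalog_connection="redshift_conn",
    connection_options={"dbtable": "mart.campaign_daily", "database": "analytics"},
    redshift_tmp_dir="s3://my-etl-bucket/tmp/")
job.commit()
```

Because Glue provisions the Spark runtime, a job like this scales by changing the worker configuration rather than the code.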
Confidential - Chicago IL
Sr. Data Warehouse/ETL Informatica Developer
Responsibilities:
- Created and maintained an optimal data pipeline architecture and built the infrastructure required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of sources such as Salesforce, SQL Server, and Oracle using AWS, Spark, Python, Hive, Kafka, and other big data technologies.
- Translated business requirements into Informatica mappings using Informatica Designer to build the data warehouse and populate data into the target.
- Hands-on experience using Elastic Container Service / Azure Container Services.
- Created and developed various complex mappings applying transformations such as SQL, Java, Web Service Consumer, Union, Lookup, Stored Procedure, Filter, Joiner, and Sorter, and created Mapplets in Informatica PowerCenter (version 9.6 or higher) and the Informatica Developer client.
- Developed, deployed, and monitored SSIS Packages.
- Migrated Oracle database tables into the Snowflake cloud data warehouse, and integrated and automated data workloads, ensuring ETL/ELT jobs succeeded and loaded data successfully into Snowflake (a load sketch follows this section).
- Worked on Power Exchange bulk data movement using the Power Exchange Change Data Capture (CDC) method, Power Exchange Navigator, and Power Exchange Bulk Data Movement; Power Exchange CDC can retrieve updates at user-defined intervals or in near real time.
- Developed SSIS packages to extract, transform and load data from Oracle and SQL Server databases into Data Warehouse.
- Migrated on-premises Informatica ETL processes to the AWS cloud and Snowflake, and was responsible for creating on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark.
- Created and modified complex mappings in the Informatica Data Transformation Studio environment, editing and completing configurations and using the Parser, Serializer, and Mapper transformations to convert text documents to XML and vice versa. Worked on importing and exporting data from Oracle into HDFS using Sqoop for analysis, visualization, and report generation.
- Involved in the complete big data flow of the application, from data ingestion from upstream sources into HDFS to processing and analyzing the data in HDFS; designed external and managed tables in Hive and moved data to HDFS using Sqoop.
- Extracted data from source systems such as Oracle, SQL Server, and DB2 into the landing zone, then loaded it into the AWS S3 raw bucket using the Java copy command. To increase performance, balanced input file sizes against the slice count, loaded the files into the AWS S3 refine bucket, and used the COPY command to micro-batch load the data into Amazon Redshift.
- Created data interchange endpoints using JMS message queues, directories, and managed file transfers, and created PowerCenter mappings and workflows to process documents, incorporating B2B DX and B2B DT transformations when required.
- Created packages in SSIS with error handling and worked with different methods of logging in SSIS.
Environment: Informatica Power Center 9.6, IDQ, Informatica Cloud, Workday, Salesforce, Teradata, Oracle 12c, Toad, UNIX, SQL Server, PostgreSQL, AWS (S3, EMR, Redshift, RDS, Athena, Lambda and Glue), Hadoop (HDFS, Hive, Airflow, Sqoop, Spark, Kafka, MongoDB, Cassandra), Tableau, SQL, PySpark, Python and Shell Scripting.
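A sketch of the S3-to-Snowflake load pattern referenced above, using the Snowflake Python connector's COPY INTO path; the account, stage, and table names are placeholders.

```python
# Sketch: bulk-load staged CSV files from an external S3 stage into Snowflake.
# Account, credentials, warehouse, stage, and table names are placeholders.
import snowflake.connector

con = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="ETL_USER",
    password="****",
    warehouse="LOAD_WH",
    database="EDW",
    schema="STG",
)

try:
    cur = con.cursor()
    cur.execute("""
        COPY INTO STG.CUSTOMER
        FROM @EDW.STG.S3_LANDING/customer/
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
        ON_ERROR = 'ABORT_STATEMENT'
    """)
    # COPY returns one row per file with its load status; fail fast on errors.
    for fname, status, *_ in cur.fetchall():
        assert status == "LOADED", f"{fname} failed to load ({status})"
finally:
    con.close()
```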
Confidential - Huntington Beach, CA
Sr. ETL Developer
Responsibilities:
- Used connected and unconnected stored procedures, SQL overrides in Lookups, source filters in Source Qualifiers, and Routers to manage data flow into multiple targets.
- Formulated and implemented the historical load strategy from multiple data sources into the data warehouse.
- Treated data concurrency, prioritization, comprehensiveness, completeness, and minimal impact to existing users as key attributes of the historical data load strategy.
- Developed SQL Server Integration Services (SSIS) packages to transform data from SQL Server 2008 to MS SQL Server 2014, and created interface stored procedures used in SSIS to load and transform data into the database.
- Analyzed incremental data and fixed existing data issues in Azure.
- Used Redshift's support for reading tables while they are being incrementally loaded or modified; sourced data from RDS and an AWS S3 bucket, populated the Teradata target, and mounted the S3 bucket in the local UNIX environment for data analysis.
- Designed the dimensional model and data load process using SCD Type II for quarterly membership reporting (a PySpark sketch of this pattern follows this section).
- Used Teradata Data Mover to copy data and objects such as tables and statistics from one system to another, created customized MLoad scripts on the UNIX platform for Teradata loads, and wrote Teradata macros using various Teradata analytic functions.
- Worked with Redshift, which omits PostgreSQL features suited to smaller-scale OLTP processing, such as secondary indexes and efficient single-row data manipulation operations, in order to improve large-scale query performance.
- Developed Pig UDFs to pre-process data for analysis and migrated ETL operations into the Hadoop system using Pig Latin scripts and Python scripts.
- Extracted data from heterogeneous databases such as Oracle, Access, and DB2, as well as flat files, into SQL Server 2014 using SSIS, and worked with SSIS packages involving FTP tasks and Fuzzy Grouping, Merge, Merge Join, Pivot, and Unpivot transformations.
- Created, tested, and debugged stored procedures, functions, packages, cursors, and triggers using PL/SQL Developer.
- Implemented ETL and data movement solutions using Azure Data Factory (ADF), Informatica PowerCenter, and Talend Enterprise Edition.
- Created databases, tables, clustered/non-clustered indexes, unique/check constraints, views, stored procedures, and triggers; optimized stored procedures and long-running SQL queries using indexing strategies and query optimization techniques; and migrated data from Excel sources to CRM using SSIS.
- Designed cubes with star schemas using SQL Server Analysis Services 2012 (SSAS), and created several dashboards and scorecards with key performance indicators (KPIs) in SSAS 2012.
- Developed complex stored procedures using T-SQL to generate ad hoc reports with SSRS, and developed various SSRS reports, including writing complex stored procedures for their datasets.
Environment: SSIS (SQL Server Integration Services), Tableau, AWS S3, AWS Redshift, PySpark, Salesforce, Teradata, Oracle 12c, PL/SQL, PostgreSQL, Toad, UNIX, SQL Server, Hadoop, Hive, HDFS, Sqoop, MongoDB, Airflow and Python
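A minimal PySpark sketch of the SCD Type II load pattern referenced above, assuming a dimension keyed on member_id with is_current/eff_dt/end_dt housekeeping columns; all names are illustrative.

```python
# Minimal SCD Type II sketch: expire the current dimension row when a tracked
# attribute changes, then append a new current version. Table and column names
# are illustrative; the staged feed is assumed to carry the same business
# columns as the dimension.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("member_dim_scd2").getOrCreate()

dim = spark.table("dw.member_dim").alias("d")        # has is_current/eff_dt/end_dt
stg = spark.table("stg.member_updates").alias("s")   # this quarter's feed

current = dim.filter(F.col("is_current") == 1)
history = dim.filter(F.col("is_current") == 0)

# Members whose tracked attribute differs between the feed and the current row.
changed = (stg.join(current, F.col("s.member_id") == F.col("d.member_id"))
              .where(F.col("s.plan_code") != F.col("d.plan_code"))
              .select("s.*"))
changed_ids = changed.select("member_id")

# Close out the superseded versions; keep the untouched current rows.
expired = (current.join(changed_ids, "member_id", "left_semi")
                  .withColumn("is_current", F.lit(0))
                  .withColumn("end_dt", F.current_date()))
unchanged = current.join(changed_ids, "member_id", "left_anti")

# New current versions, effective today and open-ended.
new_rows = (changed.withColumn("is_current", F.lit(1))
                   .withColumn("eff_dt", F.current_date())
                   .withColumn("end_dt", F.lit(None).cast("date")))

# Rebuild the dimension into a staging table, then swap it in downstream.
(history.unionByName(unchanged).unionByName(expired).unionByName(new_rows)
        .write.mode("overwrite").saveAsTable("dw.member_dim_new"))
```

Writing to a staging table and swapping avoids reading and overwriting the same table within one Spark job.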
Confidential - Philadelphia PA
Sr. Informatica ETL/Teradata Developer
Responsibilities:
- Worked with the data warehousing architecture team in testing complex delta processing loads.
- Worked on a POC using the Informatica Power Exchange for Cloud connector to access Workday and read/load data into the Workday application.
- Developed ETL routines using Informatica PowerCenter and created mappings involving transformations such as Lookup Override, Source Qualifier Override, Incremental Aggregation, Mapplet, and XML transformations.
- Responsible for testing data loads into the dimensional data model using the Ralph Kimball methodology, and implemented a star schema to de-normalize data for faster retrieval by online systems.
- Worked on Teradata stored procedures and functions to conform the data and load it into the target tables (see the sketch following this section).
- Wrote modification requests for bugs in the application and helped developers track and resolve problems in the warehouse.
- Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from heterogeneous source systems, and studied and reviewed the application of the Kimball data warehouse methodology, as well as the SDLC, across various industries to work successfully with data-handling scenarios.
- Involved in relational and dimensional data modeling, creating logical and physical database designs and ER diagrams with all related entities and relationships based on rules provided by the business manager, using Erwin.
Environment: Informatica Power Center 9.x, Teradata, Oracle 10g, Toad, UNIX, SQL Server, DB2, Tableau.
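A small sketch of driving a Teradata conform-and-load procedure and reconciling the result from Python, assuming the teradatasql driver; the host, credentials, procedure, and table names are placeholders.

```python
# Sketch: call a Teradata stored procedure that conforms staged data into the
# target, then reconcile row counts. All names/values below are placeholders.
import teradatasql

with teradatasql.connect(host="tdprod", user="etl_user", password="****") as con:
    with con.cursor() as cur:
        # Invoke the conform-and-load procedure for one load date.
        cur.execute("CALL edw.load_member_dim('2024-01-01')")

        # Reconcile: every staged row for the load date must have landed.
        cur.execute("""
            SELECT (SELECT COUNT(*) FROM stg.member_updates) -
                   (SELECT COUNT(*) FROM edw.member_dim
                     WHERE load_dt = DATE '2024-01-01')
        """)
        diff = cur.fetchone()[0]
        assert diff == 0, f"Load mismatch: {diff} staged rows missing from target"
```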
