Sr. Data Engineer Resume
Indianapolis, IN
SUMMARY
- 7+ years of progressive, hands-on experience in analysis, design, and development of enterprise-level data warehouse architectures and large databases, including designing, coding, testing, and integrating ETL processes.
- Two plus years of experience in Informatica Intelligent Cloud Services (IICS).
- Highly proficient in the development, implementation, administration, and support of ETL processes for large-scale data warehouses using Informatica PowerCenter 10.x/9.x.
- Proficient in integrating various data sources such as Oracle 11g/10g/9i, MS SQL Server, DB2, Teradata, and flat files into the staging area, ODS, data warehouse, and data marts.
- Experience working with Teradata utilities such as BTEQ, FastLoad, MultiLoad, XML Import, FastExport, Teradata SQL Assistant, Teradata Administrator, and PMON.
- Superior SQL skills with the ability to write and interpret complex SQL statements; skilled in SQL optimization, ETL debugging, and performance tuning.
- Experience implementing Azure data solutions: provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB.
- Implemented data movement from on-premises systems to the cloud in Azure.
- Developed batch processing solutions using Azure Data Factory and Azure Databricks.
- Implemented Azure Databricks clusters, notebooks, jobs, and autoscaling.
- Experience in branching, tagging, and maintaining versions across environments using Software Configuration Management (SCM) tools such as Subversion (SVN) and Git.
- Worked with data formats such as JSON and XML and implemented machine learning algorithms in Python.
- Managed Slowly Changing Dimensions (Type 1, 2, 3, and Hybrid Type 3), as well as de-normalization, cleansing, conversion, aggregation, and performance optimization.
- Experience migrating Informatica servers to a new data center.
- Experience managing CM tools (JIRA, SVN, Maven, Jenkins, Git, GitHub, and Visual Studio) and their usage/processes, ensuring traceability, repeatability, quality, and support.
- Extensive experience developing Informatica mappings/mapplets using various transformations for extraction, transformation, and loading of data from multiple sources to the data warehouse, creating workflows with worklets and tasks, and scheduling workflows.
- Experience working with Designer, Workflow Manager, Workflow Monitor, Source Analyzer, Warehouse Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Workflow Designer, Task Developer, Worklet Designer, Gantt Chart, Task View, mapplets, mappings, workflows, sessions, reusable transformations, shortcuts, and import and export utilities.
- Experience in data warehouse development working with extraction/transformation/loading using Informatica PowerMart/PowerCenter with flat files, Oracle, SQL Server, and Teradata.
- Experience working with the data quality tool Informatica IDQ 9.1.
- Experience working in multi-terabyte data warehouses using databases such as Oracle 11g/10g/9i, MS Access 2000/2002, XML, IBM UDB DB2 8.2, SQL Server 2008, MS Excel, and flat files.
- Experience in relational modeling and dimensional data modeling using Star and Snowflake schemas, de-normalization, normalization, and aggregations.
- Very strong in SQL and PL/SQL, with extensive hands-on experience creating database tables, triggers, sequences, functions, procedures, and packages, and in SQL performance tuning.
TECHNICAL SKILLS
Operating System: UNIX, Windows, MS-DOS
Language/Tools: SQL, PL/SQL, C, C++
Scheduling Tools: Autosys, Control-M, Informatica Scheduler
ETL Tools: Informatica Power Center 10.x/9.x/8.x, ETL Informatica Cloud, SSIS, Informatica Intelligent Cloud Service (IICS)
Database: MS SQL Server, Oracle 8i/9i/10g, RDBMS DB2, Netezza, Teradata, PostgreSQL, Redshift
Scripting: Shell Scripting, Python
Data Modeling Tools: Microsoft Visio, ERWIN 9.3/7.5
Data Modeling: ER (OLTP) and Dimensional (Star, Snowflake Schema)
Data Profiling Tools: Informatica IDQ 10.0, 9.5.1, 8.6.1
Tools & Utilities: TOAD, SQL Developer, SQL*Loader, PuTTY
Cloud Computing: Amazon Web Services (AWS), RDS, Redshift, SNS.
Other Tools: Notepad++, Toad, SQL Navigator, Teradata SQL Assistant, Snaplogic, AWS, Appworx
Defect Tracking Tools: ALM, Quality Center
Reporting Tools: IBM Cognos, Tableau 9
PROFESSIONAL EXPERIENCE
Confidential, Indianapolis, IN
Sr. Data Engineer
Responsibilities:
- Worked with IT architects and program managers on requirements gathering, analysis, and project coordination
- Developed Data Integration Platform components/processes using the Informatica Cloud platform, Azure SQL Data Warehouse, Azure Data Lake Store, and Azure Blob Storage technologies
- Analyzed existing ETL data warehouse processes and ERP/non-ERP application interfaces and created design specifications based on the new target cloud data warehouse (Azure Synapse) and Data Lake Store
- Created ETL and data warehouse standards documents - naming standards, ETL methodologies and strategies, standard input file formats, and data cleansing and preprocessing strategies
- Created mapping documents with detailed source to target transformation logic, Source data column information and target data column information
- Responsible for estimating the cluster size, monitoring, and troubleshooting of the Spark Databricks cluster
- Designed, developed, and implemented ETL processes using IICS Data Integration
- Created IICS connections using various cloud connectors in IICS administrator
- Installed and configured the Windows Secure Agent and registered it with the IICS org
- Involved in database design using dimensional data modeling strategies and advanced SQL and PostgreSQL skills to extract, transform, and load extremely large data volumes (petabytes) into Greenplum
- Extensively used performance tuning techniques while loading data into Azure Synapse using IICS
- Extensively used cloud transformations - Aggregator, Expression, Filter, Joiner, Lookup (connected and unconnected), Rank, Router, Sequence Generator, Sorter, Update Strategy, Union Transformations
- Extensively used cloud connectors Azure Synapse (SqlDW), Azure Data Lake Store V3, Azure Blob Storage, Oracle, Oracle CDC, and SQL Server
- Developed Cloud integration parameterized mapping templates (DB, and table object parametrization) for Stage, Dimension (SCD Type1, SCD Type2, CDC and Incremental Load) and Fact load processes
- Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the PySpark sketch after this list)
- Extensively used Parameters (Input and IN/OUT parameters), Expression Macros, and Source Partitioning
- Extensively used the Pushdown Optimization option to optimize processing and leverage the compute power of Azure Synapse (SqlDW)
- Extracted data from Snowflake to push the data into Azure warehouse instance to support reporting requirements
- Performed loads into Snowflake instance using Snowflake connector in IICS for a separate project to support data analytics and insight use case for Sales team
- Created Python scripts to create on-demand cloud mapping tasks using the Informatica REST API
- Created Python scripts to start and stop cloud tasks using Informatica Cloud API calls (see the REST API sketch after this list)
- Developed a CDC load process for moving data from PeopleSoft to SQL Data Warehouse using "Informatica Cloud CDC for Oracle Platform"
- Developed complex Informatica Cloud Taskflows (parallel) with multiple mapping tasks and taskflows
- Developed MASS Ingestion tasks to ingest large datasets from on-prem to Azure Data lake Store - File ingestion
- Designed Data Integration Audit framework in Azure SqlDw to track data loads, data platform workload management and produce automated reports for SOX compliance
- Worked with developers in establishing and applying appropriate branching, labelling, and naming conventions using Git source control
- Managed source code with GitHub and GitLab across various branches and set up Jenkins
- Worked with a team of 4 onshore and 6 offshore developers and prioritized project tasks
- Involved in Development, Unit Testing, SIT and UAT phases of project
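A minimal PySpark sketch of the kind of Databricks transformation described above; the file paths, column names, and aggregation are hypothetical placeholders, not the actual project logic:
```python
# Minimal PySpark sketch (hypothetical paths/columns) of reading multiple
# file formats, joining them, and aggregating usage patterns in Databricks.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage_patterns").getOrCreate()

# Read source extracts in different formats (paths are placeholders).
events = spark.read.json("/mnt/raw/usage_events/")            # JSON event feed
accounts = spark.read.parquet("/mnt/raw/accounts/")           # Parquet dimension
plans = spark.read.option("header", True).csv("/mnt/raw/plans.csv")

# Join and aggregate to summarize customer usage by month and plan.
usage_summary = (
    events.join(accounts, "account_id")
          .join(plans, "plan_id")
          .withColumn("usage_month", F.date_trunc("month", F.col("event_ts")))
          .groupBy("usage_month", "plan_name")
          .agg(F.countDistinct("account_id").alias("active_accounts"),
               F.sum("usage_minutes").alias("total_usage_minutes"))
)

# Write the curated result back to the lake for downstream reporting.
usage_summary.write.mode("overwrite").parquet("/mnt/curated/usage_summary/")
```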
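A minimal Python sketch of the REST-driven task control described above. Credentials, region URL, and task id are placeholders, and the v2 login/job endpoints shown should be verified against the current Informatica Cloud REST API documentation:
```python
# Minimal sketch of starting an IICS task via the Informatica Cloud REST API (v2).
# Credentials, region URL, and task id are placeholders; verify endpoint paths
# against the current Informatica Cloud API documentation.
import requests

LOGIN_URL = "https://dm-us.informaticacloud.com/ma/api/v2/user/login"

def start_task(username, password, task_id, task_type="MTT"):
    # Authenticate and obtain the session id and server URL used by later calls.
    login_resp = requests.post(
        LOGIN_URL,
        json={"@type": "login", "username": username, "password": password},
    )
    login_resp.raise_for_status()
    session = login_resp.json()
    headers = {"icSessionId": session["icSessionId"],
               "Accept": "application/json"}

    # Submit the job request for the mapping task (taskType "MTT").
    job_resp = requests.post(
        f"{session['serverUrl']}/api/v2/job",
        headers=headers,
        json={"@type": "job", "taskId": task_id, "taskType": task_type},
    )
    job_resp.raise_for_status()
    return job_resp.json()

# Stopping a running task follows the same pattern against the job stop resource.
```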
Environment: Informatica Intelligent Cloud Services, Informatica PowerCenter 10.2, Informatica PowerExchange 10.2, Windows Secure Agent, Teradata v13.10, Azure Synapse (Azure SqlDW), Azure Data Lake Store, SQL Database, Tableau Server & Desktop
Confidential, Chicago, IL
Sr. Data Engineer
Responsibilities:
- Involved in gathering business requirements and attended technical review meetings to understand the data warehouse model.
- Developed Technical Specifications of the ETL process flow.
- Developed ETL jobs using Informatica Intelligent Cloud Services (IICS), Informatica PowerCenter (10.2), and Informatica PowerExchange.
- Rewrote Informatica PowerCenter jobs as IICS jobs.
- Created reusable mappings in IICS.
- Used Application Integration, Data Integration, Data Synchronization, Mass Ingestion, Administrator, and Monitor services in IICS.
- Constructed IICS mappings, mapping tasks, and processes to extract data from sources such as Salesforce and Oracle Server and load it into the Teradata and Amazon Redshift data warehouses.
- Created various types of connections using the IICS Administrator and Application Integration service connectors.
- Analyzed existing PowerCenter workflows and rewrote the ETL code into IICS assets such as mappings, mapping tasks, linear taskflows, and advanced taskflows.
- Implemented Data Synchronization, Data Replication, Mass Ingestion, and mapping tasks in IICS.
- Built PowerExchange data maps to process real-time (real-time CDC/batch CDC) data from DB2 source tables into staging tables.
- Implemented an audit process to ensure data accuracy between source and EDW systems.
- Created reusable mapping templates and reused the same logic in IICS mapping tasks.
- Implemented Informatica Intelligent Cloud Services read/write operations on various data sources such as Oracle, SQL Server, Salesforce, Teradata, Amazon S3, MongoDB, and Snowflake.
- Created ETL jobs using PowerExchange data maps to capture real-time CDC data from mainframe DB2 tables into staging tables.
- Configured Jenkins with Git for appropriate release builds; scheduled automated nightly builds using Subversion.
- Implemented pushdown optimization techniques.
- Implemented TPT connections to improve Teradata load performance when processing huge data volumes.
- Responsible for troubleshooting ETL job failures.
- Managed ETL control tables to handle incremental/CDC data in batch processes (see the control-table sketch after this list).
- Worked on code changes for existing bugs, including root cause analysis, data fixes, and enhancement requirements.
- Developed IICS jobs to extract data from various sources into the data lake layer, such as AWS S3 buckets.
- Developed ETL jobs to load data into the Snowflake cloud data warehouse.
- Worked with SOAP and REST API calls and processed semi-structured data.
- Involved in data modeling, including creating database tables and views.
- Created UNIX and PowerShell scripts to automate repetitive tasks such as FTP, archival, and file validation steps.
- Performed unit, system, parallel testing, and assisted users during UAT phases.
- Troubleshot long-running processes by analyzing the logs and implemented appropriate tuning techniques.
- Worked with Cognos team to understand the reports and fix the data issues
- Extensively used workflow variables, mapping parameters and mapping variables.
- Performed Error handling, audit validations and recovery techniques.
- Tuned the Informatica mappings, sessions for optimal load performance.
- Implemented Session partitions to achieve parallel processing mechanism to improve ETL performance.
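A minimal Python sketch of the control-table pattern referenced above, assuming hypothetical etl_control, source, and target table/column names and an ODBC connection; real loads would use the database's bulk-load interface rather than row-by-row inserts:
```python
# Minimal sketch of a control-table-driven incremental load (hypothetical
# table/column names). The control table stores the last load watermark per
# source; each batch extracts only rows changed since that watermark.
import pyodbc  # any DB-API connection works the same way

def incremental_load(conn, source_table, target_table):
    cur = conn.cursor()

    # 1. Read the last successful watermark for this source.
    cur.execute("SELECT last_load_ts FROM etl_control WHERE source_name = ?",
                source_table)
    last_load_ts = cur.fetchone()[0]

    # 2. Pull only rows changed since the watermark (CDC-style filter).
    cur.execute(f"SELECT * FROM {source_table} WHERE updated_ts > ?",
                last_load_ts)
    rows = cur.fetchall()

    # 3. Load the delta into the target (simplified; use bulk APIs in practice).
    for row in rows:
        placeholders = ",".join("?" * len(row))
        cur.execute(f"INSERT INTO {target_table} VALUES ({placeholders})",
                    list(row))

    # 4. Advance the watermark only after the batch loads successfully.
    cur.execute("UPDATE etl_control SET last_load_ts = CURRENT_TIMESTAMP "
                "WHERE source_name = ?", source_table)
    conn.commit()

# Example (hypothetical DSN and tables):
# incremental_load(pyodbc.connect("DSN=EDW"), "src_orders", "stg_orders")
```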
Environment: Informatica 10.x, Informatica IICS, Oracle 11g, SQL Server 2019, Azure SQL Data Warehouse, Teradata v15, flat files, Excel files, Salesforce, Cognos Reporting, batch and Python scripting
Confidential, Farmington Hills, MI
Sr. ETL Developer
Responsibilities:
- Designed, developed, and supported various ETL processes in the data warehouse environment, including operational data stores, canonical and enterprise dimensional models, and various other business intelligence databases
- Worked on the production support team for a few months and cleared the existing defects logged by the data governance team
- Created mappings, mapping configuration tasks, and taskflows with Informatica Intelligent Cloud Services (IICS) and Informatica PowerCenter (10.1.1 & 9.6.1).
- Assist Data Governance team in their tasks to create Source to Target Mapping and Defect Fix Specification documentation as part of Design, Development and Production Support activities
- Automated multiple redundant tasks using batch files and shell scripts to reduce the time taken for the entire process
- Migrated Informatica servers to a new data center by making the necessary configuration changes on the server side.
- Created user variables, property expressions, script task in SSIS.
- Implemented various SSIS packages with different tasks and transformations and scheduled the SSIS packages.
- Used Python to implement different machine learning algorithms, including Generalized Linear Model, Random Forest, and Gradient Boosting (see the scikit-learn sketch after this list).
- Expertise in using the Teradata utilities BTEQ, MultiLoad, FastLoad, TPT, and FastExport in combination with Informatica for better loads into the Teradata warehouse.
- Built several BTEQ scripts to load data from stage to base tables, applying several performance techniques in Teradata SQL.
- Involved in the Teradata upgrade process from TD 12 to TD 14.
- Demonstrated expertise in utilizing the ETL tools Informatica Cloud (IICS) and Informatica PowerCenter 10.2/9.x for data warehouse loads per client requirements.
- Worked with the Informatica product team on issues we faced in the IICS product.
- Used Python to develop a variety of models and algorithms for analytic purposes.
- Gathered requirements for source to Data Acquisition layer loads
- Implemented end-to-end process of reading files from various vendors, validating the files, loading the tables and archiving the files.
- Created complex mappings to load the data from Oracle to Redshift tables.
- Planned, estimated, and managed upgrade projects, e.g., Informatica PowerCenter v8.x to v10.x, Informatica Data Services 9.x to 10.x, ICS to IICS, and an RStudio version upgrade
- Planned and implemented the IICS sandbox environment; worked with users and Informatica Cloud to troubleshoot and fix ongoing issues; worked with Informatica to implement the IICS upgrade and fix post-upgrade anomalies
- Performed query optimization on long-running Oracle/SQL Server queries using partitions, indexes, and parallel hints to reduce the run time of historical data fixes
- Designed the code to check different scenarios in handling flat files and send notifications to the production support teams as needed
- Involved in writing BTEQ, FastLoad, and MultiLoad scripts to load data into the target data warehouse.
- Converted business requirements to functional specifications for the team and created mappings to load tables which provide data for reporting purposes.
- Extensively worked in the performance tuning of Teradata SQL, ETL and other processes to optimize session performance.
- Loaded data into the Teradata tables using the Teradata utilities BTEQ, FastLoad, MultiLoad, FastExport, and TPT.
- Handled the code migrations and production support handover meetings
- Created the Solution Design Documents (SDS) for various projects to include all the design details, load dependencies, error handling
- Developed complex source MINUS target queries to validate the data loads.
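A minimal Python sketch of the model comparison referenced above, using scikit-learn with a placeholder training file and column names; logistic regression stands in for the generalized linear model:
```python
# Minimal scikit-learn sketch of the model types mentioned above (logistic
# regression as the GLM, random forest, gradient boosting). The training file
# and the "target" column are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("training_data.csv")        # hypothetical extract
X = df.drop(columns=["target"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "glm": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model and compare hold-out accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```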
Environment: Informatica PowerCenter 10.2/9.6, Python, Teradata 14/12, Oracle 10g/11g, Informatica Intelligent Cloud Services (IICS), Redshift, PL/SQL, PuTTY SSH Client, PowerExchange, Appworx, UNIX, Windows XP
Confidential, Minneapolis, MN
Sr. Informatica/ETL Developer
Responsibilities:
- Coordinated with business analysts to analyze the business requirements and designed and reviewed the implementation plan.
- Responsible for designing and development, testing of processes necessary to extract data from operational databases, Transform and Load it into data warehouse using Informatica Power center.
- Followed ETL standards -Audit activity, Job control tables and session validations.
- Created Complex Mappings to load data using transformations like Source Qualifier, Expression, Aggregator, Dynamic Lookup, Connected and unconnected lookups, Joiner, Sorter, Filter, Stored Procedures, Sequence, Router and Update Strategy.
- Created different jobs using UNIX shell scripting to call workflows via Command tasks (see the sketch after this list).
- Wrote Oracle SQL queries to join tables or make modifications to them.
- Designed and developed complex Informatica mappings, including SCD Type 2 (Slowly Changing Dimension Type 2).
- Developed BTEQ scripts to load data from Teradata Staging area to Teradata data mart.
- Worked on complex mapping for the performance tuning to reduce the total ETL process time.
- Extensively used TOAD to test, debug SQL and PL/SQL Scripts, packages, stored procedures and functions.
- Extracted and transformed data from various sources like Teradata and relational databases (Oracle, SQL Server).
- Created SnapLogic jobs to pull data from Salesforce.
- Analyzed source data coming from multiple source systems; designed and developed the data warehouse model in a flexible way to cater to future business needs.
- Analyzed existing systems, conceptualized and designed new ones, and deployed innovative solutions with high standards of quality.
- Developed ETL code to extract data from multiple sources and load it into the data warehouse and AWS Redshift using Informatica.
- Involved in enhancements and maintenance activities of the data warehouse including tuning, code enhancements.
- Worked on Informatica Cloud Services to load the data into Salesforce Objects.
- Involved in data center migration and the upgrade of Informatica DIH and Control-M
- Created unit test case documents and captured unit test results for each source system.
- Worked in an Agile environment with daily stand-ups, sprints, and tracking stories using the JIRA application.
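The workflow-launch automation referenced above was done with UNIX shell scripts; the sketch below shows the same idea as a Python subprocess wrapper around the PowerCenter pmcmd command, with placeholder domain, service, folder, workflow, and credential values:
```python
# Illustrative wrapper around pmcmd startworkflow (placeholder names and
# credentials; the original automation used UNIX shell scripts).
import subprocess
import sys

def run_workflow(workflow, folder="DW_FOLDER"):
    cmd = [
        "pmcmd", "startworkflow",
        "-sv", "INT_SVC",        # integration service (placeholder)
        "-d", "Domain_DW",       # domain name (placeholder)
        "-u", "etl_user",        # credentials normally come from a secure store
        "-p", "etl_password",
        "-f", folder,            # repository folder
        "-wait",                 # block until the workflow finishes
        workflow,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_workflow("wf_load_sales_fact"))
```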
Environment: Informatica PowerCenter 10, Python, Control M, Oracle11g, Toad, Redshift, Razor SQL, WinSCP, Composite, Netsuite, Snaplogic, UNIX and TWS.
Confidential
Informatica Developer
Responsibilities:
- Developed internal and external Interfaces to send the data in regular intervals to Data warehouse systems.
- Extensively used Power Center to design multiple mappings with embedded business logic.
- Involved in discussion of user and business requirements with business team.
- Performed data migration across different sites on a regular basis.
- Involved in upgrade of Informatica from 9.1 to 9.5.
- Portfolio Management Enhancement: analyzed and developed business requirements; designed and created the database and processes to load data from Broadridge using Informatica, SQL Server, etc.; fixed defects; and wrote complex stored procedures to generate data for PM web reports.
- Created complex mappings using Unconnected Lookup, Sorter, Aggregator, and Router transformations to populate target tables efficiently.
- Attended the meetings with business integrators to discuss in-depth analysis of design level issues.
- Provided work-bucket hour estimation and budgeting for each story (Agile process) and communicated status to the PM.
- Involved in data design and modeling by specifying the physical infrastructure, system study, design, and development.
- Extensively involved in performance tuning of the Informatica ETL mappings by using the caches and overriding the SQL queries and by using Parameter files.
- Analyzed session log files on session failures to resolve errors in mapping or session configuration.
- Wrote various UNIX shell scripts to schedule data cleansing scripts, run loading processes, and automate the execution of maps.
- Worked under Agile methodology and used the Rally tool to track tasks.
- Performed bulk data imports and created stored procedures, functions, views and queries.
Environment: Informatica MDM, Autosys, Oracle11g, SAP, Toad, WinSQL, ERWIN, UNIX.