- IT professional with over 6 years of experience, primarily in ETL, with expertise in design, development, and deployment across a range of technologies: Hadoop (Hortonworks), Hive, Sqoop, Pig, Spark, ETL (Informatica/DataStage), Data Quality (IBM Information Analyzer, Informatica Data Quality, Ataccama), data pipelines (Apache NiFi), data lineage and metadata (IBM InfoSphere Information Governance Catalog), Data Governance (Collibra), and data mining.
- Proficient in developing SQL against various relational databases, including Oracle (PL/SQL), SQL Server, Greenplum, Teradata, and IBM DB2.
- Experience working with Agile (Scrum) and Waterfall methodologies throughout the project SDLC.
- Extensive knowledge of IBM Information Analyzer (Data Quality) architecture, processing, and deployment/configuration: creating projects, adding data sources, and writing, configuring, and executing rules and rule sets within Information Analyzer.
- Experience with the Informatica Data Quality 9.6.1 (IDQ) toolkit: analysis, data cleansing, data matching, data conversion, exception handling, and the reporting and monitoring capabilities of IDQ.
- Good knowledge of data modeling techniques such as dimensional/star schema, snowflake modeling, and slowly changing dimensions.
- Extensive experience with Informatica PowerCenter ETL transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Joiner, Update Strategy, Aggregator, Stored Procedure, Sorter, Normalizer, and Union.
- Experience across all stages of the Software Development Life Cycle (SDLC), including business requirement analysis, data mapping, build, unit testing, systems integration, and user acceptance testing.
- Knowledge of various AWS services (EC2, S3, ELB, Auto Scaling, RDS, CloudWatch, and CloudTrail), with additional familiarity with VPC, networking, Lambda, Elastic Beanstalk, and virtualization.
- Experience using Teradata utilities (SQL, BTEQ, FastLoad, MultiLoad, FastExport, TPump), Teradata parallel support, and Unix shell scripting.
- Integrated Collibra DGC, via Collibra Connect (Mule ESB), with third-party tools such as Ataccama, IBM IGC, and Tableau to apply DQ rules, import technical lineage, and build reports from the metadata in Collibra DGC.
- Set up a Data Quality framework in Python to remediate frequently occurring issues across various platforms.
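The framework itself is proprietary, but the core idea of such a Python DQ framework, binding named rules to record-level predicates and collecting failures, can be sketched in a few lines. All names here (DQRule, run_rules, the sample records) are hypothetical:

```python
# Minimal, illustrative sketch of a rule-driven data quality check.
# All identifiers and sample data are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DQRule:
    name: str
    check: Callable[[dict], bool]  # returns True when a record passes

def run_rules(records, rules):
    """Return {rule_name: [indexes of failing records]}."""
    failures = {rule.name: [] for rule in rules}
    for i, rec in enumerate(records):
        for rule in rules:
            if not rule.check(rec):
                failures[rule.name].append(i)
    return failures

rules = [
    DQRule("ssn_not_null", lambda r: r.get("ssn") is not None),
    DQRule("ssn_9_digits", lambda r: str(r.get("ssn", "")).isdigit()
           and len(str(r.get("ssn", ""))) == 9),
]
records = [{"ssn": "123456789"}, {"ssn": None}, {"ssn": "12AB"}]
result = run_rules(records, rules)  # record 1 fails both rules; record 2 fails the digit rule
```

Keeping each rule as a named predicate makes the failure report self-describing and lets new checks be registered without touching the execution loop.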
- Experience in managing teams, onshore/offshore coordination, requirement analysis, code reviews, and implementing standards.
Data Engineer/Data Management Tech
- Imported and exported data between traditional databases (Microsoft SQL Server, IBM DB2) and the Hadoop system. Created conceptual Data Quality rules in Ataccama by binding logical variables to the appropriate data sources.
- Developed data pipelines using NiFi, Spark, and Hive to ingest and transform data and build a data layer for analytical consumption.
- Performed exploratory data analysis on large data sets at rest in Hadoop to build a curated data layer supporting data science activities.
- Troubleshot framework-related issues to keep the data ETL process running reliably.
- Implemented version control (GitHub) of the source code across roles such as DQ Developer, DQ Admin, and Maintainer.
- Developed rule execution flows in DSX (Data Science Experience) using Python. Retrieved customer data from the Hadoop distributed environment and developed Python scripts to perform the necessary checks on it.
- Created parallel IBM DataStage ETL jobs to extract and reformat source data so it could be loaded into the new data warehouse schema.
- Participated in the development of a DQ sync framework using PySpark; the application compares parties between source systems and the Master Data Management (MDM) system to identify missing parties and mismatched attributes.
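The comparison logic behind such a sync can be illustrated in plain Python (the production version used PySpark DataFrames); the party IDs and attribute names below are hypothetical:

```python
# Pure-Python sketch of the party-comparison logic: find parties
# missing from MDM, and attribute mismatches for shared parties.
# All IDs and field names are illustrative.
source = {
    "P1": {"name": "Ann",  "city": "NYC"},
    "P2": {"name": "Bob",  "city": "LA"},
    "P3": {"name": "Cara", "city": "SF"},
}
mdm = {
    "P1": {"name": "Ann", "city": "NYC"},
    "P2": {"name": "Bob", "city": "SD"},   # mismatched attribute
}

# Parties present in the source systems but absent from MDM
missing_in_mdm = sorted(set(source) - set(mdm))

# For parties in both, list the attributes whose values disagree
mismatches = {
    pid: {f for f in source[pid] if source[pid][f] != mdm[pid][f]}
    for pid in set(source) & set(mdm)
    if source[pid] != mdm[pid]
}
```

In PySpark the same shape of logic maps naturally onto anti-joins (missing parties) and inner joins with column-wise comparisons (mismatched attributes).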
- Integrated Ataccama with Collibra using the Mule ESB connector and published DQ rule results to Collibra via REST API calls.
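A sketch of what such a REST publish looks like, assuming a simple JSON payload. The endpoint path, field names, and auth scheme below are hypothetical; the real Collibra REST API and asset model vary by deployment:

```python
# Hypothetical sketch of publishing DQ rule results via REST.
# Endpoint, payload fields, and auth are illustrative assumptions.
import json
import urllib.request

def build_payload(rule_name, passed, failed):
    """Summarize a rule run as a JSON-serializable dict."""
    total = passed + failed
    return {
        "ruleName": rule_name,
        "rowsPassed": passed,
        "rowsFailed": failed,
        "passRate": round(passed / total, 4) if total else None,
    }

def publish(base_url, token, payload):
    """POST the result payload (network call; not exercised here)."""
    req = urllib.request.Request(
        f"{base_url}/rest/2.0/assets",          # hypothetical endpoint
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

payload = build_payload("ssn_9_digits", passed=980, failed=20)
```

Separating payload construction from the HTTP call keeps the summary logic unit-testable without a live Collibra instance.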
- Created import areas for new data sources in IGC IMAM. Ingested various DataStage jobs into IGC to view end-to-end lineage. Created a Python framework to ingest mainframe files and expose their business and technical lineage in IGC.
- Extensively used IBM istool commands to import, create, and delete metadata in IGC.
Enterprise Data Quality Engineer
- Developed data profiling solutions, ran analysis jobs and reviewed results, and created and managed data quality controls using IBM Information Analyzer.
- Performed column analysis, rule analysis, primary key analysis, natural key analysis, foreign key analysis, and cross-domain analysis in the DQ tool, IBM Information Analyzer (IBM IA).
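Column analysis of this kind was done inside IBM Information Analyzer; the idea itself (null counts, cardinality, and candidate-key detection per column) can be shown in a small, purely illustrative Python sketch:

```python
# Illustrative pure-Python version of column analysis: null counts,
# distinct-value cardinality, and a simple "candidate key" flag.
# The function name and sample rows are hypothetical.
def profile_column(rows, column):
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(distinct),
        # candidate key: no nulls and every value unique
        "candidate_key": len(non_null) == len(values) == len(distinct),
    }

rows = [
    {"id": 1, "state": "NY"},
    {"id": 2, "state": "NY"},
    {"id": 3, "state": None},
]
id_profile = profile_column(rows, "id")        # unique, no nulls -> key candidate
state_profile = profile_column(rows, "state")  # nulls + duplicates -> not a key
```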
- Gathered and articulated data quality issues with clients in areas such as data privacy, sensitivity, and security.
- Defined quality rules for data elements and created a data glossary with data platform owners and consumers.
- Implemented Data Quality logic on data sources by binding physical data elements to the variables used in the rule logic.
- Worked with data platform owners to fix issues. Developed and implemented data collection systems and other strategies to optimize statistical efficiency and data quality.
- Maintained a continuous focus on raising data quality and performance levels, and promoted data quality awareness across multiple staff profiles (e.g., managers, front-line staff).
- Used SQL queries (via Advanced Query Tool) to analyze rule execution results from IBM IA. Used the command-line interface (CLI) to execute rules and extract exceptions from the Information Analyzer tool.
- Applied normalization principles to improve performance. Developed ETL code in PL/SQL to meet requirements for extracting, transforming, cleansing, and loading data from source to target data structures.
- Present DQ rule result Dashboards to the appropriate stakeholders using IBM Data Science Experience.
Confidential, White Plains, NY
- Extensively worked with Teradata utilities such as BTEQ, FastExport, FastLoad, and MultiLoad to export and load data to/from various source-system databases and flat files.
- Expertise in creating databases, users, tables, triggers, macros, views, stored procedures, functions, packages, joins, and hash indexes in the Teradata database.
- Proficient in performance analysis, monitoring, and SQL query tuning using EXPLAIN, COLLECT STATISTICS, hints, and SQL Trace in both Teradata and Oracle PL/SQL.
- Implemented Teradata macros and used various Teradata analytic functions. Monitored Teradata queries using Teradata Viewpoint.
- Created appropriate Primary Indexes (PIs), taking into consideration both the planned access paths to the data and even distribution of data across all available AMPs.
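The reasoning behind PI selection, that a high-cardinality index column hashes rows evenly across AMPs while a low-cardinality one skews them, can be demonstrated with a small simulation. Python's `hash` stands in for Teradata's row-hash algorithm, which differs in practice; the function and data names are hypothetical:

```python
# Illustrative skew check for a candidate Primary Index: hash each
# PI value into N "AMP" buckets and measure max/average row counts.
# Python's hash() is a stand-in for Teradata's actual row hash.
from collections import Counter

def skew(pi_values, amps=4):
    buckets = Counter(hash(v) % amps for v in pi_values)
    counts = [buckets.get(a, 0) for a in range(amps)]
    avg = sum(counts) / amps
    return max(counts) / avg if avg else 0.0  # 1.0 = perfectly even

unique_ids = list(range(1000))          # high-cardinality candidate
low_card = ["NY"] * 990 + ["CA"] * 10   # low-cardinality candidate
```

A unique surrogate key distributes evenly (skew near 1.0), while a two-value column piles nearly all rows onto one or two AMPs, which is exactly why access-path needs must be weighed against distribution when picking a PI.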
- Scheduled the BTEQ scripts using the Control-M tool, and developed shell scripts to run the collect-stats steps from Control-M jobs.
- Created a cleanup process to remove all intermediate temp files used prior to the loading process. Streamlined the migration process for Teradata scripts and shell scripts on the Unix box.
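The original cleanup was a shell script; the same idea is easy to sketch in Python (the function name and `*.tmp` pattern are illustrative assumptions):

```python
# Illustrative Python version of the intermediate temp-file cleanup:
# remove files matching a pattern and report what was deleted.
# Function name and pattern are hypothetical.
import tempfile
from pathlib import Path

def cleanup_temp(work_dir, pattern="*.tmp"):
    removed = []
    for f in Path(work_dir).glob(pattern):
        f.unlink()
        removed.append(f.name)
    return sorted(removed)

# Demo in a scratch directory
work = Path(tempfile.mkdtemp())
for name in ("a.tmp", "b.tmp", "keep.dat"):
    (work / name).write_text("x")
removed = cleanup_temp(work)                          # deletes only *.tmp
remaining = sorted(p.name for p in work.iterdir())    # data files survive
```

Returning the list of deleted files makes the cleanup auditable, which matters when it runs unattended before a scheduled load.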
- Experience providing complex queries to the Power BI reporting team to generate reports from the SSAS cube.