- Over 15 years of hands-on experience as a Data Engineer on IT projects spanning Business Intelligence, Business Analytics, Data Cleansing, Data Mining, Data Quality, Master Data Management and Machine Learning.
- Experienced in all phases of SDLC using RUP, Scrum and Agile Methodologies.
- Hands-on experience in Functional Requirements Gathering, Data Exploration, Analysis, Profiling and Cleansing.
- Expertise in SQL, R and Python for Machine Learning, Business Analytics and Data Cleansing Projects
- Knowledge of Statistics and experience using Statistical packages for analyzing datasets.
- Expertise in Predictive and Classification Modeling using Supervised and Unsupervised Machine Learning techniques.
Databases: SQL Server, Oracle 11g, Teradata, DB2, SAP HANA 1.0
Statistical Software Packages: SAS, MATLAB
R Programming: ggplot2, Shiny, dplyr, Survival
Python Programming: NumPy, Pandas, SciPy
SQL Programming: T-SQL, PL/SQL, Hive SQL
ETL Tools: SSIS, DTS, Informatica Power Center, SAP BODS, Talend
Apache: Hadoop, Hive, HBase, HDFS
Reporting: SSRS, Business Objects, Tableau
Confidential, Santa Clara, CA
- Collaborate with business teams to understand reporting requirements for Forex reports.
- Data exploration, Data Profiling and Data Quality assessment of Forex data using ad hoc SQL, Python and R plots.
- Develop and implement ETL process and Data cleansing rules to load Forex data from multiple sources into EDW.
- Data Analysis of Forex data using Machine Learning tools combined with R and Python libraries
- Developed fuzzy matching algorithms for de-duping Forex Account ids across multiple source systems
- Predictive Modeling of Forex Customer Churn using Decision Tree and implementation using R
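A minimal sketch of the fuzzy-matching approach described above, using only Python's standard-library difflib; the id formats and the 0.9 threshold are hypothetical stand-ins for the actual Forex source-system values and tuning:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized account-id strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def dedupe(ids, threshold=0.9):
    """Greedy de-dup: keep an id only if it is not a near-match of one
    already kept. The threshold is a hypothetical tuning parameter."""
    kept = []
    for candidate in ids:
        if all(similarity(candidate, k) < threshold for k in kept):
            kept.append(candidate)
    return kept

# 'fx-1001 ' collapses into 'FX-1001'; 'FX-2002' survives as distinct.
unique_ids = dedupe(["FX-1001", "fx-1001 ", "FX-2002"])
```

In practice a production match would compare several normalized fields (name, id, country) rather than a single string, but the keep-if-below-threshold loop is the core idea.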
Environment: Oracle 11g, SAP BODS, Tableau, RStudio, ggplot2, Pandas, NumPy, SciPy

Confidential, San Rafael, CA
Master Data Analyst
Responsibilities:
- Collaborate with business teams to understand issues with Vendor on-boarding process and P2P cycle
- Data exploration and Data Quality assessment of Vendor data in Oracle EBS using complex SQL and Analytic tools.
- Developed algorithms for Data Profiling, Data validation and Data cleansing of Vendor data
- Developed algorithm for Fuzzy Matching of Vendor records for de-duping data and creating a unique record set.
Environment: Oracle 11g, Oracle EBS R12, Informatica Data Quality, Collibra, Power BI, RStudio, ggplot2, SciPy, Pandas

Confidential, San Francisco, CA
Data Analytics Consultant
Responsibilities:
- Collaborate with LOB, ETL team and MDM team to define functional and technical requirements for Master Data
- Custom Asset Classification using Decision Tree Modeling with Python libraries
- Data Exploration, Design and development of Master Data Model for Securities
- Data Profiling, Data Analysis; identify and implement business rules to uniquely identify Securities.
- Design and configure Match Rules and Trust Rules to cleanse, standardize, match and merge Securities records
Environment: SQL Server, Erwin r9, SSIS, Informatica PowerCenter 9.6.1, IDQ, Informatica MDM, Python

Confidential, Oakland, CA
- Design and implement ETL interfaces for loading Member Eligibility data and Claims data.
- Source Data Exploration, Profiling, Data Analysis of Physician and Patient data from multiple sources.
- Medicare Fraud detection using supervised Machine Learning
- Implement Hierarchy and Affiliation relationships between Patients, their Household Members and Providers.
Environment: SQL Server 2012 R2, SSIS, Informatica PowerCenter 9.1, IDQ, Informatica MDM 9.5, Python and R libraries

Confidential, San Francisco, CA
- Involved in gathering business requirements and analysis for various data feeds from third party Fund Index data providers including State Street, S&P Dow Jones, Russell and Blackrock
- Designed and developed critical ETL processes.
- Developed a parameter driven ETL framework that contained dynamic configurations, custom logging and reporting.
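The framework itself was built in SSIS; as a language-neutral sketch of the pattern (feed names, sources and targets below are hypothetical, loosely modeled on the fund-index feeds mentioned above), a parameter-driven load loop with custom logging might look like:

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

# Dynamic configuration: one entry per feed, read from a config table
# in the real framework. Names here are illustrative only.
FEED_CONFIG = [
    {"feed": "state_street_nav", "source": "sftp", "target": "stg.index_nav"},
    {"feed": "sp_dowjones_levels", "source": "api", "target": "stg.index_levels"},
]

def run_feed(cfg, extract, load):
    """Run one configured feed; log row counts and status for reporting."""
    started = datetime.now(timezone.utc)
    try:
        rows = extract(cfg)
        load(cfg, rows)
        log.info("feed=%s rows=%d status=OK elapsed=%s",
                 cfg["feed"], len(rows), datetime.now(timezone.utc) - started)
        return True
    except Exception:
        log.exception("feed=%s status=FAILED", cfg["feed"])
        return False

# Usage with stub extract/load callables standing in for real connectors:
results = [run_feed(c, lambda c: [1, 2, 3], lambda c, r: None)
           for c in FEED_CONFIG]
```

Adding a new feed then becomes a configuration change rather than a code change, which is the point of the parameter-driven design.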
Environment: SQL Server 2008 R2, SQL Server Integration Services (SSIS) 2008 R2, Teradata 13.10

Confidential, San Francisco, CA
Data Integration Engineer
Responsibilities:
- Participated in the Wachovia to Confidential Conversion Project as part of the Reporting and Analytics team.
- Performed Data Profiling and Data Analysis, and defined Data Cleansing rules prior to conversion.
- Involved in the design, development and implementation of the Integrated Staging Area (ISA) used for reporting
- Involved in the design and execution of several other ad hoc SQL queries and canned reports for analysis.
- Worked on Performance tuning and optimization of ETL loads
Environment: SQL Server 2008 R2, Integration Services, Informatica PowerCenter 8.6, IDQ, Oracle 10g, Erwin v7.3

Confidential, NC
- Design and develop a custom Audit and Logging Framework for the Regulatory Reporting & Compliance Team
- Deployed Data Lineage to track data integration from operational systems to BI reports.
- Designed a metadata repository used to store technical and business metadata
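The audit/lineage framework above was implemented on SQL Server; as a minimal stdlib sketch of the idea (schema, table and object names are hypothetical), each ETL step writes an audit row recording its source and target, which lineage and compliance reports can then query:

```python
import sqlite3
from datetime import datetime, timezone

# In-memory stand-in for the audit schema; production used SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE etl_audit (
        run_id     INTEGER,
        step_name  TEXT,
        source_obj TEXT,   -- lineage: where the data came from
        target_obj TEXT,   -- lineage: where it was written
        row_count  INTEGER,
        logged_at  TEXT
    )""")

def audit(run_id, step, source, target, rows):
    """Record one ETL step for audit and lineage reporting."""
    conn.execute(
        "INSERT INTO etl_audit VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, step, source, target, rows,
         datetime.now(timezone.utc).isoformat()),
    )

audit(1, "load_positions", "ops.positions", "dw.fact_positions", 1200)
audit(1, "load_balances", "ops.balances", "dw.fact_balances", 800)

# Lineage query: trace every target back to its source for run 1.
lineage = conn.execute(
    "SELECT source_obj, target_obj, row_count FROM etl_audit WHERE run_id = 1"
).fetchall()
```

Because every load writes through the same audit call, data lineage from operational systems to BI reports falls out of a single query over one table.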
Environment: SQL Server 2008 R2, SSIS, Oracle 9i, TFS 2012

Confidential, South San Francisco, CA
Master Data Developer Analyst
Responsibilities:
- Capture Use cases for Physician Master Data as per Federal and State compliance.
- Identify multiple sources of Physician information and develop ETL for integration into MDM Hub.
- Data Profiling, Analysis and configuration for cleansing, standardizing Physician and Address data.
- Developed complex fuzzy matching algorithms for merging records and de-duping Physician data.
- Design and development of Master Data Model for Physicians Data and Predictive Modeling for new Physicians
Environment: SQL Server 2008 R2, Integration Services (SSIS), Microsoft Master Data Services (MDS) 2008 R2

Confidential, East Rutherford, NJ
- Performed Data Profiling and Data Analysis, creation of Cleansing Lists for Legacy source system data including Mainframe files and databases.
- Drafted Technical Design Specifications, ETL workflows and Source to Target Mappings.
- Developed ETL packages for loading data in Staging and Data Mart.
- Developed complex canned reports feeding from Underwriting and Claims systems
Environment: SQL Server 2005, SSRS 2005, SSIS 2005

Confidential, Red Bank, NJ
- Migrated the legacy High Point systems and Interfaces to a Matrix architecture.
- Designed, developed and implemented Interfaces specifically for external Vendor Systems including ISO, NetMap
- Data Profiling and Analysis of third party Vendor data for integration into High Point systems.
- Defined Technical Design Specifications, ETL workflows and Source to Target Mapping (STTM) spreadsheets for each Vendor system interfacing with the High Point system.