
Data Scientist Resume

Santa Clara, CA


  • Over 15 years of hands-on experience as a Data Engineer on IT projects including Business Intelligence, Business Analytics, Data Cleansing, Data Mining, Data Quality, Master Data Management and Machine Learning.
  • Experienced in all phases of SDLC using RUP, Scrum and Agile Methodologies.
  • Hands-on experience in Functional Requirements Gathering, Data Exploration, Analysis, Profiling and Cleansing
  • Expertise in SQL, R and Python for Machine Learning, Business Analytics and Data Cleansing Projects
  • Knowledge of Statistics and experience using Statistical packages for analyzing datasets.
  • Expertise in Predictive Modeling and Classification Modeling using Supervised and Unsupervised Machine Learning


Databases: SQL Server, Oracle 11g, Teradata, DB2, SAP HANA 1.0

Statistical Software Packages: SAS, MATLAB

R Programming: ggplot2, Shiny, dplyr, Survival

Python Programming: NumPy, Pandas, SciPy

SQL Programming: T-SQL, PL/SQL, Hive SQL

ETL Tools: SSIS, DTS, Informatica PowerCenter, SAP BODS, Talend

Apache: Hadoop, Hive, HBase, HDFS

Reporting: SSRS, Business Objects, Tableau


Confidential, Santa Clara, CA

Data Scientist

  • Collaborate with business teams to understand reporting requirements for Forex reports.
  • Data exploration, Data Profiling and Data Quality assessment of Forex data using ad hoc SQL, Python and R plots.
  • Develop and implement ETL process and Data cleansing rules to load Forex data from multiple sources into EDW.
  • Data Analysis of Forex data using Machine Learning tools combined with R and Python libraries
  • Developed fuzzy matching algorithms for de-duping Forex Account ids across multiple source systems
  • Predictive Modeling of Forex Customer Churn using Decision Tree and implementation using R

Environment: Oracle 11g, SAP BODS, Tableau, RStudio, ggplot2, Pandas, NumPy, SciPy

Confidential, San Rafael, CA

Master Data Analyst

  • Collaborate with business teams to understand issues with Vendor on-boarding process and P2P cycle
  • Data exploration and Data Quality assessment of Vendor data in Oracle EBS using complex SQL and Analytic tools.
  • Developed algorithms for Data Profiling, Data validation and Data cleansing of Vendor data
  • Developed algorithm for Fuzzy Matching of Vendor records for de-duping data and creating a unique record set.

Environment: Oracle 11g, Oracle EBS R12, Informatica Data Quality, Collibra, Power BI, RStudio, ggplot2, SciPy, Pandas

Confidential, San Francisco, CA

Data Analytics Consultant

  • Collaborate with LOB, ETL team and MDM team to define functional and technical requirements for Master Data
  • Custom Asset Classification using Decision Tree Modeling with Python libraries
  • Data Exploration, Design and development of Master Data Model for Securities
  • Data Profiling, Data Analysis; identify and implement business rules to uniquely identify Securities.
  • Design and configure Match Rules and Trust Rules to cleanse, standardize, match and merge Securities records

Environment: SQL Server, Erwin r9, SSIS, Informatica PowerCenter 9.6.1, IDQ, Informatica MDM, Python

Confidential, Oakland, CA

Data Engineer

  • Design and implement ETL interfaces for loading Member Eligibility data and Claims data.
  • Source Data Exploration, Profiling, Data Analysis of Physician and Patient data from multiple sources.
  • Medicare Fraud detection using supervised Machine Learning
  • Implement Hierarchy and Affiliation relationships between Patients, their Household Members and Providers.

Environment: SQL Server 2012 R2, SSIS, Informatica PowerCenter 9.1, IDQ, Informatica MDM 9.5, Python and R libraries

Confidential, San Francisco, CA

Data Engineer

  • Involved in gathering business requirements and analysis for various data feeds from third-party Fund Index data providers including State Street, S&P Dow Jones, Russell and BlackRock
  • Designed and developed critical ETL processes.
  • Developed a parameter driven ETL framework that contained dynamic configurations, custom logging and reporting.

Environment: SQL Server 2008 R2, SQL Server Integration Services (SSIS) 2008 R2, Teradata 13.10

Confidential, San Francisco, CA

Data Integration Engineer

  • Participated in the Wachovia to Confidential Conversion Project as part of the Reporting and Analytics team.
  • Performed Data Profiling and Data Analysis and defined Data Cleansing rules prior to conversion.
  • Involved in the design, development and implementation of the Integrated Staging Area (ISA) used for reporting
  • Involved in the design and execution of ad hoc SQL queries and canned reports for analysis.
  • Worked on Performance tuning and optimization of ETL loads

Environment: SQL Server 2008 R2, Integration Services, Informatica PowerCenter 8.6, IDQ, Oracle 10g, Erwin v7.3.

Confidential, NC

ETL Developer

  • Design and develop a custom Audit and Logging Framework for the Regulatory Reporting & Compliance Team
  • Deployed Data Lineage to track data integration from operational systems to BI reports.
  • Designed a metadata repository used to store technical and business metadata

Environment: SQL Server 2008 R2, SSIS, Oracle 9i, TFS 2012.

Confidential, South San Francisco, CA

Master Data Developer Analyst

  • Capture Use cases for Physician Master Data as per Federal and State compliance.
  • Identify multiple sources of Physician information and develop ETL for integration into MDM Hub.
  • Data Profiling, Analysis and configuration for cleansing, standardizing Physician and Address data.
  • Developed complex fuzzy matching algorithms for merging records and de-duping Physician data.
  • Design and development of Master Data Model for Physicians Data and Predictive Modeling for new Physicians

Environment: SQL Server 2008 R2, Integration Services (SSIS), Microsoft Master Data Services (MDS) 2008 R2.

Confidential, East Rutherford, NJ

ETL Developer

  • Performed Data Profiling and Data Analysis and created Cleansing Lists for Legacy source system data including Mainframe files and databases.
  • Drafted Technical Design Specifications, ETL workflows and Source to Target Mappings.
  • Developed ETL packages for loading data in Staging and Data Mart.
  • Developed complex canned reports feeding from Underwriting and Claims systems

Environment: SQL Server 2005, SSRS 2005, SSIS 2005

Confidential, Red Bank, NJ

Database Developer


  • Migrated the legacy High Point systems and Interfaces to a Matrix architecture.
  • Designed, developed and implemented Interfaces specifically for external Vendor Systems including ISO, NetMap
  • Data Profiling and Analysis of third-party Vendor data for integration into High Point systems.
  • Defined Technical Design Specifications, ETL workflows and Source to Target Mapping (STTM) spreadsheets for each Vendor system interfacing with the High Point system.
Environment: IBM DB2 UDB, IBM Control Center, SQL Server 2000, Microsoft DTS, SSIS 2005
