Watson Health Data Scientist Resume
Cambridge, MA
SUMMARY
- Extensive experience in Biovia Pipeline Pilot - development and support
- Computational chemistry - drug discovery - structural modeling, VLS, binding, pharmacophores, QSAR, QSPR, fragment-based design, combinatorial libraries
- Bioinformatics - sequences, pathways, gene expression, omics, biomarkers, targets, MoA
- Support of applied projects and method development/programming
- Broad knowledge of drug discovery and development
- Statistics. Analytics. Predictive and prescriptive modeling. Precision medicine. Machine learning
- Data science and informatics - web interfaces, databases, data processing/analytics - design, implementation/coding, support, cloud computing. SQL, MongoDB, Python, R, Perl, C++, Java, Unix.
- Diverse data experience - clinical, EHR, drug discovery, translational, toxicology, pharmacology, physical, chemical, molecular structure, genomics, proteomics, microarray, business, financial, economic data - capture, mining, analysis, integration, annotation, modeling and visualization
- Management experience - agile project management, matrix environment and line management
- Business development experience
- Excellent communication, presentation and people skills
- Creativity and sense of innovation
- Extensive patent and publication lists (mostly last or first author) in top journals
PROFESSIONAL EXPERIENCE
Watson Health Data Scientist
Confidential, Cambridge, MA
Responsibilities:
- Statistics, predictive and prescriptive modeling - institutional readmissions, risk of hospitalization, social determinants of health
- Prediction of hypoglycemic events from time-series data - artificial pancreas project
- Real-world healthcare data - claims, EHR/EMR, SNOMED, ICD-10, medical devices (real-time data streams).
- Development of innovative algorithms for dimensionality reduction and model-based feature selection on binary, continuous and categorical data.
- Coding (SQL, Python, Java, R and SPL) of software modules for ETL, calculation of features and modeling algorithms
- Agile project management
- Evaluation of the effectiveness of healthcare payment models that are alternatives to the fee-for-service model.
Principal Research Investigator
Confidential, Waltham, MA
- Project management and hands-on development in support of drug discovery/development through multidimensional data mining, analysis, visualization, modeling and simulation.
- Informatics support of mass spectrometry-based metabolomics and biomarker research. Design and implementation of a data warehouse for MS data and a database for metabolites.
- Development of web interfaces, data processing pipelines and visualization tools for uploading diverse assay data into relational and NoSQL databases.
- Support of genetics and NGS. Integration of software for large-scale population genetic analyses (Hardy-Weinberg, directional selection, haplotype frequencies, linkage disequilibrium).
- Pipeline Pilot, PilotScript, SQL, NoSQL, Oracle, MongoDB, HTML/CSS/Javascript, MATLAB, R, Python, Java, TIBCO Spotfire, cloud computing
- Design of web interfaces, web-based data retrieval/mining/processing pipelines and visualization tools. Relational databases: design of queries and architecture, maintenance. Coding of both front- and back-end. Biovia Pipeline Pilot, PilotScript, SQL, Oracle, JavaScript, TIBCO Spotfire.
Scientific Computing Contractor
Confidential, Bethesda, MD
- Application development in Biovia Pipeline Pilot
- Statistical/modeling/computing support for gene expression analysis, biomarker identification, mechanism of action, analysis of genomic databases and data mining.
- Machine learning modeling (PCA, PLS, neural networks) of gene expression and QSAR - insights into MOA, putative targets and biomarkers.
- Developed a structural bioinformatic tool capable of searching PDB for particular 3D features
- Implemented web interface for compound design and a web-based visualizer of activity cliffs.
- Designed fragment-like libraries and implemented drug- and fragment-likeness filters.
- Molecular modeling - Androgen receptor, Nrf2-Keap1 (IVR domain and its Zn-mediated dimer), c-Myc.
- Kinase projects (Mnk, Akt, EGFR, BRAF). Identified leads by structure-based design and VLS.
- Developed 3 proposals on a dynamic model of hERG and the discovery of allosteric kinase inhibitors.
- Integration of pre-clinical data into company database. Pipeline Pilot, PilotScript, Perl, UNIX.
Principal Investigator
Confidential, Woburn, MA
Responsibilities:
- OUTLINE. Modeling and computational chemistry support of discovery and development operations. Bioinformatics, chemoinformatics. Machine learning, statistical and structural modeling, databases. Support of projects and development of bio- and chemoinformatic tools. Extensive experience at all stages from hit discovery to IND.
- Co-developed Confidential technology platform for fragment-based discovery of allosteric kinase inhibitors - both computational part and biophysical characterization.
- Developed sequence analysis tools for identification of proteins amenable to Confidential’s AKIP technology.
- Designed hundreds of combinatorial libraries for both Pfizer and internal discovery
- Developed an ultrafast “surrogate” docking algorithm - combination of docking and machine learning.
- Developed ADMET, serum binding and bioavailability models (machine learning and PLS).
- Developed a number of Daylight- and Pipeline Pilot-based cheminformatic applications.
- Refined reagent-level and product-level diversity management methods.
- Significantly augmented reactivity and toxicity filters for design of compound libraries.
- Developed a model for prediction of reagent reactivity.
- Developed QSAR, CoMFA/CoMSIA and pharmacophore models - inhibitors of proteases and kinases.
- Hit discovery: developed VLS protocol with 15% to 25% experimental hit rate (low µM IC50).
- Lead optimization: drove affinity of several compound series (kinase targets) down to sub-nM range.
- Developed design method based on the use of protein motifs known to bind certain fragments.
- Developed structural model of hERG that explains most cases of hERG blockade modulation.
- Developed onco- and kinase-likeness scoring methods based on machine-learning algorithms.
- Completed numerous projects in protein modeling: homology, loop modeling, large-scale optimization, kinase activation, domain rearrangement.
- Developed a computational method for identification of proteins for selectivity assays.
- Experimental: developed a thermal shift fluorimetry assay for binding and protein stability
- Management: one Ph.D. report
Senior Scientist II
Confidential, San Diego, CA
Responsibilities:
- Computational drug design, bioinformatics
- Developed a method for identification and clustering of kinase sequences in human genome.
- HT virtual screening of compound libraries and lead identification.
- Developed a novel affinity scoring function, which significantly improved affinity prediction.
- Developed a novel algorithm for solubility prediction, which resulted in a patent.
- Improved clustering of databases of vendor compounds.
Confidential, Pasadena, CA
Responsibilities:
- Computational protein design, bioinformatics
- Developed a method for identification of distant protein sequence similarity by using databases of computationally designed sequences compatible with a given fold.
- Computational design of therapeutic proteins, e.g. HGH and TNF alpha variants.
- Developed a novel empirical free energy function accounting for backbone and side chain conformational entropy. Optimized the electrostatic and polar hydrogen burial terms.
- Organized joint projects with EntreMed and Northwestern University (anti-angiogenesis).
- Management: team leader on the Human Growth Hormone project
Confidential, San Francisco
Responsibilities:
- Computational drug design
- Developed a novel hybrid docking method: rigid-body docking for rough filtering, followed by flexible docking using multiple-start Monte-Carlo optimization in internal coordinates.
- Developed a novel scoring function, which included, in particular, Poisson-Boltzmann electrostatics, a desolvation term, a hydrophobicity term and a torsional conformational entropy term.
- Using the developed docking method, identified 8 ligands for the TAR RNA bulge structure; 3 of them (37% hit rate) were proven to be low-micromolar binders. Designed ligands for HIV-1 TAT protein.
- Identified the opportunity and organized collaboration with Prof. Thomas Bell on arginine ligands - resulted in a publication.
TECHNICAL SKILLS
Modeling/Numerical methods: multidimensional data mining, PLS, neural networks, decision trees, Bayesian statistics, PCA, machine learning, experiment design, docking (high throughput and flexible-receptor), free energy functions, affinity prediction, solubility and ADMET prediction, QSAR, QSPR, pharmacophore modeling, CoMFA/CoMSIA, virtual screening, design of combinatorial libraries, diversity management, small molecule clustering, homology modeling, molecular dynamics, Monte Carlo, protein design, loop and large rearrangements in proteins, protein sequence alignment, distant protein sequence similarity, FEP.
Application software: ICM, Accelrys, Schrödinger, Sybyl-Tripos, DOCK, DockIt, MOE/SVL (CCG), FlexX, GOLD, Glide, Daylight toolkit, ROCS, OpenEye, QikProp, SIMCA, Pipeline Pilot, PDA, BLAST, Spotfire
Programming: SQL, Python, R, MongoDB, JavaScript, Java, HTML, Perl, CGI, Pipeline Pilot/PilotScript, C/C++, Fortran, Basic, Bioconductor, Tcl/Tk, UNIX shells, ICM, CCL-SVL/MOE, SGI system administration, HPC environment, development of web-based user interfaces
NMR related: running Varian VXR500 and Bruker AMX600, obtaining various spectra, including 3D isotope-edited, assignment of protein resonances on 2D and 3D data, protein structure determination and refinement.
Other theoretical: PBSA, enzymatic and chemical kinetics, diffusion-controlled reactions, analytical and numerical solution of partial differential and integral equations, statistics.
Analytical methods: differential scanning fluorimetry, titration microcalorimetry, NMR (including tritium NMR), UV/Vis spectrophotometry, CD spectroscopy, all types of electrophoresis and chromatography, mass spectrometry and GC/MS.
Chemistry: design of synthetic schemes based on high-yield chemistries amenable to robotization, synthesis of organic compounds and peptides, tritium labeling of compounds using solid state catalysis, enzymatic synthesis of tritiated components of nucleic acids.
Other experimental: study of mechanism of action of enzymes using kinetic isotope effects, purification and characterization of proteins.
