Lead Data Scientist Resume

Houston, TX

SUMMARY:

I’m a passionate data scientist with solid training and deep experience in data processing, analysis, modeling, and data pipelines. I am currently a lead data scientist at Confidential, where I have gained extensive experience in project management and client communication. I’m familiar with big data frameworks, ML/DL packages, and statistical packages, including AWS, Spark/Hadoop, Python, R, SQL, Keras, TensorFlow, Stata, and Perl.

EXPERIENCE:

Lead Data Scientist

Confidential, Houston, TX

Responsibilities:

  • Manage and contribute to multiple projects
  • Build deep learning models for a large-scale plant growth monitoring system
  • Develop novel statistical models for prioritizing genome editing targets
  • Perform genomic analysis of newly sequenced organism genomes (genome assembly, gene annotation, RNA-seq pipeline)
  • Develop deep learning models for prioritization of CRISPR targets
  • Manage Confidential's MySQL databases and AWS environment (partial responsibility)
  • Develop big data pipelines and predictive models for terabytes of retail sales data

Postdoctoral Associate

Confidential, Houston, TX

Responsibilities:

  • Built a neural network model to prioritize genes and mutations through high-dimensional data integration
  • Designed and developed a convolutional neural network model for diagnosis from hematology images

Research Assistant

Confidential, Los Angeles, CA

Responsibilities:

  • Performed data mining of the HealthFacts database, a collection of clinical records for more than 47 million unique patients
  • Explored patterns of sequential medications in cancer patient prognosis using logistic regression and survival analysis

Assistant Database Programmer

Confidential, Los Angeles, CA

Responsibilities:

  • Maintained and upgraded the PANTHER database, which contains 1.38 million genes from 103 genomes

Research Assistant

Confidential, Los Angeles, CA

Responsibilities:

  • Reconstructed, for the first time, the evolutionary history of gene gains and losses since the earliest forms of life
  • Resolved the 2R whole-genome duplication (WGD) hypothesis, which had puzzled the scientific community for more than 40 years
  • Built a general statistical framework for annotating genes with functions by simulating human curators’ behavior
  • Implemented the framework above using logistic regression, and collaborated with another PI on a Markov chain Monte Carlo approach
  • Constructed a pipeline for grafting a query gene sequence to the best position in the matching hidden Markov model
