Senior Data Scientist Resume

SUMMARY:

  • Data scientist with extensive industry experience in big data and advanced skills in statistics and machine learning
  • 10+ years of experience in programming and software development
  • Author of a publicly available R package (30,000+ downloads to date)
  • Significant research experience as demonstrated by awards, publications and presentations

PROFESSIONAL EXPERIENCE:

Senior Data Scientist

Confidential

  • Spearheaded the effort to extend the capability of in-house A/B testing infrastructure to measure the impact of advertising on AppStore, and published reports post-launch
  • Defined success metrics for AppStore and Confidential Music search in live experiments
  • Led studies relating user engagement and conversion to site speed performance, connecting separate data sources to deliver results promptly
  • Produced A/B test readouts to drive launch decisions for search algorithms including query refinement, topic modeling, signal boosting and machine-learned weights for ranking signals
  • Built automated pipelines to generate and incrementally update evaluation sets with metadata enrichment for human judgment of search quality
  • Designed tailored sampling methodology to build evaluation sets for different features
  • Designed and executed an evaluation plan to compare the search relevance of AppStore with competitors

Technologies: R, SQL, Hadoop, Spark, Hive, Scala, Python, Splunk, Postgres, GitHub, Shell script

Staff Data Scientist

Confidential

  • Developed clustering methodology to personalize timing for sending push notifications, delivering significant engagement improvement (50%+ lift in click-through rate; 10% lift in mobile traffic)
  • Received a Spot Award for pioneering experimentation with the notification service on the mobile platform and for delivering significant improvement in user engagement
  • Developed an automated pipeline to monitor the quality of Confidential category recommendation engine by computing sellers’ acceptance rates in creating listings
  • Devised an auto-labeling strategy to create training data for classifying listings by product type, achieving the target precision of 90%
  • Deployed scalable, in-house machine-learning solutions in distributed systems to address high-value business questions
  • Improved lift estimation in A/B testing by means of mixture modeling coupled with bootstrapping
  • Sped up report generation by 80% on the experimentation platform by implementing in-database computation in R
  • Built a web app in R with embedded Teradata access to retrieve the activity log of any Confidential user

Technologies: R, SQL, Teradata, Hadoop, Hive, Scala, Scalding, Python, parallel computing, in-database analytics, JDBC/ODBC, BTEQ, GitHub, Shell script, C, SAS, Matlab, MicroStrategy

Postdoctoral Senior Fellow

Confidential

  • Developed supervised methodology based on Bayesian model averaging of regression models to construct gene regulatory networks from high-dimensional genetic data
  • Collaborated with a cross-disciplinary team of research scientists and offered them statistical consultation

Publications:

  • K Lo, A Raftery, K Dombek, et al. (2012). Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC Systems Biology.
  • KY Yeung, K Dombek, K Lo, et al. (2011). Construction of regulatory networks using expression time-series data of a genotyped population. Proceedings of the National Academy of Sciences of the USA.

Technologies: R, parallel computing, C, GNU Scientific Library, Perl, Matlab, LaTeX

Intern

Confidential

  • Developed an automated pipeline to identify discriminating features and to construct classifiers for cancer patients
  • Developed on-site computing facilities and configured the computing systems for the research team

Publications:

  • A Bashashati, K Lo, R Gottardo, et al. (2009). A pipeline for automated analysis of flow cytometry data: preliminary results on lymphoma sub-type diagnosis. IEEE Engineering in Medicine and Biology Society.
  • K Lo, R Brinkman, R Gottardo (2008). Automated gating of flow cytometry data via robust model-based clustering. Cytometry Part A. (cited 100+ times)
