Senior Data Scientist Resume
SUMMARY:
- Data scientist with extensive industry experience working with big data and advanced skills in statistics and machine learning
- 10+ years of experience in programming and software development
- Author of a publicly available R package (30,000+ downloads to date)
- Significant research experience as demonstrated by awards, publications and presentations
PROFESSIONAL EXPERIENCE:
Senior Data Scientist
Confidential
- Led the extension of the in-house A/B testing infrastructure to measure the impact of advertising on the AppStore, and published post-launch reports
- Defined success metrics for AppStore and Confidential Music search in live experiments
- Led studies relating user engagement and conversion to site speed, connecting separate data sources to deliver results promptly
- Produced A/B test readouts to drive launch decisions for search algorithms including query refinement, topic modeling, signal boosting and machine-learned weights for ranking signals
- Built automated pipelines to generate and incrementally update evaluation sets with metadata enrichment for human judgment of search quality
- Designed tailored sampling methodology to build evaluation sets for different features
- Designed and executed an evaluation plan to compare the search relevance of the AppStore with that of competitors
Technologies: R, SQL, Hadoop, Spark, Hive, Scala, Python, Splunk, Postgres, GitHub, Shell script
Staff Data Scientist
Confidential
- Developed clustering methodology to personalize timing for sending push notifications, delivering significant engagement improvement (50%+ lift in click-through rate; 10% lift in mobile traffic)
- Received a Spot Award for pioneering experimentation with the notification service on the mobile platform and for significantly improving user engagement
- Developed an automated pipeline to monitor the quality of the Confidential category recommendation engine by computing sellers’ acceptance rates when creating listings
- Devised an auto-labeling strategy to create training data for classifying listings by product type, achieving the target precision of 90%
- Deployed scalable, in-house machine-learning solutions in distributed systems to address high-value business questions
- Improved lift estimation in A/B testing by means of mixture modeling coupled with bootstrapping
- Sped up report generation on the experimentation platform by 80% by implementing in-database computation in R
- Built a web app in R with embedded Teradata access to retrieve the activity log of any Confidential user
Technologies: R, SQL, Teradata, Hadoop, Hive, Scala, Scalding, Python, parallel computing, in-database analytics, JDBC/ODBC, BTEQ, GitHub, Shell script, C, SAS, Matlab, MicroStrategy
Postdoctoral Senior Fellow
Confidential
- Developed supervised methodology based on Bayesian model averaging of regression models to construct gene regulatory networks from high-dimensional genetic data
- Collaborated with a cross-disciplinary team of research scientists and provided statistical consultation
Publications:
- K Lo, A Raftery, K Dombek, et al. (2012). Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC Systems Biology.
- KY Yeung, K Dombek, K Lo, et al. (2011). Construction of regulatory networks using expression time-series data of a genotyped population. Proceedings of the National Academy of Sciences of the USA.
Technologies: R, parallel computing, C, GNU Scientific Library, Perl, Matlab, LaTeX
Intern
Confidential
- Developed an automated pipeline to identify discriminating features and to construct classifiers for cancer patients
- Developed on-site computing facilities and configured the computing systems for the research team
Publications:
- A Bashashati, K Lo, R Gottardo, et al. (2009). A pipeline for automated analysis of flow cytometry data: preliminary results on lymphoma sub-type diagnosis. IEEE Engineering in Medicine and Biology Society.
- K Lo, R Brinkman, R Gottardo (2008). Automated gating of flow cytometry data via robust model-based clustering. Cytometry Part A. (cited 100+ times)