Data Science Intern Resume

Minneapolis, MN


Data Analytics Tools: R, Python, Spark, RapidMiner, SQL, Tableau, Hadoop, Hive, GIS

Modeling Skills: Statistical & Predictive Modeling, Text Mining, Clustering, Simulation & Heuristic Modeling

Lab Analytics Skills: GC-MS, DNA Extraction, qPCR, High-Throughput Sequencing, Cell Culture

Specialties: Classification, Regression, Exploratory Analysis, Visualization, Recommender Systems, Genetic Analysis


Confidential, Minneapolis, MN

Data Science Intern


  • Engaging with three stakeholders to understand their strategic needs and data generation processes.
  • Analyzed 8,000 anti-epileptic drugs (AEDs) against patient characteristics by preprocessing 18+ million hospital and Rx claim records
  • Identified symptom-history patterns associated with AED use among non-epileptic patients using association rules
  • Developing a regression model to identify the key factors that affect patient claim counts, to improve personalized care and reduce cost
  • Partnered with three analysts to predict hospital-acquired infections (HAIs) in a regional hospital
  • Addressed sparse data through the use of dummy variables and feature selection with PCA
  • Developed classification models with eight machine learning algorithms (Decision Tree, k-NN, Logistic Regression, Support Vector Machine, Neural Network, Naive Bayes, Random Forest, and Ensemble Model)
  • Built best model with 60% accuracy, 10% misclassification error to minimize financial impact and reduce patient risk.
  • Used review and product textual data of Amazon's clothing segment to develop a recommender system via rating prediction
  • Preprocessed sparse data in R by setting density thresholds for user and item data, and also used the original sparse data
  • Used trigrams to extract item content features and word entropy (0.8) to select features for the content-based model in Python
  • Compared collaborative, content-based, and hybrid models to find the best model for rating prediction, with 0.966 RMSE
  • Collaborated with a team of five analysts, serving as the subject matter expert and business analyst
  • Scrubbed 335 lakes and property tax panel data through outlier detection, inflation adjustment, and derived attributes
  • Described and visualized 35 years' water quality trends and property characteristics using R and ArcGIS
  • Analyzed interactions between lakes and properties with a mixed model, and refined it with external data
  • Presented analysis insights to 100 data scientists to help stakeholders develop a community management strategy
  • Developed queries in Spark SQL to analyze meetup.com API data using both DataFrame and RDD based approaches
  • Built a SparkML pipeline and cost-sensitive classification models to predict MRSA & pneumonia patient readmissions
  • Performed data pre-processing and built both unsupervised (clustering) and supervised (regression) models in SparkR to assess and predict Minneapolis meetups
  • Led a project team of five analysts to drive new marketing strategies by analyzing 3+ million transaction records.
  • Analyzed 50 retail companies and segmented them by product market share for further promotion analysis.
  • Utilized temporal anomaly detection to determine which retailers are more competitive in terms of market share
  • Developed statistical models with 90% average accuracy to identify the short- and long-term effects of market share
  • Created an executive-level dashboard to help the client identify the most efficient promotion combination and market positioning
  • Partnered with four analysts to predict hotel booking volume, price, and length of stay at different granularities.
  • Retrieved and transformed 3+ million rows of data from the client database using SQL
  • Developed an autoregressive prediction model at weekly, monthly, and quarterly levels with 90%+ average accuracy
  • Improved the model with stock price, oil price, and S&P data to account for the industry component
  • Created an interactive executive-level dashboard to visualize model performance for clients
  • Partnered with four analysts to predict the likelihood of an Airbnb user booking within 90 days of sign-up
  • Explored customer demographic and online behavior data through data cleaning and transformation in Python
  • Analyzed and visualized the differences in behavior between customer and non-customer users
  • Built a logistic regression model with 0.72 AUC using user demographics, enrollment methods, and online behavior attributes.
  • Improved the model by 0.13 AUC with the Dow Jones industrial index and a consumer sentiment index


Water Analyst & Research Associate


  • Collaborated with 21 cities to manage water quality and implemented 8 river remediation and evaluation policies
  • Collected, analyzed and reported water data of 124 manual sections and 28 automatic monitoring stations
  • Mapped hydrography using ArcGIS and presented biannual and annual reports to the agency director
  • Responded on call to emergency water pollution events as data analyst and pollution source investigator
  • Developed water quality database for water remediation extension project by collaborating with technical team
  • Awarded a 100K yuan National Science Foundation grant to conduct a bio-toxicity monitoring system application project
  • Developing a bio-toxicity monitoring standard with a three-organism monitoring system: fish, luminescent bacteria, and microbial fuel cell.
  • Assessed the water quality of the delta area with the Biotic Ligand Model, based on lab toxicity data and water quality parameters.
  • Predicted the bio-toxicity of river water from 24 physical and chemical parameters using an ensemble model
  • Assisted in building a drinking water precautionary system by analyzing risk with decision trees in SPSS and R

Confidential, Saint Paul, MN

Research Assistant


  • Studied microbial mechanisms associated with greenhouse gas emission mitigation under the supervision of a Confidential soil scientist
  • Worked with a lab technician to sample soil and collect CO2/N2O data from the Rosemount Confidential corn field
  • Designed thesis experiments, conducted soil incubation in the lab, and tested CO2/N2O and physical-chemical parameters
  • Collaborated with a postdoctoral researcher to perform functional gene expression experiments with soil amendments.
  • Analyzed factors for correlation and performed visualization in R, incorporating results in a published paper
