Data Analysis Internship | Statistical Data Collecting, Reorganizing And Processing Resume

TECHNICAL SKILLS

  • Proficient in R, SPSS, Tableau, PostgreSQL, SQL, SAS, Excel and Access
  • Experience with Python, Java, Hadoop, Hive, HBase, Spark and the Linux console

PROFESSIONAL EXPERIENCE

Confidential

Data Analysis Internship | Statistical data collecting, reorganizing and processing

Responsibilities:

  • Designed insurance schemes according to customers' requirements and determined premium amounts based on the number of people covered and the nature of their jobs
  • Collected and reorganized claims data and drew statistical conclusions from it
  • Created an Access database and applied statistical theory to analyze and improve the effectiveness of the group insurance formulation process

Confidential

Big Data Analytics: Expedia Hotel Recommendations

Responsibilities:

  • Used R and SparkR to build and run the algorithm for processing a dataset of over 4 GB from Kaggle
  • Built a recommender model from user search statistics by calculating the distance between search and booking records, where the distance combined several variables with weights derived from a Random Forest model
  • Achieved 70% accuracy when recommending the top 10 hotels each user was most likely to book

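The 70% figure can be read as a hit rate at 10. The sketch below assumes a user counts as a hit when the hotel they actually booked appears among their 10 recommendations; the function name and the toy hotel IDs are hypothetical, and the project's own evaluation may have been defined differently.

```python
from typing import Hashable, Sequence


def hit_rate_at_k(
    recommendations: Sequence[Sequence[Hashable]],
    actual: Sequence[Hashable],
    k: int = 10,
) -> float:
    """Fraction of users whose actual booking appears in their top-k recommendations."""
    if len(recommendations) != len(actual):
        raise ValueError("recommendations and actual must have the same length")
    if not actual:
        raise ValueError("need at least one user")
    # Only the first k recommendations per user are considered.
    hits = sum(1 for recs, booked in zip(recommendations, actual) if booked in recs[:k])
    return hits / len(actual)


# Toy example: three users with hypothetical hotel IDs.
recs = [[5, 91, 42], [1, 2, 3], [7, 8, 9]]
booked = [42, 4, 7]
print(hit_rate_at_k(recs, booked, k=10))  # users 1 and 3 are hits, user 2 is not
```

Under this reading, "70% accuracy" means 7 in 10 users found their booked hotel in their recommended list.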
Confidential

Database and SQL: A Database and Website for an Online Second-Hand Bookstore

Responsibilities:

  • Designed an E/R diagram and the corresponding relational schema that allow users to create accounts, buy and sell books, and add reviews
  • Mapped our E/R diagram, including constraints, into a SQL schema using PostgreSQL on Microsoft Azure
  • Programmed in Python with Flask to build a web application, using the Python package SQLAlchemy to connect to the database

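A relational schema supporting accounts, book listings, purchases and reviews might look like the minimal sketch below. It uses Python's built-in `sqlite3` module rather than PostgreSQL on Azure with SQLAlchemy, and every table and column name is hypothetical, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE books (
    book_id   INTEGER PRIMARY KEY,
    seller_id INTEGER NOT NULL REFERENCES users(user_id),
    title     TEXT NOT NULL,
    price     REAL NOT NULL CHECK (price >= 0)
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    book_id  INTEGER NOT NULL UNIQUE REFERENCES books(book_id),  -- each copy sells once
    buyer_id INTEGER NOT NULL REFERENCES users(user_id)
);
CREATE TABLE reviews (
    review_id INTEGER PRIMARY KEY,
    book_id   INTEGER NOT NULL REFERENCES books(book_id),
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    rating    INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    body      TEXT
);
"""


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


conn = create_store()
conn.execute("INSERT INTO users (username) VALUES ('alice'), ('bob')")
conn.execute("INSERT INTO books (seller_id, title, price) VALUES (1, 'Used Calculus Text', 12.5)")
conn.execute("INSERT INTO orders (book_id, buyer_id) VALUES (1, 2)")
conn.execute("INSERT INTO reviews (book_id, user_id, rating, body) VALUES (1, 2, 5, 'Good condition')")
conn.commit()
```

The constraints carry the E/R diagram's rules into the schema: foreign keys tie listings, orders and reviews to real users, the `UNIQUE` on `orders.book_id` keeps a listed copy from being sold twice, and the `CHECK` clauses reject negative prices and out-of-range ratings.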
Confidential

Computational Statistics with R: Predictive Model for 8th Grade Students' Science IRT Score

Responsibilities:

  • Trained a model to predict students' science IRT scores from 28 selected independent variables covering four aspects
  • Obtained sets of predictors using several methods, namely subset selection, lasso, ridge regression and random forest
  • Compared the models by cross-validation and selected the best one for future prediction

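Choosing among candidate models by k-fold cross-validation can be sketched as follows. This is a simplified, assumption-laden illustration: it uses Python's standard library, a single simulated predictor, and a grid of ridge penalties as the candidates, whereas the project used R with 28 predictors and also compared subset selection, lasso and random forest. All names are hypothetical.

```python
import random
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y)
Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[Point]], Predictor]


def ridge_1d(lam: float) -> Fitter:
    """Ridge regression with one predictor; the intercept is not penalized."""
    def fit(train: Sequence[Point]) -> Predictor:
        x_bar = fmean(x for x, _ in train)
        y_bar = fmean(y for _, y in train)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
        sxx = sum((x - x_bar) ** 2 for x, _ in train)
        slope = sxy / (sxx + lam)  # the penalty shrinks the slope toward zero
        return lambda x: y_bar + slope * (x - x_bar)
    return fit


def cv_mse(data: Sequence[Point], fitter: Fitter, k: int = 5, seed: int = 0) -> float:
    """Mean squared prediction error estimated by k-fold cross-validation."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # same seed gives every model the same folds
    folds = [idx[i::k] for i in range(k)]
    sq_errors: List[float] = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        predict = fitter(train)
        sq_errors.extend((data[i][1] - predict(data[i][0])) ** 2 for i in fold)
    return fmean(sq_errors)


# Simulated data (assumption): y = 2x + standard normal noise.
rng = random.Random(1)
data = [(x, 2.0 * x + rng.gauss(0, 1)) for x in (rng.uniform(-3, 3) for _ in range(100))]
scores = {lam: cv_mse(data, ridge_1d(lam)) for lam in (0.0, 1.0, 10.0, 100.0, 1e6)}
best = min(scores, key=scores.get)
print(best, scores)
```

Reusing one seed for the fold assignment means every candidate is scored on identical splits, so differences in CV error reflect the models rather than the luck of the partition.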
Confidential

The Comparison of Five Classification Methods by Monte Carlo Simulation

Responsibilities:

  • Simulated data in R using the Monte Carlo method under two scenarios: a linear and a non-linear Bayes decision boundary
  • Compared Logistic Regression, LDA, QDA, 1-NN and 10-NN by their cross-validated mean squared prediction error

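The core of such a simulation study can be sketched for the linear scenario and the two nearest-neighbour classifiers. This is a simplified, hypothetical illustration in stdlib Python rather than R. It assumes uniform points labelled by the line x1 + x2 = 1 with 10% label noise, so that line is the Bayes boundary, and it scores each replicate on a fresh held-out test set instead of by cross-validation. With 0/1 labels, the mean squared prediction error equals the misclassification rate.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], int]  # ((x1, x2), label)


def simulate_linear(n: int, rng: random.Random, flip: float = 0.1) -> List[Sample]:
    """Uniform points in the unit square, labelled by the line x1 + x2 = 1 with label noise.

    For a symmetric flip probability below 0.5, that line is the Bayes decision boundary.
    """
    out = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        label = int(x[0] + x[1] > 1.0)
        if rng.random() < flip:
            label = 1 - label
        out.append((x, label))
    return out


def knn_predict(train: Sequence[Sample], x: Tuple[float, float], k: int) -> int:
    """Majority vote among the k nearest training points (ties go to class 0)."""
    nearest = sorted(train, key=lambda s: (s[0][0] - x[0]) ** 2 + (s[0][1] - x[1]) ** 2)[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)


def monte_carlo_error(k: int, reps: int = 20, n_train: int = 100,
                      n_test: int = 200, seed: int = 0) -> float:
    """Average 0/1 test error of k-NN over independently simulated data sets."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        train = simulate_linear(n_train, rng)
        test = simulate_linear(n_test, rng)
        wrong = sum(knn_predict(train, x, k) != y for x, y in test)
        errors.append(wrong / n_test)
    return sum(errors) / reps


for k in (1, 10):
    print(f"{k}-NN average test error: {monte_carlo_error(k):.3f}")
```

Because the random stream does not depend on k, both classifiers see exactly the same simulated data sets, which makes the comparison between them less noisy.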
Confidential

Multivariate Analysis: Effects of Sibling Size on Students’ Academic Achievement

Responsibilities:

  • Applied MANOVA to compare multiple academic performance variables across students with different sibling sizes, testing the null hypothesis that all groups perform equally
  • Used discriminant analysis to further determine in which direction the differences between groups of students lie

Confidential

Data Mining: Algorithm Implementation for Different Methods in R

Responsibilities:

  • Analyzed face image data with the pixmap package in R, using PCA and KNN to extract the characteristics of each image
  • Implemented Naive Bayes, CART and Logistic Regression algorithms to identify authors in the Federalist Papers dataset
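Authorship attribution with Naive Bayes can be sketched as a multinomial model over word counts with add-one smoothing. This is one common variant, not necessarily the one used in the project, and CART and logistic regression are not shown. The class name and the toy training sentences are hypothetical and are not taken from the Federalist Papers.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesAuthor:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs: Sequence[Tuple[str, str]]) -> "NaiveBayesAuthor":
        self.word_counts: Dict[str, Counter] = {}
        doc_counts: Counter = Counter()
        for text, author in docs:
            doc_counts[author] += 1
            self.word_counts.setdefault(author, Counter()).update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        n_docs = sum(doc_counts.values())
        self.log_prior = {a: math.log(c / n_docs) for a, c in doc_counts.items()}
        self.totals = {a: sum(c.values()) for a, c in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        v = len(self.vocab)

        def score(author: str) -> float:
            # log P(author) + sum of log P(word | author); unseen words are skipped.
            counts = self.word_counts[author]
            denom = self.totals[author] + v
            return self.log_prior[author] + sum(
                math.log((counts[w] + 1) / denom) for w in tokenize(text) if w in self.vocab
            )

        return max(self.log_prior, key=score)


# Made-up toy sentences, for illustration only.
train = [
    ("upon the whole the union is upon firm ground", "Hamilton"),
    ("the power of the union rests upon the people", "Hamilton"),
    ("whilst the states retain their powers", "Madison"),
    ("the people whilst divided into factions", "Madison"),
]
model = NaiveBayesAuthor().fit(train)
print(model.predict("upon the power of the union"))
print(model.predict("whilst the states"))
```

Working in log space avoids numerical underflow when multiplying many small word probabilities, and smoothing keeps a single unseen word from driving an author's probability to zero.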
