Data Analysis Internship | Statistical Data Collecting, Reorganizing And Processing Resume
TECHNICAL SKILLS
- Proficient in: R, SPSS, Tableau, PostgreSQL, SQL, SAS, Excel, Access
- Experience with: Python, Java, Hadoop, Hive, HBase, Spark, Linux console
PROFESSIONAL EXPERIENCE
Confidential
Data Analysis Internship | Statistical data collecting, reorganizing and processing
Responsibilities:
- Designed insurance schemes according to customers’ requirements and determined premium amounts based on the number of clients and the nature of their jobs
- Collected and reorganized claim data and made statistical judgments
- Created a database in Access and applied statistical theory to analyze and improve the effectiveness of the formulation process for group insurance
Confidential
Big Data Analytics: Expedia Hotel Recommendations
Responsibilities:
- Used R and SparkR to build and run an algorithm for processing a dataset of over 4 GB from Kaggle
- Built a recommender model from user search statistics by calculating the distance between search and booking records, each represented as a weighted combination of variables with weights derived from Random Forest
- Achieved 70% accuracy when recommending the top 10 hotels each user was most likely to stay at
Database and SQL: A Database and Website for an Online Second-hand Book Store
Responsibilities:
- Designed an E/R diagram and the corresponding relational schema that allow users to create accounts, buy and sell books, add reviews, etc.
- Mapped the E/R diagram, including constraints, to a SQL schema using PostgreSQL on Microsoft Azure
- Built a web application in Python with Flask and used the Python package SQLAlchemy to connect to the database
Computational Statistics with R: Predictive Model for 8th Grade Students’ Science IRT Score
Responsibilities:
- Trained a model to predict students’ science IRT scores from 28 selected independent variables of interest spanning four aspects
- Selected sets of predictors using several methods: subset selection, lasso, ridge regression, and random forest
- Compared models by cross-validation and obtained the optimal model for future prediction
The Comparison of Five Classification Methods by Monte Carlo Simulation
Responsibilities:
- Simulated data in R using the Monte Carlo method under two scenarios: a linear and a non-linear Bayes decision boundary
- Compared logistic regression, LDA, QDA, 1-NN, and 10-NN by cross-validated mean squared prediction error
Multivariate Analysis: Effects of Sibling Size on Students’ Academic Achievement
Responsibilities:
- Applied MANOVA to compare multiple dependent academic performance variables across students with different sibling sizes, and tested the null hypothesis of equal performance across the groups
- Used discriminant analysis to further identify the directions along which the groups of students differ
Data Mining: Algorithm Implementation for Different Methods in R
Responsibilities:
- Analyzed face image data in R with the pixmap library, using PCA and KNN to extract the characteristics of each image
- Implemented Naive Bayes, CART, and logistic regression algorithms to identify authorship in the Federalist Papers dataset