- Seeking a full-time position as a SAS programmer or data analyst.
- Familiar with SAS 9.3 (BASE/STAT/SQL/MACRO) procedures: proc anova, proc genmod, proc cluster, proc distance, proc glm, proc freq, proc factor, proc gam, proc lifereg, proc lifetest, proc reg, proc mixed, proc princomp, proc logistic.
Computer and Programming Languages: SAS (BASE/STAT/SQL/MACRO), R, EViews, C, Pascal, Mathematica, ITSM2000, MS Excel VBA.
Statistical and Numerical Methods: Econometrics, Data Mining, Generalized Linear Models, Survival Analysis, Monte Carlo Simulation, and Optimization.
Confidential, Cincinnati, OH
- Segmented customers into groups based on their potential loss counts.
- Assigned a different risk factor to each customer group so that Confidential could set a separate pricing strategy for each group.
- Cleaned the raw data with SAS/SQL.
- Because some variables contained erroneous values, kept only the middle 90% of the data for those variables.
- Created the target variable "sum" by adding up the claim counts for each customer.
- Employed a classification tree in SAS Enterprise Miner to select the independent variables relevant to the target variable.
- Selected 5 independent variables for the target variable "sum" from a total of 57 associated variables.
- Developed a zero-inflated negative binomial regression model to predict each customer's potential loss.
- First considered generalized linear models suited to a count-valued dependent variable: Poisson and negative binomial.
- Because customers may or may not report a loss to the insurance company, extended the GLMs with zero-inflated regression models.
- Employed SAS/STAT to estimate the parameters of each model, compared the models using the Vuong test and the Pearson statistic, and selected the zero-inflated negative binomial regression model as the best one.
- Used the final model to predict the expected loss counts and claim counts for each customer.
- Used cluster analysis in SAS Enterprise Miner to segment all customers into groups.
- Clustered customers into 40 groups by predicted claim counts and loss counts using the least squares method.
- Sorted the 40 customer groups by each cluster's mean predicted claim count and assigned each group a risk factor.
SAS procedures used: proc import, proc means, proc freq, proc genmod, proc sgplot, proc transpose, macro.