Dedicated analyst intern seeking a risk analyst position to apply strong statistical modeling
and data manipulation skills.
- 5 years of hands-on experience in data mining and predictive modeling
- Strong understanding of data mining and modeling techniques including variable selection / construction, linear / logistic regression, factor / component analysis, decision trees, naïve Bayes classifier, nearest neighbor methods, cluster analysis and optimization methods
- Significant experience in relational database management system design and implementation
- Familiarity with SQL, SAS, S-PLUS, MATLAB, Visual Basic .NET, C and Unix
- SAS Certified Base Programmer, SAS Certified Advanced Programmer
- Enthusiasm for analysis from start to finish, i.e., hypothesis formulation, data manipulation, modeling and validation
- Analyzed Large Sample Data for Mortality Analysis of State-Wide Children with Birth Defects.
- Used SAS extensively to merge information from multiple sources. Performed data cleansing and validation. Produced summary tables and charts and presented results to senior managers.
- Evaluated effects of demographics and diagnosis on children's mortality using logistic regression in SAS.
- Explored Multi-Step Variable Screening for Logistic Regression.
- Predicted credit score with demographics and financial product usage information. Eliminated trivial variables, screened variables via univariable logistic regression, and selected variables through bagging logistic regression. Reduced the number of variables from 100 to 40.
- Compared predictive power of logistic regression and L1-norm regularized logistic regression with cross-validation and AUC. L1 regularization outperformed plain logistic regression when using the entire variable set. Variable selection improved performance of logistic regression by 10%.
Considered 24 variables for predicting credit score. Constructed 1,000 bootstrap samples from a dataset of 6,100 samples and conducted logistic regression with backward elimination, forward selection and stepwise selection. Compared agreement between the selection methods across the 1,000 bootstrap samples. More than 12 variables were identified as significant in fewer than 500 bootstrap samples.
Developed algorithms for image cluster analysis using variables derived from both visual content and textual captions. Implemented a probabilistic model-based method to select variables for cluster analysis. Developed genetic-algorithm-based methods to combine similarities for cluster analysis. Increased image clustering accuracy by 30% over using textual or visual content alone.
Enhanced student status tracking and management. Designed and created tables, queries and reports with Access and SQL Server.
Maintained, updated and developed a client/server-based relational database system for payroll management. Acted as interim liaison between development teams. Held regular discussions to keep interface documents up to date through close collaboration between teams.
Worked closely with clients and analyzed business requirements to re-engineer the customer management process. Designed and implemented a relational database system for customer relationship management and trained clients in its use.