Data Scientist Resume
FL
SUMMARY
- Ph.D. in mathematics; 9+ years of overall experience, including 5+ years of professional experience in Data Science, Statistical/Mathematical Modeling and Simulation, and Algorithm Development.
- Expertise in problem solving. Extensive experience working in business environments. Good at gathering modeling needs and converting them into technical solutions and data models.
- Deep understanding of Statistical Models and Multivariate Analysis. Familiar with model selection, comparison, validation and testing.
- Experienced in statistical and machine learning techniques: ANOVA, PCA, Forecasting, Time Series Regression, Linear/Nonlinear Regression, Logistic Regression, Clustering, and Tree-based models.
- Experienced in working with R and SAS for statistical analysis, along with Teradata/SQL Server for stored procedures, functions and performance analysis; hands-on with Python.
- Extensive experience in feature collection and selection. Good at data transformation and normalization, variable analysis and visualization.
- Hands-on experience using Hadoop Hive for data management; expertise in working with big datasets.
- Experience in Linux shell scripting. Familiar with SSH (Secure Shell) and remote computing.
- Experienced in parallel computing methods to maximize calculation speed and efficiency; familiar with LAPACK/BLAS. Knowledge of mathematical algorithm implementation.
- Strong interest in Data Science, in developing related skill sets and in trying new technologies. Fast learner.
- Experience with sensor data collection and analysis from experimental structures and civil infrastructure. Carried out failure mode analysis, fissure development prediction and maximum load analysis.
- Teaching and training experience in: College Algebra, Trigonometry, Calculus I, II, III, ODE and Statistics. Patient teacher, good at explaining material so students understand it quickly and easily.
- Strong analytical and communication skills that help in communicating with business customers, the development team and management.
- Strong multitasking and time management skills. Energetic and enthusiastic about work. Able to work efficiently under pressure.
TECHNICAL SKILLS
Database Tools: SQL Server, Teradata, Hadoop
Data Modeling Tools: R, SQL, SAS, Matlab
Languages: R, SAS, Linux Scripting, SQL, Matlab, Fortran, VBA
Packages: SSH, R Markdown, RStudio, R parallel, SAS BI service.
Operating System: Windows, LINUX/Unix.
Others: MS Office (Word, Excel, PowerPoint, Access), AutoCAD, LaTeX.
PROFESSIONAL EXPERIENCE
Confidential, FL
Data Scientist
Responsibilities:
- Modeling team production support: maintained and ran models for more than 8 Contract, Retail, Print and Email campaigns, becoming familiar with the database and models in the process.
- Modeling & scoring automation: in charge of the whole automation process. Worked with the DBMI, SAS IT and BI teams to automate 2 mature modeling and scoring processes. Built a guardrail and monitoring reporting system to catch errors.
- Customer acquisition/attrition forecasting: collected customer acquisition and attrition data for forecasting analysis. Used exponential smoothing and then ARIMA in the first stage, and collected internal project financial data to study the impact of projects on acquisition and attrition. A time series regression model identified several projects with high correlation, which were reported to support decisions on financial distribution.
- Model review: reviewed print campaign models, including input feature evaluation, data transformation, oversampling/undersampling methods, the approach to the imbalanced-data problem, sampling and segmenting, the model selection process and model validation. Wrote a detailed report and proposed a solution.
- Model enhancement: based on the model review, added new data features from new sources (demographic data) and switched the response variable to avoid the imbalanced-data problem. Also employed additional modeling algorithms and added model comparison to achieve the best prediction results.
- Imbalanced data modeling: tested methods such as segmenting, changing the response variable and balanced random forest to deal with this problem, and substantially lowered the misclassification rate.
- Implemented R within the existing system environment (Linux and SAS server) for analysis alongside SAS, allowing more flexible modeling.
- Performed data validation and attribute collection for customer demographic and online category data.
Environment: SAS Enterprise Guide, SAS Enterprise Miner, R, Teradata, AWS, BI ETL, Python.
Confidential, MN
Data Scientist
Responsibilities:
- Worked on commercial data analysis, including data collection, table treatment, database information, problem analysis, solution proposal and implementation.
- Became familiar with the database within weeks. Proposed multiple solutions for sales data analysis and methods to calculate baseline and incremental promotional sales.
- Responsible for data collection, cleansing and ANOVA. Designed a technical solution roadmap to deal with noise in sales data.
- Built multiple linear and nonlinear regression models for test cases that produced very good predictions in scenario tests. Developed an automated process for variable and model selection across more than 5,000 core SKUs.
- Worked on noise reduction methods (exponential smoothing and Fast Fourier Transform) and compared them with regression methods. Upgraded old methods to double/triple exponential smoothing and achieved better results.
- Applied Decision Tree and Support Vector Machine models to sales data and carried out clustering analysis on the promotional behaviors of different SKUs. Achieved good predictions for different promotion media across various products.
Environment: R, R Parallel, Teradata, SAS, Python.
Confidential
Research Assistant
Responsibilities:
- Worked on data collection, munging and analysis; accessed raw data in varied formats with different methods for analysis and processing.
- Responsible for data pattern recognition and data cleaning. Identified missing and invalid values and outliers; analyzed and categorized dataset variables.
- Checked data for distribution, classification, correlation, VIF, etc., and performed variable selection to gain a deep understanding of the datasets.
- Carried out regression analysis with R/SAS and investigated the model for problems such as goodness of fit, overfitting, multicollinearity and residual normality. Established our linear model for forecasting.
- Worked on principal component analysis and logistic regression analysis on datasets. Involved in variable selection, model optimization, comparison and validation.
- Involved in trend analysis and segmentation analysis based on 12 years of monthly market data. Provided a graphical analysis report to illustrate market trends and target segment behavior.
- Carried out ANOVA with R to track turbulent flow variables and analyze the influence of dimension and position on flow parameters.
- Employed Unix shell scripting and parallel computing for R and Fortran at the Texas Advanced Computing Center (TACC) to ensure efficient and successful computation of our case.
- Familiar with computational numerical algorithms. Experienced in advanced high-speed computing and efficiency improvement.
- Responsible for data visualization in both static graphics and animations to analyze topology and support future research.
- Prepared model instructions and a detailed report for Bureau review, including explanations of variables and scenario tests of predictive models.
Environment: R, SQL, Remote Linux Server, Red Hat, Fortran, Matlab, VBA, LaTeX.
Confidential, Richfield, MN
Data Scientist Intern
Responsibilities:
- Used and supported database applications and tools for extraction, transformation and analysis of raw data.
- Worked with end users to understand the information and core data concepts behind their business.
- Analyzed business requirements, system requirements and data mapping requirement specifications; responsible for documenting functional and supplementary requirements.
- Worked on Data Quality Controls and the Business Requirements Document. Attended project release meetings.
- Created data masking mappings to mask sensitive data between production and test environments.
- Identified source systems, their connectivity, and related tables and fields, and ensured the data was suitable for analysis.
- Performed extensive data validation with several complex SQL queries, took part in back-end testing and worked on data quality issues.
- Worked on logistic regression analysis of credit datasets. Built logistic models to support the Business department in determining credit issues.
- Developed, managed and validated existing data models, including logical and physical models of the data warehouse and source systems, using SAS.
- Wrote simple and advanced SQL queries and scripts to create standard reports for senior managers.
Environment: SQL Server 2008, R, SAS, T-SQL, Windows Server 2008