- Five years of experience in machine learning algorithm design in academia and two years in industry
- Project experience in signal processing, text mining and image processing
- Expertise includes machine learning methods (SVM, Naïve Bayes, KNN, Decision Tree, Clustering, K-means, Feature Selection, PCA, etc.), natural language processing, data mining, statistical analysis (Linear Regression, Logistic Regression, Multivariate Regression), fast numerical algorithms (finite difference, finite element, Fast Fourier Transform, Wavelet Transform), large-scale simulations, parallel programming (OpenMP), and High Performance Computing
- Expertise in Matlab, C++ and Python, including the ML libraries MLlib, SVMsilver and LibSVM
- Additional technical skills include: Mathematica, OpenMP, Linux, LaTeX, Word, Excel, PowerPoint, Mac OS, Windows
- SAS Base Certificate and SAS ADV Certificate
- Excellent interpersonal, communication and presentation skills; a strong team player
- Patent: X. Hu, Y. Fu and Y. Ge, China social media data collection and sentiment analysis, Confidential Motor Company, filed in Feb 2016, completed in Mar 2017.
- Word, Excel, PowerPoint
- MLlib, SVMsilver
- Unix Shell Script, Linux, Mac OS, Windows
- SQL, SAS
- Developed regression models for laser welding data and engine warranty data, retrieved from an Oracle Database via SQL, using our own objective functions and the DataModeler package for Mathematica.
- Developed a feature selection process and identified the key features of the laser welding process and the engine system. Using these key features, predicted defective products with over 90% accuracy.
- Wrote internal and external reports for the project
- Contributed to social media data collection with a web crawler and to sentiment analysis of both English and Chinese customer reviews, using machine learning methods (e.g. Support Vector Machine (SVM) and Naïve Bayes) and natural language processing techniques (e.g. bag of words, sentiment tagging).
- Preprocessed unstructured data (text from customer reviews) and converted it to structured numerical data using natural language processing tools written in Python. Built machine learning models on the preprocessed data, using the MLlib package in Matlab, that automatically distinguish good reviews from bad ones.