Professional Experience Summary
- 30+ years of delivering solutions using C, C++, Java, R, and Python.
- Over 60 papers and presentations in physics, coding techniques, distributed computing, artificial intelligence, and machine learning.
- Developed fast, production-grade text pre-processing (misspelling correction, emoji translation, contraction substitution, abbreviation substitution, etc.), ending in lemmatization, for NLP models on multicore and GPU hardware. All work was benchmark tested. (Still ongoing.)
- Developed fast pre-processing, unsupervised models (clustering), and supervised models to 1.0.
- Refactored code to add logging and parameterization and to lower technical debt. Resulted in articles.
- Designed, developed, unit-tested, integration-tested, and documented Paso. Paso is an open-source package of functions and class methods for the machine learning pipeline: data input, data cleaning, imputation, augmentation, encoding, scaling, learners, hyper-parameter tuning, and cross-validation. The latest release (0.5.2) of Paso can be obtained at github.com/bcottman/paso.
- Developed a framework for machine learning data encoders, learners, hyper-parameter tuning, and cross-validation, i.e. the entire pipeline (except ensembles), using metadata encapsulated in description files. This enables a user to run different configurations without having to learn Python. Resulted in articles: Paso’s Offering of Logging and Parameter Services for your Python Project; Balancing and Augmenting Structured Data; Uncommon Data Cleaners for your Real-World Machine or Deep Learning Project; More Uncommon Data Cleaners for your Machine or Deep Learning Project.
- Designed and developed a new encoding algorithm that improved on and replaced their OHE (one-hot encoding). It significantly reduced dimensionality and removed most of the manual feature engineering burden by exposing coupling (also known as latent) factors between features. This had the unexpected benefit of stabilizing predictions on the incoming stream of new data.
- Various Kaggle solutions in image classification, object classification, object segmentation, and time-series structured data, in areas ranging from satellite imagery to medical imagery. Built Ubuntu 14.04 and 16.04 machines with Nvidia cards.
Python, C++, TensorFlow
- Used numpy, Pandas, sklearn, skimage, pycluster, numba, GraphQL, xgboost, Keras, pytorch, CV, pillow, SHAP, dash, PySpark, clusim, and PyCharm, to name a few of the 50+ packages in the above projects.
- (Contract) Designed and developed R-based machine learning for the analysis of real estate market data for Alachua County. Saw no significant difference in various metrics across 6 different learners. Increased response rate from 0.2% to 1.2% (based on approximately 5,000 mailings distributed over control (1,000) and targeted potential customers (4,000)).
- Self-study of probability theory, statistics, iOS, machine learning, cellular biology, neurology, accounting, Spanish, and programming methodology through MIT OpenCourseWare, Stanford Online, edX, and Coursera.