- 8 years of experience in Data Science, Machine Learning, Artificial Intelligence and Neural Networks
- 8 years of experience in Information Technology
- 10+ years of experience in Python programming
- Over 8 years of experience in Machine Learning, Deep Learning, Data Science, Data Mining with large datasets of Structured and Unstructured Data, Data Acquisition, Data Validation, and Predictive Modeling.
- Able to research statistical machine learning and data science methods which include forecasting, supervised learning, unsupervised learning, classification, survival analysis, computer vision, natural language processing (NLP), and Bayesian methods.
- Experience with Spark and AWS cloud computing (especially EC2, Lambda, and EMR).
- Experience in the healthcare domain.
- Strong technical communication skills, both written and verbal.
- Ability to understand and articulate the “big picture” and simplify complex ideas.
- Strong problem solving and structuring skills.
- Ability to identify and learn applicable new techniques independently as needed.
- Ability to create new solutions through a combination of foundational research and collaboration with ongoing initiatives.
- Experience formulating and solving discrete and continuous optimization problems.
- Expertise with design optimization methods with computational efficiency considerations.
- Conduct complex, advanced research projects in areas of interest to Business Units.
- Develop new, cutting-edge techniques and algorithms
- Transfer and implement results and technology in hard- and software prototypes and demo systems relevant to the businesses.
- Survey relevant technologies and stay abreast of latest developments
- Draft and submit papers and patents based on research
- Contribution to several research projects that combine new data sources and computational tools
- Capable of writing efficient code and working with large datasets
- Exceptional mathematical and statistical modeling and computer programming skills
- Use of mathematical and statistical modeling and computer programming skills in an innovative manner.
- Effectively worked within an interdisciplinary research environment.
- Capable of advanced technical sophistication of solutions using machine learning and other advanced technologies.
- Experience leading offshore (distributed) development teams using Agile Scrum methodology.
- Skilled in developing data science solutions in AWS EC2, Microsoft Azure/Data Factory/Databricks/DevOps data engineering pipelines, and IPython/Jupyter Notebook workflows.
Programming: Python, R, C/C++, Java, SQL, Spark, Shell
Python Packages: NumPy, Pandas, Scikit-Learn, TensorFlow, PyTorch, SciPy, Matplotlib, Seaborn, NLTK, PySpark, XGBoost, SQLAlchemy, Selenium, Keras, PyMongo
IDE: Jupyter, Spyder, PyCharm, RStudio, Qt Creator, Visual Studio
Version Control: GitHub, Git, BitBucket
Machine Learning Applications: Natural Language Processing & Understanding, Sentiment Analysis, Computer Vision, Time series Analysis and Forecasting, Survival Analysis, Classification, Regression, Recommender Systems, Customer Segmentation
Data Query: Microsoft Azure, Google Cloud Platform, Amazon RDS, EMR; Hadoop Hive, HDFS, RDBMS, SQL, NoSQL, MongoDB, data warehousing, data lakes
Deep Learning: Multi-Layer Perceptron, Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs, LSTMs), Gradient Descent Optimizers, TensorFlow, Keras, PyTorch
Analysis Techniques: Naïve Bayes, Linear Regression, Logistic Regression, K-Nearest Neighbors (K-NN), K-Means Clustering, Gaussian Mixture Models, ANOVA, ARIMA, SMOTE, Prophet, Classification and Regression Trees (CART), Decision Trees, Ensemble Learning (Bagging, Boosting), Random Forests, Support Vector Machines (SVM), Principal Component Analysis (PCA), Autoencoders
Data Modeling: Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, time-series analysis
Soft Skills: Excellent communication and presentation skills; ability to work well with stakeholders to discern needs accurately, leadership, mentoring, coaching
Lead Data Scientist
Confidential, Woonsocket, RI
- Coding for all tasks was done in Python on Confidential
- Used Bash to connect to Confidential on localhost
- All data was hosted on Teradata SQL servers, except for a few . Confidential files that were emailed to me directly. Performed inner and left joins, and created volatile tables to perform select queries
- Executed code using Google Colab Pro
- Performed code refactoring by simplifying code and making variable names / helper functions generic as needed, as well as adding/removing steps and reordering steps in my logic
- Evaluated a logistic regression model and an XGBoost model for propensity score matching
- Engineered the workflow for data input, preprocessing, calculations, and creating output files
- Met regularly with senior managers to discuss project status and updates
- Imputed missing values, removed outliers with histogram visualization and a simple helper function, and performed feature engineering
- Implemented binary search algorithm from scratch for sorted test/control probabilities
- Used a shared drive and email to track changes in codebase
- Used histograms to visualize thresholds for outlier detection and removal
- Mentored a junior data scientist to understand my codebase and prepared him to contribute to the codebase for that task
- Conducted regular meetings with senior managers, full-time employees, junior data scientists, and directors
- Configured Teradata SQL to work with Python and Confidential environment using the teradatasql library. Mentored junior data scientist on how to do this independently.
- Explained and reviewed code and logic with senior and junior employees to perform knowledge transfer
- Mentored junior data scientist in environment setup and usage
- Automated a process to save hundreds of thousands of hours across the company
- Wrote and executed macros in Excel to perform binning in Excel spreadsheets
- Utilized a wide variety of libraries, including numpy, pandas, seaborn, matplotlib, scikit-learn, xgboost, and teradatasql
- Actively contributed to multiple projects across multiple teams concurrently
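The from-scratch binary search over sorted test/control probabilities mentioned above could look roughly like the following. This is a minimal sketch, not the original code: it assumes the goal is to find, for each test-group propensity score, the closest score in a sorted list of control-group scores. The function name `nearest_index` and the sample values are hypothetical.

```python
def nearest_index(sorted_scores, target):
    """Return the index of the value in sorted_scores closest to target.

    sorted_scores must be sorted in ascending order and non-empty.
    """
    if not sorted_scores:
        raise ValueError("sorted_scores must not be empty")

    lo, hi = 0, len(sorted_scores) - 1
    # Narrow down to the first position whose value is >= target
    # (or the last position if every value is smaller than target).
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_scores[mid] < target:
            lo = mid + 1
        else:
            hi = mid

    # The closest value is either at lo or immediately to its left.
    if lo > 0:
        left_gap = abs(sorted_scores[lo - 1] - target)
        right_gap = abs(sorted_scores[lo] - target)
        if left_gap <= right_gap:
            return lo - 1
    return lo


# Hypothetical control-group propensity scores, sorted ascending.
control_scores = [0.10, 0.25, 0.40, 0.80]
match = nearest_index(control_scores, 0.30)
print(control_scores[match])
```

Each lookup takes O(log n) comparisons, which matters when every test record must be matched against a large control pool.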
Lead Data Scientist - Revenue Management
Confidential, Miami, FL
- Designed and deployed automated data pipelines in Microsoft Azure using Data Factories.
- Built and fine-tuned General Linear Models and Artificial Neural Networks in Python and R.
- Executed SQL queries on a Spark cluster with the use of PySpark.
- Built explainable price elasticity models using regression analysis.
- Performed predictive modeling using state-of-the-art methods.
- Coordinated with offshore teams to manage workflows and deliver timely results.
- Integrated my model with the existing software suite by making it language-agnostic.
- Interacted with the other departments to understand and identify data needs and requirements.
- Involved in defining and implementing an ETL pipeline.
- Extracted data from a MySQL source by executing queries with Python’s SQLAlchemy and Pandas.
- Built and maintained dashboards and reports based on the statistical models to identify and track key metrics and risk indicators.
- Parsed and manipulated raw, complex data streams to prepare them for loading into an analytical tool.
- Ingested data using Microsoft Azure’s Data Factories.
- Collaborated on the data mapping document from source to target and on the data quality assessments for the source data.
Confidential, Miami, FL
- Developed time series forecasting models in Python using statsmodels and TensorFlow to forecast demand for cruise ships.
- Helped balance the load of ports by forecasting demand.
- Incorporated data mined and scraped from outside sources.
- Worked closely with the DevOps team to integrate my solutions into their software.
- Enhanced data collection procedures to include information that is relevant for building analytic systems.
- Performed ad hoc analysis and presented results clearly.
- Created machine learning algorithms using Scikit-learn and Pandas.
- Built predictive models to forecast demand.
- Hands-on use of commercial data mining with tools created in R and Python.
- Processed, cleansed, and verified the integrity of data used for analysis.
- Developed dashboards for use by executives for ad hoc reporting using Tableau visualization tool.
- Solved analytical problems, and effectively communicated methodologies and results.
- Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
- Generalized feature extraction in the machine learning pipeline which improved efficiency throughout the system.
- Performed univariate, bivariate, and multivariate analysis to create new features and test their importance.
Confidential, New York, NY
- Scraped the web using Selenium and Python libraries like Scrapy, Requests, and BeautifulSoup4.
- Performed SQL queries for data engineering (concatenations, joins, cleaning, imputation).
- Engineered an automated MS Azure pipeline to collect data, engineer the data, and feed it into my model for training.
- Performed several regression techniques including: Linear Regression, Ridge Regression, LASSO, Elastic Net Regression, and Forward Stepwise Regression.
- Developed tree-based models leveraging Bagging, Boosting, and Random Forest techniques.
- Researched and tracked down outside data sources to supplement the initial data.
- Built Autoregressive Neural networks to try to forecast future prices and demand.
- Tuned hyperparameters using random search and grid search with k-fold cross-validation.
- Analyzed data and built charts and graphs using Matplotlib, Plotly, and Seaborn during exploratory data analysis (EDA).
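The hyperparameter tuning described above (grid search combined with k-fold cross-validation) can be sketched with scikit-learn as follows. This is an illustrative example under stated assumptions, not the project code: the data is synthetic, Ridge regression stands in for whichever model was tuned, and the `alpha` grid is hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data; the real features and targets are not shown here.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Hypothetical search space for the regularization strength.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validation with shuffling and a fixed seed for repeatability.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    Ridge(),
    param_grid,
    cv=cv,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes, so MSE is negated
)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
# Square root of the mean cross-validated MSE for the best setting.
best_rmse = (-search.best_score_) ** 0.5
print(f"best alpha: {best_alpha}, root of mean CV MSE: {best_rmse:.2f}")
```

Swapping `GridSearchCV` for `RandomizedSearchCV` (with `param_distributions` and `n_iter`) gives the random-search variant while keeping the same cross-validation splitter.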
Confidential, San Diego, CA
- Used transfer learning to train and fine-tune a model built with Keras on TensorFlow in Python.
- Built a semantic segmentation model to correctly measure femoral cartilage in knee joints.
- Utilized convolutional neural networks (CNNs) to highlight areas with high confidence.
- Fine-tuned a pre-trained ResNet model architecture.
- Wrote Python code in Jupyter notebooks and PyCharm.
- Processed image files with Python’s Pydicom library in conjunction with NumPy and Pandas.
- Achieved a state-of-the-art F1 score (85%+).
- Incorporated our work into a dashboard application for company use.
- Defined, designed, and documented conceptual and logical data models.
- Conducted data analysis on the reports and dashboards to identify gaps.
- Validated and selected models using k-fold cross validation, confusion matrices and worked on optimizing models for high recall rate.
- Performed data profiling to identify data quality issues in the critical data elements.
- Coordinated with a team of three other data scientists.
- Trained our models on cloud machines in Google Cloud Platform’s Compute Engine.
- Deployed the model on GCP Compute Engine virtual machines.
- Analyzed the ROC curve to determine proper thresholds for classification segmentation.
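Choosing a classification threshold from the ROC curve with a focus on high recall, as described above, might be done along these lines. This is a minimal sketch under assumptions: the selection rule (the highest threshold that still reaches a target recall) is one reasonable choice rather than the rule actually used, and the labels, scores, and helper name `threshold_for_recall` are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_curve


def threshold_for_recall(y_true, y_score, min_recall):
    """Return the highest score threshold whose recall (TPR) is >= min_recall."""
    # drop_intermediate=False keeps every candidate threshold on the curve.
    fpr, tpr, thresholds = roc_curve(y_true, y_score, drop_intermediate=False)
    # Thresholds are returned in decreasing order and TPR is non-decreasing,
    # so the first index meeting the target gives the highest such threshold.
    idx = int(np.argmax(tpr >= min_recall))
    return thresholds[idx], tpr[idx], fpr[idx]


# Hypothetical labels and predicted probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5])

threshold, recall, false_positive_rate = threshold_for_recall(y_true, y_score, 0.75)
print(f"threshold={threshold:.2f} recall={recall:.2f} fpr={false_positive_rate:.2f}")
```

Predictions are then made with `y_score >= threshold`; raising `min_recall` lowers the threshold and trades more false positives for fewer missed positives.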