
Senior Data Scientist Resume


  • Statistics and probability, including statistical modeling and hypothesis testing; strong track record executing machine learning projects
  • Familiarity with trends in relevant technologies and shifts in the data analytics climate
  • Strong leadership skills with specific experience in the Agile framework; excellent communication skills, both verbal and written
  • Skilled at taking machine learning models from experimentation to full deployment
  • Extensive experience with third-party cloud resources: AWS, Google Cloud, and Azure
  • Developed neural network architectures from scratch, including convolutional networks (CNNs), LSTMs, and Transformers; also built unsupervised approaches such as k-means, Gaussian mixture models, and autoencoders
  • Proficient in standard supervised machine learning methods: linear regression, logistic regression, support vector machines, random forests, gradient boosting, and survival modeling
  • Proficient with the NumPy stack (NumPy, SciPy, Pandas, and Matplotlib) and scikit-learn
  • Proficient in TensorFlow and PyTorch for building, validating, testing, and deploying reliable deep learning algorithms for specific business challenges
  • Experience with ensemble techniques, including bagging, boosting, and stacking; knowledge of Natural Language Processing (NLP) methods, in particular fastText, word2vec, and sentiment analysis
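As an illustration of the stacking technique listed above, here is a minimal scikit-learn sketch (the dataset and model choices are hypothetical, not from any project described in this resume):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real business dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stacking: base learners' predictions feed a logistic-regression meta-model
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```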


Programming: Python, Spark, SQL, R, Git, bash

Libraries: NumPy, Pandas, SciPy, scikit-learn, TensorFlow, Keras, PyTorch, statsmodels, Prophet, lifelines, PyFlux, arch, Featuretools, LIME

Version Control: GitHub, Git, BitBucket

IDE: PyCharm, Sublime Text, Atom, Jupyter Notebook, Spyder

Data Stores: large SQL and NoSQL stores, data warehouses, data lakes, Hadoop HDFS, S3


Databases: Amazon Redshift, Cassandra, MongoDB, MariaDB

Computer Vision: Convolutional Neural Network (CNN), Faster R-CNN, YOLO

Big Data Ecosystems: Hadoop (HBase, Hive, Pig, RHadoop, Spark, HDFS), Elasticsearch, Cloudera Impala

Cloud Data Systems: AWS (RDS, S3, EC2, Lambda), Azure, GCP

Data Visualization: Matplotlib, Seaborn, Plotly, Bokeh





  • Manipulated GeoTIFF files in MATLAB and Python for spatial analysis, plotting population densities and overlaying other socioeconomic data across regions of the United States and the world.
  • Built scraping modules using the Scrapy, BeautifulSoup, and Requests libraries to extract Confidential's reseller data, including locations, discounts, and additional pricing data, along with associated demographic data within specific US regions.
  • Created scripts to load, concatenate, and clean multiple data files used by data science team members to analyze and forecast the future behavior of Confidential's vendors.
  • Applied several techniques for forecasting month-ahead loss/gain at each SSG level in Python, including ARIMA, Prophet, and LSTM.
  • Constructed production-level code to process new vendor data and feed it into Tableau for data analysts to present.
  • Used Data Lab, Confidential's cloud platform, to train different time-series models for vendor forecasting.
  • Exercised appropriate version control using Confidential's Box and Quip platforms to synchronize code and data files with data science team members.




  • Built the architecture for, and trained, convolutional neural networks (CNNs) with the team using the PyTorch Python API.
  • Exploited transfer learning with custom-built classifiers in PyTorch to speed up production time and improve results.
  • Fine-tuned ResNet-50, ResNet-101, and ResNet-152 models to adapt their pre-trained weights to our use case.
  • Used a fully convolutional network (FCN), the pre-trained YOLOv3 algorithm, to speed up predictions.
  • Accounted for prediction time and overhead to ensure predictions ran in real time.
  • Augmented the image data by applying transformations with Pillow to regularize training.
  • Worked with large stores of video imaging data stored on AWS S3 buckets for training the model.
  • Supplied our pickled model to the software development team to integrate into the drone pilot’s heads-up display (HUD).
  • Employed proper version control using Git with Bitbucket to coordinate with fellow team members.
  • Employed AWS SageMaker to explore object detection at a high level and to train the model before opting for a lower-level approach.
  • Replaced proprietary software with custom-built algorithms for greater control over the outcomes.




  • Evaluated multiple approaches for predicting day-ahead energy demand with Python, including exponential smoothing, ARIMA, Prophet, TBATS, and RNNs (LSTM).
  • Built a generalized autoregressive conditional heteroskedasticity (GARCH) model using PyFlux to model the uncertainty of Dominion's other time series, ensuring a 'safety' stock of generating units.
  • Incorporated geographical and socio-economic data scraped from outside resources to improve accuracy.
  • Continuously validated models using a train-validate-test split to ensure forecasts were accurate enough to drive optimal scheduling of generation facilities to meet system load.
  • Prevented over-fitting with the use of a validation set while training.
  • Built a meta-model to ensemble the predictions of several different models.
  • Engineered time-series features using NumPy, Pandas, and Featuretools.
  • Coordinated with facility engineers to understand the problem and ensure our predictions were beneficial.
  • Participated in daily standups working in an Agile Kanban environment.
  • Queried Hive via Spark using Python's PySpark library.
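The chronological validation described above can be sketched with scikit-learn's TimeSeriesSplit (the demand series here is synthetic, and the real models are not shown); unlike a random shuffle, each validation window starts strictly after its training window, so no future observations leak into training:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical hourly demand series standing in for the real system load data
demand = np.arange(100, dtype=float)

tscv = TimeSeriesSplit(n_splits=4)
for train_idx, val_idx in tscv.split(demand):
    # Every validation window follows its training window in time
    assert train_idx.max() < val_idx.min()
    train, val = demand[train_idx], demand[val_idx]
    # ...fit the forecasting model on `train`, score it on `val`...
```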
