
Data Scientist II Resume

Charlotte, NC


  • 6+ years of experience as a leader, problem solver, storyteller, and end-to-end solution provider in Data Science/Analytics; worked with stakeholders across industries including Financial Services, Insurance, Transportation and Logistics, and Automotive Manufacturing
  • Delivered projects and managed clients across cross-functional business units: Marketing, Sales, Distribution, Product, Operations, etc.
  • Led technical projects; managed and mentored teams of 2-3 people; followed CRISP-DM and Agile (Kanban) methodologies
  • Hands-on development of Machine Learning models: Propensity, Classification, Regression, Survival, Segmentation, Forecasting, NLP, etc. Worked on the Hadoop/Spark cloud platform provided by Big Data Services; experience with AWS (EC2) and AzureML.
  • Served as interim Big Data admin and led the transition to set up the Big Data Hadoop stack; onboarded team members to the cloud environment, managed user accounts, installed tools (PySpark, etc.), scheduled jobs, and resolved various issues with Confidential
  • Migrated the Propensity models to the Confidential Big Data Services cloud platform; built Machine Learning models (Python) and an ETL pipeline (PySpark/Spark SQL). This production model generates $240MM in incremental annual revenue for Marketing


Programming and Analysis: Python (pandas, NumPy, seaborn, plotly), R (caret, rpart, ggplot2, dplyr), C, C++, VBA

Big Data and Cloud Platforms: Big Data Services, Spark, PySpark, Spark SQL, AWS (EC2, S3), Hive, Sqoop

Databases: SQL Server, Oracle, MySQL, HBase

Machine Learning Development: scikit-learn, TensorFlow (ContribLearn)

Model Serving: Spark (MLlib), AWS SageMaker, AzureML, H2O, SPSS

DataOps: Anaconda, Jupyter, Git, Bash, Azure DevOps, GitHub

Visualization: Shiny, Tableau, MicroStrategy

Project Management: Agile (Kanban), CRISP-DM

Machine Learning Algorithms: Regression (Linear, Multivariate, Lasso, Ridge), Classification (Logistic Regression, Random Forest, KNN, Naive Bayes), Clustering (K-means, Hierarchical), SVM, Neural Networks, Boosting, NLP (NLTK), PCA, Hypothesis Testing, A/B Testing, ANOVA


Confidential, Charlotte, NC

Data Scientist II


  • Led the transition to the Big Data platform stack (Hadoop, Spark, etc.): migrated Propensity models to the dev/prod clusters in BDS, automated production models (Python), built ML and ETL data pipelines (PySpark), and onboarded the team to Hadoop
  • The transition to cloud infrastructure is expected to let Confidential exit its current service agreements with Confidential, saving approximately $10MM
  • Built a Propensity model from multiple data sources to identify the Financial Advisors most likely to sell the Flex/Shield annuity product
  • Used Logistic Regression, Lasso, Random Forest, etc., combining seven models in three layers: Face-to-Face, Active, Inactive advisors, and Product models. Performed driver analysis to measure the success of email campaigns used for lead generation
  • Targeted non-sellers based on propensity score output, generating $60MM in incremental quarterly sales revenue; scored the model quarterly
  • Built classification models (SVM, Naive Bayes, GBM) to score Advisors at eight firms with strategic relationships with Confidential
  • Guaranteed Minimum Income Benefit (GMIB) variable annuity product utilization and withdrawals
  • Analyzed how consumers have utilized GMIB by demographics and geography; examined withdrawal rates and surrenders
  • Built a survival model (Cox proportional hazards) to predict customer churn (policy surrenders) and identify statistically significant drivers of policy lapse
  • Insights will help stakeholders improve future product design with new features, inform pricing decisions, and manage risk
  • Distribution: Wholesaler Effectiveness Analysis
  • Collaborated with experts from the University of Missouri to (a) determine the optimal number of wholesalers, (b) align territories, and (c) design a wholesaler incentive plan. Translated results from sales response models into actionable insights for stakeholders using Tableau
  • Aligned territories based on top advisor prospects, channels, and opportunity; estimated average increase in sales revenue of $300MM
  • Natural Language Processing (NLP): Live Twitter Sentiment Analysis with NLTK
  • Performed sentiment analysis on live Twitter data using the Twitter streaming API. Trained five classifiers (Naive Bayes, Bernoulli NB, Linear SVC, etc.) on a labeled movie reviews dataset and combined them with a voting classifier that reports a confidence level. Project available on GitHub.
  • Completed the Data Science and Data Engineering bootcamp conducted by Data Science Dojo (Seattle, 2017)

Confidential, Ocala, FL

Decision Specialist


  • Automated models with SQL, reducing forecasting time to under 1 hour and cutting man-hours, for up to $800K in annual savings
  • Built a framework, identified factors, and defined metrics/KPIs affecting terminal capacity and the potential growth rate of the LTL (less-than-truckload) industry. Methods: Clustering (K-means), customer segmentation, Random Forest, SVM. Built a visualization tool with Shiny
  • Identified potential areas to increase efficiency, saving thousands of dollars per month


Quality Engineer


  • Conducted statistical (Pareto) analysis of brake part rejections and managed reports; the resulting reduction in rejections led to savings of $50K
  • Audited suppliers using quality control methodologies such as 5S, Process Control Plan, FMEA, Poka-Yoke, PPAP, SPC, etc.
