
Lead Data Scientist Resume


Tysons Corner, VA

SUMMARY

  • Strong mathematical knowledge and hands-on experience implementing Machine Learning algorithms such as K-Nearest Neighbors, Logistic Regression, Linear Regression, Naïve Bayes, Support Vector Machines, Decision Trees, Random Forests, Gradient Boosted Decision Trees, and Stacking Models.
  • Hands-on experience with Customer Churn, Sales Forecasting, Market Mix Modeling, Customer Classification, Survival Analysis, Sentiment Analysis, Text Mining, and Recommendation Systems.
  • Experience in using Statistical procedures and Machine Learning algorithms such as ANOVA, Clustering, Regression and Time Series Analysis to analyze data for further Model Building.
  • Hands-on experience with Deep Learning techniques such as Back Propagation, Choosing Activation Functions, Weight Initialization based on the Optimizer, Avoiding Vanishing and Exploding Gradient Problems, Dropout, Regularization and Batch Normalization, Gradient Monitoring and Clipping, Padding and Striding, Max Pooling, and LSTM.
  • Proficient in Natural Language Processing (NLP) and NLU using RASA, Text Analytics/TextBlob/Documents/Topic Modeling/Contracts/Chatbots/Cognitive Computing, and document classification using R and Python (NLTK, Gensim, etc.). Used NLP techniques such as Bag-of-Words/N-gram algorithms, term-document matrices, text categorization and text routing, and remote-sensing and geo-spatial intelligence.
  • Experience with Multi-Purpose NLP Models such as ULMFiT, Transformer, Google's BERT, Transformer-XL, and OpenAI's GPT-2; Word Embeddings such as ELMo and Flair; and other pretrained models like StanfordNLP.
  • Developed Natural Language Processing solutions for machine translation, language detection, and classification, covering aspects of NLP such as phonology, morphology, document classification, Named Entity Recognition (NER), topic modeling, document summarization, computational linguistics, and advanced semantic information search, extraction, induction, classification, and exploration.
  • Excellent knowledge and experience using open-source NLP packages such as NLTK, Word2Vec, SpaCy, Gensim, and Stanford CoreNLP.
  • Experience with NLP techniques such as word embeddings (word2vec, GloVe, fastText, Transformers), topic modeling (LSA/LSI, LDA, NMF), search (Elasticsearch, FAISS), dialogue systems/chatbots (Rasa, kore.ai), and knowledge graphs (see the Word2Vec sketch after this list).
  • Experience building Machine Learning & NLP solutions on open-source platforms such as scikit-learn, TensorFlow, TensorBoard, Keras, PyTorch (Hugging Face), SparkML, Torch, Caffe, and H2O.
  • Understanding of different components of the Hadoop ecosystem such as Hue, Pig, Hive, HBase, HDFS, MapReduce, and Flume.
  • Experienced in cloud computing: GCP (Google Cloud Platform), Azure (Azure SQL databases, data modeling and tuning, Azure Data Factory, etc.), and AWS.
  • Experienced with data visualization using tools like ggplot2, Matplotlib, Seaborn, Tableau, and R Shiny, and with using Tableau to publish and present dashboards and storylines on web and desktop platforms.
  • Created Spark scripts in Python and Scala, applying functional programming in Scala.
  • Experienced in Python data manipulation for loading and extraction, and with Python libraries such as NumPy, SciPy, Pandas, and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Developed and debugged simple and complex SAS programs. Implemented statistical analysis plans, data preparation, and data manipulation in SAS programs and macros. Created statistical tables, figures, and listings for marketing data. Built models, validated SAS programs, and analyzed datasets using SAS products such as Enterprise Guide, segmentation tools like Enterprise Miner, Office Analytics, EDI, etc.
  • Worked on cloud engineering (AWS, GCP, Azure, and other custom-built secure products), designing and developing scalable systems for running ML workloads, distributed model training, hyperparameter tuning, and real-time inference.
  • Designed and implemented scalable production ML architectures in AWS.
  • Built CI/CD pipelines using Jenkins, Kubernetes, Docker, and containerization.
  • Led capability development in multithreading, collections, REST, microservices, and open-source RASA NLU.
  • Have worked with ML frameworks such as PyTorch and TensorFlow.
  • Worked on strategy-building techniques for routine application maintenance tasks, cloud networking, and infrastructure.
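
As an illustration of the word-embedding work referenced in the NLP bullets above, here is a minimal sketch using Gensim's Word2Vec; the toy corpus and hyperparameters are illustrative assumptions, not taken from any listed engagement.

```python
# Minimal Word2Vec sketch with Gensim; the toy corpus and
# hyperparameters below are illustrative assumptions only.
from gensim.models import Word2Vec

# Tiny pre-tokenized corpus (in practice, tokenize with NLTK or SpaCy).
sentences = [
    ["customer", "churn", "prediction", "model"],
    ["customer", "retention", "marketing", "campaign"],
    ["sales", "forecasting", "time", "series", "model"],
]

# Train skip-gram embeddings (sg=1); vector_size/window are toy settings.
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1, epochs=50)

# Query the learned vocabulary for nearest neighbors of a term.
print(model.wv.most_similar("customer", topn=3))
```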

TECHNICAL SKILLS

Exploratory Data Analysis: Univariate/Multivariate Outlier detection, Missing value imputation, Histograms/Density estimation, EDA in Tableau

Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, XGB, Deep Neural Networks, Bayesian Learning, Heuristics, Neural Nets, Markov Decision Process

Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low Rank Matrix Factorization.

Feature Engineering: Stepwise, Recursive Feature Elimination, Relative Importance, Filter Methods, Wrapper Methods and Embedded Methods.

Statistical Tests: T-test, Chi-Square tests, Stationarity tests, Autocorrelation tests, Normality tests, Residual diagnostics, Partial dependence plots and ANOVA

Sampling Methods: Bootstrap sampling methods and Stratified sampling

Model Tuning/Selection: Cross Validation, AUC, Precision/Recall, Walk Forward Estimation, AIC/BIC Criteria, Grid Search and Regularization
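
A minimal sketch of the cross-validated grid search workflow named above; the logistic regression model, parameter grid, and synthetic data are illustrative assumptions.

```python
# Cross-validated grid search over a regularization parameter,
# scored by AUC; model, grid, and data are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Search regularization strength C over a small grid with 5-fold CV.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```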

Time Series: ARIMA, SARIMAX, Holt-Winters, Exponential smoothing, Bayesian structural time series
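
A minimal sketch of a SARIMAX fit from the time series toolkit above; the synthetic monthly series and the (p,d,q)(P,D,Q,s) orders are illustrative assumptions.

```python
# Seasonal ARIMA sketch with statsmodels; series and orders are toy choices.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic monthly series with trend and yearly seasonality.
idx = pd.date_range("2018-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
y = pd.Series(
    10 + 0.3 * np.arange(48)
    + 5 * np.sin(2 * np.pi * np.arange(48) / 12)
    + rng.normal(0, 1, 48),
    index=idx,
)

# SARIMA(1,1,1)(1,1,1,12); orders would normally come from an AIC/BIC search.
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)

# 12-month-ahead forecast.
print(fit.forecast(steps=12))
```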

SAS Tools: SAS Programming with SAS Base, SAS 9.2, SAS BI Suite, Enterprise Guide.

PROFESSIONAL EXPERIENCE

Confidential, Tysons Corner, VA

Lead Data Scientist

Responsibilities:

  • Developed a personalized recommender system using recommender algorithms (collaborative filtering, low-rank matrix factorization) that recommended the best apps to a user based on similar user profiles (see the sketch after this list). The recommendations enabled users to engage better and helped improve overall user retention rates at Confidential.
  • Forecasted sales and improved accuracy by 10-20% by implementing advanced forecasting algorithms that were effective in detecting seasonality and trends, in addition to incorporating exogenous covariates. The increased accuracy helped the business plan better with respect to budgeting and sales and operations planning.
  • Analyzed complex datasets and created models to interpret and predict trends or patterns in the data using time series analysis, forecasting, and regression analysis.
  • Created an interactive dashboard suite in R (Shiny) that illustrated outlier characteristics across several sales-related dimensions and the overall impact of outlier imputation. Used an iterative outlier detection and imputation algorithm based on multiple density-based clustering techniques (DBSCAN, kernel density estimation).
  • Implemented market basket algorithms on transactional data, which helped identify items frequently used or purchased together. Discovering frequent item sets helped unearth cross-sell and upsell opportunities and led to better pricing, bundling, and promotion strategies for sales and marketing teams.
  • Derived meaning from huge amounts of real estate data using mathematics, statistics, and computer science, incorporating machine learning, deep learning, cluster analysis, data mining, and visualization.
  • Predicted the likelihood of customer churn based on customer attributes like customer size, RFM loyalty metrics, revenue, type of industry, competitor products, and growth rates. The models deployed in production (on platforms like PCF, AWS, and Kubernetes) helped detect churn in advance and aided sales/marketing teams in planning retention strategies such as price discounts and custom licensing plans.
  • Built machine learning regression models using scikit-learn Python frameworks to estimate customer propensity to purchase based on attributes such as the customer verticals they operate in, revenue, historic purchases, and frequency and recency behaviors. These predictions estimated propensities with higher accuracy, improving the overall productivity of sales teams by accurately targeting prospective clients.
  • Measured the price elasticity for products that experienced price cuts and promotions using regression methods; based on the elasticity, Confidential made selective and cautious price cuts for certain licensing categories.
  • Segmented customers based on behavior and specific characteristics like age, region, income, and geographical location, applying clustering algorithms to group customers with similar behavior patterns.
  • The segmentation results helped quantify the Customer Lifetime Value of each segment, distinguish high-value from low-value segments, and improve customer service to retain customers.
  • Used Principal Component Analysis and t-SNE in feature engineering to analyze high dimensional data.
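
A minimal sketch of the low-rank matrix factorization approach referenced in the first bullet above; the interaction matrix, latent dimension, and recommendation logic are simplified assumptions, not the production system.

```python
# Low-rank matrix factorization recommender sketch; all data is toy data.
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Rows = users, columns = apps; entries = engagement counts (toy data).
interactions = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

# Factorize into low-rank user and item representations.
svd = TruncatedSVD(n_components=2, random_state=0)
user_factors = svd.fit_transform(interactions)  # shape: (n_users, k)
item_factors = svd.components_                  # shape: (k, n_apps)

# Reconstruct predicted engagement and recommend unseen apps per user.
scores = user_factors @ item_factors
for user in range(scores.shape[0]):
    unseen = np.where(interactions[user] == 0)[0]
    ranked = unseen[np.argsort(scores[user, unseen])[::-1]]
    print(f"user {user}: recommend apps {ranked.tolist()}")
```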

Confidential, Greensboro, NC

Data Scientist + Artificial Intelligence + Machine Learning

Responsibilities:

  • Built relational databases in SQL Server from several large flat files of partner information using Python. Used logistic regression and random forest models in R/Python to predict the likelihood of customer participation in various marketing programs.
  • Designed and developed visualizations and dashboards in R/Tableau that surfaced the primary factors that drove program participation and identified the best targets for future targeted marketing efforts.
  • Performed data profiling to learn about behavior across features such as traffic pattern, location, date, and time. Integrated with external data sources and APIs to discover interesting trends.
  • Involved in various pre-processing phases of text data, such as tokenizing, stemming, lemmatization, and converting raw text to structured data.
  • Performed personalization, target marketing, customer segmentation and profiling, audience analytics, and digital hypothesis testing and measurement using Facebook/social platforms and Google Analytics/BigQuery.
  • Projected customer lifetime values based on historic customer usage and churn rates using survival models. Understanding customer lifetime values helped the business establish strategies to selectively attract customers who tend to be more profitable for Confidential.
  • It also helped the business establish appropriate marketing strategies based on customer values.
  • Performed data cleaning, feature scaling, featurization, and feature engineering.
  • Used Pandas, PySpark, NumPy, SciPy, Matplotlib, Seaborn, and scikit-learn in Python at various stages of developing machine learning models, utilizing algorithms such as linear regression, Naive Bayes, Random Forests, Decision Trees, K-means, and KNN.
  • Implemented a number of Natural Language Processing mechanisms for a chatbot.
  • Segmented customers based on behavior and specific characteristics like age, region, income, and geographical location, applying clustering algorithms to group customers with similar behavior patterns.
  • Wrote Java/Scala code to run on Spark for data engineering.
  • Loaded data from Hadoop/Hive to Amazon S3.
  • Developed 11 customer segments using unsupervised learning techniques like KMeans (see the sketch after this list).
  • The clusters helped the business reduce complex patterns to a manageable set of 11 segments, which helped set strategic and tactical objectives for customer retention, acquisition, and spend.
  • Price optimization and revenue management: measured the price elasticity for products that experienced price cuts and promotions using regression methods; based on the elasticity, Confidential made selective and cautious price cuts for certain licensing categories.
  • The segmentation results helped quantify the Customer Lifetime Value of each segment, distinguish high-value from low-value segments, and improve customer service to retain customers.
  • Performed clustering with historical, demographic, and behavioral data as features to implement personalized marketing that offers the right product to the right person at the right time on the right device.
  • Analyzed high volume, high dimensional client and survey data from different sources using SAS and R.
  • Used Principal Component Analysis and t-SNE in feature engineering to analyze high dimensional data.
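
A minimal sketch of the KMeans segmentation referenced above (k=11, per the bullet); the synthetic customer features are illustrative assumptions.

```python
# KMeans customer segmentation sketch; features are synthetic toy data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Toy customer features: [age, income, monthly_spend].
rng = np.random.default_rng(42)
X = rng.normal(loc=[40, 60_000, 120], scale=[12, 20_000, 60], size=(500, 3))

# Scale features so no single unit dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Fit k=11 segments, mirroring the engagement described above.
kmeans = KMeans(n_clusters=11, n_init=10, random_state=0).fit(X_scaled)
segments = kmeans.labels_
print(np.bincount(segments))  # customers per segment
```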

Confidential, Greenwood Village, Colorado

Data Scientist + Data Engineer

Responsibilities:

  • Gathered and reviewed business requirements and analyzed data sources.
  • Performed data collection, data cleaning, feature scaling, and feature engineering; validated, visualized, interpreted, and reported findings; and developed strategic uses of data with Python libraries such as NumPy, Pandas, PySpark, SciPy, and scikit-learn.
  • Involved with Recommendation Systems such as Collaborative filtering and content-based filtering.
  • Studied and implemented fraud detection models to monitor unconventional purchases across the customer base and alert customers with updates.
  • Analyzed and implemented several research proof-of-concept models for real-time fraud detection on credit card and online banking purchases.
  • Worked with ThreatMetrix to obtain device data to include within the Fraud detection Model.
  • Clustered the customers based on demographics, health attributes, policy inclinations using hierarchical clustering models and identified strategies for each of the clusters to better optimize retention, marketing and product offering strategies.
  • Responsible for managing, monitoring, and coordinating claims fraud risk management projects.
  • Performed Sentiment Analysis using social media and survey data to address customer grievances and brand awareness using Natural Language Processing (NLP).
  • Solved a binary classification problem (transferring to a lower risk group or not, given a financial incentive) with logistic regression.
  • An artificial neural network built with Keras/TensorFlow and PyTorch in Python was used to solve a binary classification problem for premiums and their intersection with the discriminant (see the sketch below).
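
A minimal sketch of a Keras binary classifier along the lines of the last bullet; the synthetic features, layer sizes, and training settings are illustrative assumptions.

```python
# Small feed-forward binary classifier in Keras; all data is synthetic.
import numpy as np
from tensorflow import keras

# Toy data: 200 samples, 8 numeric features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Dense network with dropout for regularization, sigmoid output for binary labels.
model = keras.Sequential([
    keras.layers.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]
```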

Confidential, Columbus, OH

Data Scientist + Data Engineer

Responsibilities:

  • Built classification models using several features related to customer demographics, macroeconomic dynamics, historic payment behavior, type and size of insurance policy, credit scores, and loan-to-value ratios, and used the models to predict the likelihood of default under various stressed conditions.
  • Data required extensive cleaning and preparation for machine learning modeling, as some observations were censored without any clear notification.
  • Carried out segmentation, built predictive models, and integrated secondary and primary data using R, Python, SPSS, and SQL.
  • Built classification models that predict the probability of a customer's response to a cross-sell campaign using Python.
  • Designed and deployed real time Tableau dashboards that identified policies which are most/least liked by the customers using key performance metrics that aided the company for better rationalization of their product offerings.
  • Analyzed large data sets for reporting and visualization using R.
  • Tracked campaigns and communicated campaign performance and ROI analysis.
  • Used modifiers including L1 regularization, dropout, and Nesterov momentum to enhance the neural network and improve generalization.
  • Provided reporting dashboards for stakeholders and project owners to rapidly surface information via Plotly, Bokeh, Matplotlib, and Seaborn.
  • Used internally generated data on policies, premiums, and payouts from customer databases to model policy payout as a function of survival model outputs.
  • Retrieved data from devices streamed to the company database using AWS Kinesis (real-time data streaming).
  • Supported the client by developing machine learning algorithms on big data using PySpark to analyze transaction fraud, perform cluster analysis, etc.
  • For this largest property and casualty insurer in the United States, with over 120 offices in 54 locations offering commercial, property, casualty, specialty, and personal insurance services: developed, executed, tracked, and analyzed targeted marketing campaigns. Utilized the social media campaign management application to develop and report on complex, multi-step campaigns. Analyzed campaign performance, reported on key business metrics, and developed insights through customer analysis.
  • Used unsupervised learning techniques such as K-means clustering and Gaussian Mixture Models to cluster customers into different risk groups based on health parameters provided through wearable technology regarding their activities and health goals.
  • Multiple statistical modeling approaches were applied to determine the usefulness of the wearable technology data for various insurance products.
  • Survival modeling techniques such as Poisson regression, hidden Markov models, and Cox proportional hazards were used to model time to different events utilizing wearable data (time to death for life insurance, time to next hospital visit, time to next accident, time to critical illness, etc.); see the sketch after this list.
  • Documented methodology, data reports and model results and communicated with the Project Team/Manager and other data scientists to share the knowledge on retention analytics.
  • Took End-to-end ownership of designing, developing, and deploying machine learning models (data preparation => variable selection => model building and evaluation => deployment)
  • Forecasted bank-wide loan balances under normal and stressed macroeconomic scenarios using R. Performed variable reduction using the stepwise, lasso, and elastic net algorithms and tuned the models for accuracy using cross validation and grid search techniques.
  • Automated the scraping and cleaning of data from various data sources in R and Python. Developed the bank's loss forecasting process using relevant forecasting and regression algorithms in R.
  • The projected losses under stress conditions helped bank reserve enough funds per DFAST policies.
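
A minimal sketch of Cox proportional hazards survival modeling with lifelines, as referenced above; the synthetic wearable-derived covariates and event times are illustrative assumptions.

```python
# Cox proportional hazards sketch with lifelines; all data is synthetic.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Toy dataset: time-to-event, event indicator, and wearable-derived covariates.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "duration": rng.exponential(scale=24, size=300),  # months to event
    "event": rng.integers(0, 2, size=300),            # 1 = observed, 0 = censored
    "daily_steps": rng.normal(8000, 2500, size=300),
    "resting_hr": rng.normal(68, 8, size=300),
})

# Fit the proportional hazards model and inspect covariate effects.
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()
```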

Confidential, Greeley, CO

Data Modeler/Data Analyst

Responsibilities:

  • Built executive dashboards in Tableau that measured changes in customer behavior post campaign launch; the ROI measurements helped the company strategically select the most effective campaigns.
  • Analyzed large datasets to provide strategic direction to the company. Performed quantitative analysis of ad sales trends to recommend pricing decisions.
