
Lead Data Scientist Resume


SUMMARY

  • 10 years of experience in Data Science, Machine Learning, Artificial Intelligence, and Data Engineering, developing Statistical Machine Learning, Text Analytics, and Data Mining solutions across various business functions; provided BI, Insights, and Reporting frameworks to optimize business outcomes through data analysis.
  • Experience in Data Science, Machine Learning, Deep Learning, and Data Mining with large structured and unstructured datasets, including data validation, data acquisition, data visualization, and predictive modeling; developed predictive models that provide intelligent solutions.
  • Strong mathematical knowledge and hands-on experience implementing Machine Learning algorithms such as K-Nearest Neighbors, Logistic Regression, Linear Regression, Naïve Bayes, Support Vector Machines, Decision Trees, Random Forests, Gradient Boosted Decision Trees, and Stacking Models.
  • Provided insights, extracted knowledge, and performed data transformation across Financial (Investment Banking, Wealth Management, Asset Management, Retail Banking, Loan Processing, Consumer Finance, Private Banking, Commercial Lending), Logistics, Real Estate, Health Care, Pharmaceuticals, Telecom, and Transportation domains.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, MapReduce concepts, and ecosystems including Hive and Pig.
  • Experience with data visualization using tools such as ggplot, Matplotlib, Seaborn, Tableau, and R Shiny, and with Tableau for publishing and presenting dashboards and storylines on web and desktop platforms.
  • Experienced in Python data manipulation for loading and extraction, and with Python libraries such as NumPy, SciPy, Pandas, and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Utilized Alteryx for the design, development, and testing of ETL workflows and datasets, including integration, automation, documentation of procedures, and building reusable macro and app components.
  • Experience in data wrangling to optimize data systems and build data pipelines.
  • Expertise in advanced statistics, mathematical analysis and statistical modeling, time series analysis and forecasting, optimization and simulation, communication, storytelling and visualization, EDA, and data preparation and transformation.
  • Performed exploratory research and analysis to identify meaningful patterns in data, and used statistical methods to accept or reject proposed hypotheses about relationships or latent predictive factors, providing business value.
  • Experience in cloud computing: GCP (Google Cloud Platform), Azure (Azure cloud platform, data models in Azure SQL databases, tuning, Azure Data Factory, etc.), and AWS.
  • Proficient in Natural Language Processing (NLP) and text analytics (TextBlob, document and topic modeling, contracts, chatbots, cognitive computing) using R and Python (NLTK, Gensim, etc.). Used NLP bag-of-words and n-gram algorithms, term-document matrices, text categorization and text routing, and remote-sensing and geospatial intelligence.
  • Processed unstructured text and speech (using speech-to-text algorithms), applying data mining and big data algorithms and methods.
  • Experience with multi-purpose NLP models such as ULMFiT, Transformer, Google’s BERT, Transformer-XL, and OpenAI’s GPT-2; word embeddings such as ELMo and Flair; and other pretrained models such as StanfordNLP.
  • Developed Natural Language Processing solutions covering machine translation, language detection, and classification, addressing different aspects of NLP such as phonology, morphology, document classification, Named Entity Recognition (NER), topic modeling, document summarization, computational linguistics, and advanced semantic information search, extraction, induction, classification, and exploration.
  • Practice data security best practices (e.g., data encryption, tokenization, masking).
  • Excellent knowledge and experience using open-source NLP packages such as NLTK, Word2Vec, spaCy, Gensim, and Stanford CoreNLP.
  • Worked with general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet, etc.) provided by Transformers (pytorch-transformers/pytorch-pretrained-bert) for Natural Language Understanding (NLU) and Natural Language Generation (NLG).
  • Experience building Machine Learning and NLP solutions on open-source platforms such as scikit-learn, TensorFlow, TensorBoard, Keras, PyTorch (Hugging Face), Spark ML, Torch, Caffe, and H2O.
  • Understanding of different components of Hadoop ecosystem such as Hue, Pig, Hive, HBase, HDFS, Map-reduce, Flume etc.
  • Hands on Experience on Customer Churn, Sales Forecasting, Market Mix Modeling, Customer Classification, Survival Analysis, Sentiment Analysis, Text Mining, Recommendation Systems.
  • Experience in using Statistical procedures and Machine Learning algorithms such as ANOVA, Clustering, Regression and Time Series Analysis to analyze data for further Model Building.
  • Hands-on experience with Deep Learning techniques such as backpropagation, choosing activation functions, weight initialization based on optimizer, avoiding vanishing and exploding gradient problems, dropout, regularization, batch normalization, gradient monitoring and clipping, padding and striding, max pooling, and LSTMs.
  • Developed and debugged simple and complex SAS programs. Implemented statistical analysis plans, data preparation, and data manipulation in SAS programs and macros. Created statistical tables, figures, and listings for marketing data. Built models, validated SAS programs, and analyzed datasets using SAS products such as Enterprise Guide, segmentation tools like Enterprise Miner, Office Analytics, EDI, etc.
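As an illustration of the supervised modeling workflow referenced above, a minimal sketch with scikit-learn on synthetic data (the features and the churn-style label are invented for demonstration, not drawn from any project described here):

```python
# Hedged sketch: train a random forest on synthetic data and score it with AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                   # synthetic customer features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic churn-style label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```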

TECHNICAL SKILLS

Exploratory Data Analysis: Univariate/Multivariate Outlier detection, Missing value imputation, Histograms/Density estimation, EDA in Tableau

Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, XGB, Deep Neural Networks, Bayesian Learning, Heuristics, Neural Nets, Markov Decision Process

Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low Rank Matrix Factorization
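A minimal sketch of K-Means segmentation as listed above, on two synthetic, well-separated customer groups (the spend/usage features and group centers are invented):

```python
# Hedged sketch: cluster two synthetic customer segments with K-Means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
low = rng.normal(loc=[20, 2], scale=1.0, size=(100, 2))    # low-spend group
high = rng.normal(loc=[80, 10], scale=1.0, size=(100, 2))  # high-spend group
X = np.vstack([low, high])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_  # each customer's segment assignment
```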

Feature Engineering: Stepwise, Recursive Feature Elimination, Relative Importance, Filter Methods, Wrapper Methods and Embedded Methods

Statistical Tests: t-tests, Chi-square tests, Stationarity tests, Autocorrelation tests, Normality tests, Residual diagnostics, Partial dependence plots, and ANOVA

Sampling Methods: Bootstrap sampling methods and Stratified sampling

Model Tuning/Selection: Cross Validation, AUC, Precision/Recall, Walk Forward Estimation, AIC/BIC Criterions, Grid Search and Regularization
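The model tuning/selection methods above can be sketched as cross-validated grid search with AUC scoring (synthetic data; the `C` grid values are illustrative):

```python
# Hedged sketch: tune logistic regression regularization via 5-fold CV + AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # synthetic target

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # regularization strengths
    scoring="roc_auc",
    cv=5,
).fit(X, y)
best_auc = grid.best_score_  # mean cross-validated AUC of the best setting
```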

Time Series: ARIMA, SARIMAX, Holt winters, Exponential smoothing, Bayesian structural time series
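In its simplest form, the exponential smoothing family above reduces to a one-parameter recursion; a dependency-free sketch (the demand series and alpha are illustrative):

```python
# Simple exponential smoothing: level_t = alpha*y_t + (1 - alpha)*level_{t-1}.
# The one-step-ahead forecast is the final smoothed level.
def ses_forecast(series, alpha=0.5):
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

demand = [100, 102, 101, 105, 107, 106]  # illustrative monthly demand
forecast = ses_forecast(demand, alpha=0.5)
```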

R: caret, glmnet, forecast, xgboost, rpart, survival, arules, sqldf, dplyr, nloptr, lpSolve, ggplot, FBProphet

Python: pandas, numpy, scikit-learn, scipy, statsmodels, matplotlib, PySpark

SAS: Forecast server, SAS Procedures and Data Steps

Spark: MLlib, GraphX

SQL: Subqueries, joins, DDL/DML statements

Deep Learning: CNN (Convolutional Neural Network), RNN, LSTM (Long Short-Term Memory), Autoencoders; image analytics using TensorFlow, PyTorch (Hugging Face), Keras, Tesseract, PyOCR, OpenCV

NLP: Word embedding (word2vec, doc2vec), topic classification, sentiment analysis, also Image/Video analytics

Multi-Purpose NLP Models: ULMFiT, Transformer, Google’s BERT, Transformer-XL, OpenAI’s GPT-2

Word Embeddings: ELMo, Flair

Other Pretrained Models: StanfordNLP

Compilers: Glow, TVM, Clang, LLVM, GCC

PROFESSIONAL EXPERIENCE

Confidential

Lead Data Scientist

Responsibilities:

  • Developed a personalized recommender system using recommender algorithms (collaborative filtering, low-rank matrix factorization) that recommended the best apps to a user based on similar user profiles. The recommendations enabled users to engage better and helped improve overall user retention rates at Confidential.
  • Forecasted sales and improved accuracy by 10-20% by implementing advanced forecasting algorithms that were effective in detecting seasonality and trends in the patterns in addition to incorporating exogenous covariates. Increased accuracy helped business plan better with respect to budgeting and sales and operations planning
  • Analyzed the complex datasets and created models to interpret and predict trends or patterns in the data using Time series analysis, Forecasting, Regression analysis.
  • Created an interactive dashboard suite in R (Shiny) that illustrated outlier characteristics across several sales-related dimensions and the overall impact of outlier imputation. Used an iterative outlier detection and imputation algorithm based on multiple density-based clustering techniques (DBSCAN, kernel density estimation).
  • Implemented market basket algorithms from transactional data, which helped identify items used/purchased together frequently. Discovering frequent item sets helped unearth cross sell and upselling opportunities and led to better pricing, bundling and promotion strategies for sales and marketing teams
  • Derived meaning from huge amounts of data using mathematics, statistics, and computer science, incorporating machine learning, deep learning, cluster analysis, data mining, and visualization for real estate data.
  • Predicted the likelihood of customer churn based on customer attributes such as customer size, RFM loyalty metrics, revenue, type of industry, competitor products, and growth rates. The models, deployed in production environments (on platforms such as PCF, AWS, and Kubernetes), helped detect churn in advance and aided sales/marketing teams in planning retention strategies such as price discounts and custom licensing plans.
  • Built machine learning based regression models using scikit-learn python frameworks to estimate the customer propensity to purchase based on attributes such as customer verticals they operate in, revenue, historic purchases, frequency and recency behaviors. These predictions helped estimate propensities with higher accuracy improving the overall productivity of sales teams by accurately targeting the prospective clients.
  • Measured the price elasticity for products that experienced price cuts and promotions using regression methods; based on the elasticity, Confidential made selective and cautious price cuts for certain licensing categories.
  • Segmented customers based on behavior or specific characteristics such as age, region, income, and geographic location, applying clustering algorithms to group customers with similar behavior patterns.
  • The segmentation results help determine the Customer Lifetime Value of each segment, discover high-value and low-value segments, and improve customer service to retain customers.
  • Used Principal Component Analysis and t-SNE in feature engineering to analyze high-dimensional data.
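The low-rank matrix factorization approach named in the recommender bullet above can be sketched with plain NumPy SVD on an invented user-app ratings matrix (all ratings are made up; 0 marks an unrated item):

```python
# Hedged sketch: rank-2 factorization of a toy ratings matrix, then score
# a user's unrated items from the low-rank reconstruction.
import numpy as np

R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])  # rows = users, columns = apps; 0 = unrated

U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]  # rank-2 approximation

# Recommend user 0 the unrated item with the highest predicted score
unrated = np.where(R[0] == 0)[0]
best = unrated[np.argmax(R_hat[0, unrated])]
```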

Confidential

Data Scientist

Responsibilities:

  • Built relational databases in SQL Server from several large flat files of partner information using Python. Used logistic regression and random forest models in R/Python to predict the likelihood of customer participation in various marketing programs.
  • Designed and developed visualizations and dashboards in R /Tableau that surfaced the primary factors that drove program participation and identified the best targets for future targeted marketing efforts.
  • Performed data profiling to learn about behavior across features such as traffic pattern, location, and date and time. Integrated with external data sources and APIs to discover interesting trends.
  • Involved in various pre-processing phases of text data like Tokenizing, Stemming, Lemmatization and converting the raw text data to structured data.
  • Performed personalization, target marketing, customer segmentation and profiling, audience analytics, and digital hypothesis testing and measurement using Facebook/social platforms and Google Analytics/BigQuery.
  • This also helped the business establish appropriate marketing strategies based on customer value.
  • Performed data cleaning, feature scaling, featurization, and feature engineering.
  • Used Pandas, PySpark, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn in Python at various stages for developing machine learning model and utilized machine learning algorithms such as linear regression, Naive Bayes, Random Forests, Decision Trees, K-means, & KNN.
  • Implemented a number of natural language processing mechanisms for a chatbot.
  • Segmented customers based on behavior or specific characteristics such as age, region, income, and geographic location, applying clustering algorithms to group customers with similar behavior patterns.
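The text pre-processing steps described above (tokenizing, stop-word removal, stemming) can be sketched without dependencies; the tiny stop-word list and naive suffix rule are simplified stand-ins for NLTK-style stemming/lemmatization:

```python
# Hedged sketch: tokenize, lowercase, drop stop words, and crudely strip
# plural "s" as a stand-in for real stemming (e.g. Porter stemmer).
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and"}  # illustrative

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())          # tokenize + lowercase
    kept = [t for t in tokens if t not in STOP_WORDS]     # stop-word removal
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in kept]

tokens = preprocess("The customers are grouped into segments.")
```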
