
Data Scientist Resume



  • Passionate Data Scientist with an advanced degree in Statistics and 6 years of industry experience in Data Science, Machine Learning and Natural Language Processing, with proven technical expertise in developing and implementing large-scale algorithms that significantly impact business revenue and guide businesses in making quality data-driven decisions.
  • Competent in end-to-end implementation of a data science project
  • Expertise in Statistical Analysis, Hypothesis Testing, Data Cleaning and Processing, Supervised and Unsupervised Machine Learning, Deep Learning and Natural Language Processing
  • Strong mathematical knowledge in Linear Algebra, Stochastic Theory, Game Theory, Markov Decision Process (MDP) and Non-Linear Dynamics
  • Experience in handling semi-structured, structured and sparse data
  • Proficient in balancing datasets by utilizing Resampling techniques such as SMOTE for Oversampling and Cluster Centroid Method for Undersampling
  • Experience in Normalization and Standardization for optimal performance in relational and dimensional databases
  • Adept at employing Feature Engineering and Feature Selection techniques for feature extraction
  • Hands-on experience in Dimensionality Reduction algorithms such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and t-Distributed Stochastic Neighbor Embedding (t-SNE)
  • Expertise in Supervised learning algorithms including, but not limited to, Regression, K Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees and Ensemble models: Bagging, Random Forests, AdaBoost, Gradient Boosting and XGBoost
  • Expertise in Unsupervised Machine Learning techniques including K-Means, Gaussian Mixture Models (GMM) and Hierarchical Clustering
  • Proficient in Deep Learning net families such as Multilayer Perceptrons, Artificial Neural Networks, and Recurrent Neural Networks with LSTM and GRU
  • Adept in hyperparameter tuning using Random Search, Grid Search and Bayesian Optimization
  • Experience employing TensorFlow and its high-level API Keras in Python for Deep Learning
  • Proficient in Natural Language methods for Sentiment Analysis using text representations such as Word2Vec, GloVe and TF-IDF
  • Experience in Data Integration, Validation and Data Quality control for ETL processes and Data Warehousing using MS Visual Studio, SSAS, SSIS and SSRS
  • Adept in employing data visualization tools such as Tableau and Python libraries Matplotlib, Seaborn and Plotly to create visually appealing plots and interactive dashboards
  • Actively involved in all phases of the Data Science project life cycle including Data Extraction (ETL), Cleaning, Preprocessing, Visualizing, Modelling and Version Control (Git)
  • Experience with cluster computing frameworks such as the Hadoop ecosystem
  • Skilled at working in Windows and Linux platforms
  • Strong knowledge in all phases of the SDLC (Software Development Life Cycle) including analysis, design, development, testing, implementation and maintenance
  • Experience in Agile and SCRUM environments


Languages & Software: Python (numpy, scipy, pandas, matplotlib, seaborn, scikit-learn, tensorflow, NLTK, keras), R (dplyr, ggplot, caret, glmnet, e1071, nnet), MATLAB, WEKA, Minitab, Tableau

Statistical & Machine Learning Techniques: SVD, Standardization, Normalization, L1/L2 Regularization, Loss Minimization, Performance Measurement of Models, Featurization, Feature Engineering, Matrix Factorization, Model Calibration, A/B Testing, Randomization, Point and Interval Estimation, Confusion Matrix, ROC curve, Precision-Recall Curve, Graph Theory

Databases: SQL Server, MS-Access

Operating systems: Mac, Windows, Linux


Confidential, NYC

Data Scientist


  • Worked alongside the data engineering and data science teams to build high-performance, low-latency systems to manage high-velocity data streams
  • Cleaned and processed the unstructured fraudulent wire extraction data via Tokenizing, Stemming and Part-of-Speech tagging to extract customer and bank information
  • Employed Regular Expressions and built Named Entity Recognition models using the Natural Language Toolkit (NLTK) and spaCy to pull out relevant customer information
  • Built a Conditional Random Fields (CRF) model in scikit-learn for pattern recognition and compared the model predictions against actual outputs
  • Employed Python’s visualization libraries to draw patterns from customer credit history and payment activities on historical data to analyze customer behavior and generate reports for the business team
  • Assigned probability scores to credit card applicants based on their feature attributes to help the client make better decisions regarding an applicant’s credibility
  • Built NLP pipelines and collaborated with the DevOps team to deploy code into production
  • Performed web scraping of Google Alerts links to pull out information for analysis
  • Utilized Random Forests and SVMs to classify whether the crime in question was of a financial nature
  • Extracted fraudster information for financial crimes and wrote SQL queries to search Teradata to determine whether the concerned person was a Confidential customer
  • Utilized results from the Teradata search and customer information from fraudulent wire transactions to generate auto narratives of potential threats to Amex
  • Conceptualized and implemented Artificial Neural Networks as well as LSTMs via dense Recurrent Neural Networks into the pipeline to process continuous data and gather information in sequence
  • Created distributed TensorFlow environments across multiple CPUs and GPUs to run in parallel

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, Linux, Git, Microsoft Excel, Hadoop, PCA, Logistic Regression, CRF, TensorFlow, Keras, Natural Language Toolkit, spaCy, Named Entity Recognition, Natural Language Generation

Confidential, Kansas City

Data Scientist


  • Performed Data Cleaning, Feature Scaling, Feature Engineering and Exploratory Data analysis to maximize insight, detect outliers and extract important features for modelling
  • Implemented Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction algorithms to achieve reduced datasets
  • Implemented Clustering algorithms for market segmentation to analyze customer behavior
  • Trained several machine learning models on selected features to predict Customer churn
  • Utilized cross-validation techniques and LASSO regularization to avoid overfitting, then evaluated the models using performance metrics more robust to imbalanced classes
  • Tuned the model hyperparameters using Bayesian optimization and grid search to achieve higher levels of model performance
  • Improved model accuracy by 5% by introducing Ensemble techniques: Bagging, Gradient Boosting, Extreme Gradient Boosting (XGBoost) and Adaptive Boosting (AdaBoost)
  • Feature engineered email data by employing NLP techniques like Word2Vec, BOW and tf-idf
  • Performed Sentiment Analysis on email feedback to understand the emotional tone behind words
  • Utilized data visualization tools such as Tableau and Python’s vast data visualization libraries to communicate findings to the data science, marketing and engineering teams
  • Generated ad hoc reports in Tableau for the business teams to help the client make impactful data-driven decisions

Environment: Microsoft Excel, SQL, NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, PySpark, Random Forests, SVM, t-SNE, PCA, K-Means, TensorFlow, Keras, Natural Language Toolkit, Tableau, Git


Decision Scientist


  • Wrote production level code and built ETL pipelines in PostgreSQL and R to ingest huge volumes of messy data
  • Regularly performed A/B testing for evaluating test vs control to tighten the production line
  • Created goal driver models for branches of a regional bank in the US to white-box the product level goal objectives provided by one of the leading goal setting vendors
  • Iterated over different variations of regression models to arrive at final linear mixed models
  • Assigned probability scores using logistic regression to candidates applying for a call center representative role at a leading insurance company, based on their application details, to help the client make decisions regarding employee survival and performance
  • Wrote code in R to scrape HTML and extract product data from e-commerce websites
  • Built a Shiny based interactive scheduling tool in R to investigate workplace attrition, client file redeliveries and client file delays
  • Created documentation for muESP which enables operationalization of real time and batch analytics


SQL Developer


  • Involved in the Software Development Life Cycle (SDLC) process; analyzed business requirements and the functional workflow of information from source to destination
  • Developed Stored Procedures, functions and database triggers while maintaining referential integrity and implementing complex business logic
  • Created SSIS packages for transferring data from various data sources such as SAP R/3, Oracle, MS Access, Excel, *.txt files and *.csv files
  • Wrote complex SQL Queries, Stored Procedures, Triggers, Views, Cursors, Joins, Constraints, DDL, DML and User Defined Functions to implement the business logic and created clustered and non-clustered indexes
  • Tuned complex SQL queries using T-SQL statements and implemented various types of Constraints and Triggers for data consistency
  • Tuned and optimized long-running queries using SQL Profiler, Index Tuning Wizard and SQL Query Analyzer
  • Tested databases and fixed bugs
