Data Scientist Resume
SUMMARY
- Data Scientist with 6+ years of experience building and deploying machine learning and deep learning models in Python and R.
- Expert-level programming skills in analytical and scripting languages such as R, Python, SQL, and PySpark.
- Experience analyzing very large data sets from the Hadoop ecosystem using Hive and Impala, and implementing machine learning algorithms on clusters.
- Hands-on experience optimizing machine learning algorithms and working with frameworks such as Keras, TensorFlow, and scikit-learn, as well as Azure.
- Worked as a consultant in retail, manufacturing, marketing analytics, and pharma, performing ad hoc analyses.
- Experience using advanced statistical methods and machine learning algorithms to uncover useful insights and presenting them with interactive visualizations built in Tableau, Power BI, R Shiny, ggplot2, and Plotly.
TECHNICAL SKILLS
Programming: Python, PySpark, SQL, R
RDBMS: SQL Server, MySQL
Python: NumPy, pandas, scikit-learn, SciPy, PySpark, BeautifulSoup
R Packages: dplyr, tidyr, RODBC, caret, Hmisc, missForest, dummies, ClustOfVar, Shiny, etc.
Machine Learning: randomForest, ctree, rpart, lm, glm, nnet, xgboost, e1071, lasso, ridge, etc.
Web Mining: rminer, NLP, arules, lsa, rvest
Deep Learning Frameworks: TensorFlow, Keras, Caffe
Big Data Technologies: Hadoop ecosystem, Spark 2.0, Spark ML, HBase, Hive, Pig, Impala, Databricks, HiveQL
Cloud: Amazon S3, Cloudera
Visualization: R Shiny, Power BI, Plotly, Tableau
IDEs: Jupyter Notebook (Anaconda), PyCharm, RStudio
PROFESSIONAL EXPERIENCE
Data Scientist
Confidential
Responsibilities:
- Designed a deep learning model (artificial neural network) that predicts with 85% precision and suggests the best-matching foundation shade for customers based on their skin L, a, b values.
- Analyzed the agreement between two users (standard vs. new) in measuring skin elasticity; applied methods such as ICC (intraclass correlation) and Bland-Altman plots and reported the findings in Power BI.
- Built a scikit-learn Pipeline for anomaly detection on real-time sensory data, combining data transformation, decision tree models (CART, ID3), and AdaBoost, with parameter tuning via GridSearchCV (see the sketch after this list).
- Developed an application to forecast sentiment scores (ARIMA) and predict ratings for products sold online; designed a collaborative-filtering recommendation system to segment users based on shared preferences.
- Modeled a classifier (random forest, XGBoost) to predict the crisis state of subjects in a clinical study of 300 participants; the process involved extensive data cleaning and preprocessing, as the data came from various sources.
- Identified the most important features among 400 variables influencing hair color fading, using feature selection methods including backward selection, Boruta, and decision trees.
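Below is a minimal, illustrative sketch of the kind of scikit-learn Pipeline described above (a decision tree base learner boosted with AdaBoost and tuned with GridSearchCV). The synthetic data, parameter grid, and scoring metric are assumptions, not the actual project configuration.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the labeled sensory data (imbalanced: anomalies are rare)
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),                        # normalize sensor readings
    ("clf", AdaBoostClassifier(                         # boosted decision trees
        estimator=DecisionTreeClassifier(max_depth=3)   # 'base_estimator' on scikit-learn < 1.2
    )),
])

param_grid = {                                          # illustrative grid only
    "clf__n_estimators": [50, 100, 200],
    "clf__learning_rate": [0.1, 1.0],
    "clf__estimator__max_depth": [2, 3],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```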
Data Scientist
Confidential, St. Louis, MO
Responsibilities:
- Achieved 70% accuracy in credit card fraud analysis, detecting fraudulent transactions at cafés (kiosk, online, and POS) using machine learning algorithms (random forest, KNN, decision tree, XGBoost).
- Analyzed historical patterns of customer behavior to optimize catering recommendation systems, contributing to a 4% increase in revenue; assisted in selecting cafés for targeted offers based on statistical inference and A/B testing.
- Performed data wrangling, manipulation, and transformation using PySpark, pandas, NumPy, and scikit-learn on 4 TiB of data pulled by joining multiple Hive tables with complex HiveQL queries.
- Built robust classifiers to determine which cafés and hubs should receive discounts and offers to maximize sales and revenue.
- Designed a dense neural network in Keras/TensorFlow to predict sales based on discounts and campaigns (see the sketch after this list); reduced model error by 10% through advanced parameter tuning and feature engineering.
- Monitored network performance in TensorBoard and presented the findings using interactive visualizations.
- Implemented clustering algorithms to group cafés and cities for forecasting sales under weather impact; used dimensionality reduction techniques (PCA, SVD) to reduce high-dimensional features.
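A minimal sketch of a dense Keras/TensorFlow regression network with TensorBoard monitoring, in the spirit of the bullets above. The layer sizes, hyperparameters, and synthetic data are illustrative assumptions, not the production model.

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in for discount / campaign features and a sales target
rng = np.random.default_rng(0)
X = rng.random((1000, 12)).astype("float32")
y = (X @ rng.random(12) + rng.normal(0, 0.1, 1000)).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(12,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),                      # predicted sales
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# TensorBoard callback for monitoring training, as mentioned above
tensorboard_cb = keras.callbacks.TensorBoard(log_dir="logs")
model.fit(X, y, epochs=10, validation_split=0.2, callbacks=[tensorboard_cb], verbose=0)
```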
Data Scientist
Confidential
Responsibilities:
- Implemented various classification models (decision tree, SVM, neural networks, and naïve Bayes) on a large customer data set to recognize patterns and build robust, accurate classifiers.
- Evaluated model performance using precision and recall from the confusion matrix, along with the ROC curve.
- Improved prediction accuracy through fine-tuning, significantly reducing the false negative rate.
- Supported technical team members in the development of automated processes for data extraction and analysis.
- Extracted and pre-processed large data sets using the dplyr and data.table packages for machine learning.
- Performed feature engineering on the data set to identify and construct the most informative features.
- Applied dimensionality reduction techniques such as PCA (Principal Component Analysis) to reduce the number of variables while preserving most of the information in the data (see the sketch after this list).
- Prepared R scripts for data access, manipulation, and reporting.
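An illustrative sketch of the PCA-based reduction described above, shown here in Python with scikit-learn even though the résumé describes the work in R; the placeholder data and the 95% explained-variance threshold are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder for the customer feature matrix (values are illustrative only)
X = np.random.RandomState(0).rand(1000, 50)
X_scaled = StandardScaler().fit_transform(X)    # PCA is scale-sensitive, so standardize first

# Keep enough principal components to explain ~95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape, round(pca.explained_variance_ratio_.sum(), 3))
```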
Data Scientist
Confidential, Orangeburg NY
Responsibilities:
- Designed and developed advanced predictive models using LASSO, ridge, and SVM algorithms, and optimized them by tuning parameters in R.
- Extracted information such as customer feedback and plans using various text mining packages in R.
- Performed data cleaning and manipulation using dplyr and tidyr on large data sets with millions of observations.
- Performed ad hoc statistical, data mining and machine learning analysis on complex business problems and wrote functions to process data related to call volumes, customer experience and call deflection.
- Built a classifier in Keras/TensorFlow to classify customers by their features; experimented with the model by varying the learning rate and activation functions.
- Built and optimized clustering models (k-means, hierarchical clustering) to segment customers based on features and plans (see the sketch after this list).
- Worked with log files, extracted insights using Hive, Pig, and Impala, and implemented various machine learning algorithms in Spark.
- Helped analyze streaming data (Apache Storm) and worked with chunked data.
- Contributed to projects to improve customer sentiment and built models in R Shiny.
- Explained the findings to the business and marketing teams in a clear, understandable way.
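A minimal sketch of customer segmentation with k-means and Ward hierarchical clustering, shown in Python with scikit-learn for illustration (the résumé describes this work in R). The feature matrix and the number of segments are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering

# Placeholder customer feature matrix (usage and plan attributes); values are illustrative only
X = np.random.RandomState(0).rand(500, 8)
X_scaled = StandardScaler().fit_transform(X)

# k-means and Ward hierarchical clustering with an assumed five customer segments
kmeans_labels = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X_scaled)
hier_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X_scaled)
print(np.bincount(kmeans_labels), np.bincount(hier_labels))
```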