Data Scientist Resume
SUMMARY
- Data Scientist with 6+ years of experience building and deploying machine learning and deep learning models in Python and R.
- Expert-level programming skills in analytical and scripting languages such as R, Python, SQL, and PySpark.
- Experience analyzing very large data sets from the Hadoop ecosystem using Hive and Impala, and implementing machine learning algorithms on clusters.
- Hands-on experience optimizing machine learning algorithms and working with frameworks such as Keras, TensorFlow, and scikit-learn, as well as Azure.
- Worked as a consultant in retail, manufacturing, marketing analytics, and pharma, performing ad hoc analyses.
- Experience using advanced statistical methods and machine learning algorithms to uncover useful insights and presenting them with interactive visualizations built in Tableau, Power BI, R Shiny, ggplot2, and Plotly.
TECHNICAL SKILLS
Programming: Python, PySpark, SQL, R
RDBMS: SQL Server, MySQL
Python: NumPy, pandas, scikit-learn, SciPy, PySpark, BeautifulSoup
R Packages: dplyr, tidyr, RODBC, caret, Hmisc, missForest, dummies, ClustOfVar, Shiny, etc.
Machine Learning: randomForest, ctree, rpart, lm, glm, nnet, xgboost, e1071, lasso, ridge, etc.
Web Mining: rminer, NLP, arules, lsa, rvest
Deep Learning Frameworks: TensorFlow, Keras, Caffe
Big Data Technologies: Hadoop ecosystem, Spark 2.0, Spark ML, HBase, Hive, Pig, Impala, Databricks, HiveQL
Cloud: Amazon S3, Cloudera
Visualization: R Shiny, Power BI, Plotly, Tableau
IDEs: Jupyter Notebook (Anaconda), PyCharm, RStudio
PROFESSIONAL EXPERIENCE
Data Scientist
Confidential
Responsibilities:
- Designed a deep learning model (artificial neural network) that predicts with 85% precision and suggests the best-matching foundation shade for customers based on their skin L, a, b values.
- Analyzed the agreement between two users (standard vs. new) in measuring skin elasticity; applied methods such as ICC (intraclass correlation) and Bland-Altman plots and reported the findings in Power BI.
- Built a scikit-learn Pipeline for anomaly detection on real-time sensory data, combining data transformation, decision tree models (CART, ID3), and AdaBoost, with parameter tuning via GridSearchCV (see the sketch after this list).
- Developed an application to forecast sentiment scores (ARIMA) and predict ratings for products sold online; designed a collaborative-filtering recommendation system to segment users based on shared preferences.
- Modeled a classifier (random forest, XGBoost) to predict the crisis state of subjects in a clinical study of 300 participants; the process involved extensive data cleaning and preprocessing, as the data came from various sources.
- Identified the most important features among 400 variables influencing hair color fading, using feature selection methods including backward selection, Boruta, and decision trees.
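Below is a minimal, illustrative sketch of the kind of scikit-learn Pipeline described above (a decision tree base learner boosted with AdaBoost and tuned with GridSearchCV). The synthetic data, parameter grid, and scoring metric are assumptions, not the actual project configuration.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the labeled sensory data (imbalanced: anomalies are rare)
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),                        # normalize sensor readings
    ("clf", AdaBoostClassifier(                         # boosted decision trees
        estimator=DecisionTreeClassifier(max_depth=3)   # 'base_estimator' on scikit-learn < 1.2
    )),
])

param_grid = {                                          # illustrative grid only
    "clf__n_estimators": [50, 100, 200],
    "clf__learning_rate": [0.1, 1.0],
    "clf__estimator__max_depth": [2, 3],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```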
Data Scientist
Confidential, St. Louis, MO
Responsibilities:
- Achieved 70% accuracy in credit card fraud analysis, detecting fraudulent transactions at cafés (kiosk, online, and POS) using machine learning algorithms (random forest, KNN, decision tree, XGBoost).
- Analyzed historical patterns of customer behavior to optimize catering recommendation systems, contributing to a 4% increase in revenue; assisted in selecting cafés for targeted offers based on statistical inference and A/B testing.
- Performed data wrangling, manipulation, and transformation using PySpark, pandas, NumPy, and scikit-learn on 4 TiB of data pulled by joining multiple Hive tables with complex HiveQL queries.
- Built robust classifiers to determine which cafés and hubs should receive discounts and offers to maximize sales and revenue.
- Designed a dense neural network in Keras/TensorFlow to predict sales based on discounts and campaigns (see the sketch after this list); reduced model error by 10% through advanced parameter tuning and feature engineering.
- Monitored network performance in TensorBoard and presented the findings using interactive visualizations.
- Implemented clustering algorithms to group cafés and cities for forecasting sales under weather impact; used dimensionality reduction techniques (PCA, SVD) to reduce high-dimensional features.
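A minimal sketch of a dense Keras/TensorFlow regression network with TensorBoard monitoring, in the spirit of the bullets above. The layer sizes, hyperparameters, and synthetic data are illustrative assumptions, not the production model.

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in for discount / campaign features and a sales target
rng = np.random.default_rng(0)
X = rng.random((1000, 12)).astype("float32")
y = (X @ rng.random(12) + rng.normal(0, 0.1, 1000)).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(12,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),                      # predicted sales
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# TensorBoard callback for monitoring training, as mentioned above
tensorboard_cb = keras.callbacks.TensorBoard(log_dir="logs")
model.fit(X, y, epochs=10, validation_split=0.2, callbacks=[tensorboard_cb], verbose=0)
```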
Data Scientist
Confidential
Responsibilities:
- Implemented various classification models (decision tree, SVM, neural networks, and naïve Bayes) on a large customer data set to recognize patterns and build robust, accurate classifiers.
- Evaluated model performance using precision and recall from the confusion matrix, along with the ROC curve.
- Improved prediction accuracy through fine-tuning, significantly reducing the false negative rate.
- Supported technical team members in the development of automated processes for data extraction and analysis.
- Extracted and pre-processed large data sets using the dplyr and data.table packages for machine learning.
- Performed feature engineering on the data set to identify and construct the most informative features.
- Applied dimensionality reduction techniques such as PCA (Principal Component Analysis) to reduce the number of variables while preserving most of the information in the data (see the sketch after this list).
- Prepared R scripts for data access, manipulation, and reporting.
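An illustrative sketch of the PCA-based reduction described above, shown here in Python with scikit-learn even though the résumé describes the work in R; the placeholder data and the 95% explained-variance threshold are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder for the customer feature matrix (values are illustrative only)
X = np.random.RandomState(0).rand(1000, 50)
X_scaled = StandardScaler().fit_transform(X)    # PCA is scale-sensitive, so standardize first

# Keep enough principal components to explain ~95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape, round(pca.explained_variance_ratio_.sum(), 3))
```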
Data Scientist
Confidential, Orangeburg NY
Responsibilities:
- Designed and developed advanced predictive models using LASSO, ridge, and SVM algorithms, and optimized them by tuning parameters in R.
- Extracted information such as customer feedback and plans using various text mining packages in R.
- Performed data cleaning and manipulation using dplyr and tidyr on large data sets with millions of observations.
- Performed ad hoc statistical, data mining and machine learning analysis on complex business problems and wrote functions to process data related to call volumes, customer experience and call deflection.
- Built a classifier in Keras/TensorFlow to classify customers by their features; experimented with the model by varying the learning rate and activation functions.
- Built and optimized clustering models (k-means, hierarchical clustering) to segment customers based on features and plans (see the sketch after this list).
- Worked with log files, extracted insights using Hive, Pig, and Impala, and implemented various machine learning algorithms in Spark.
- Helped analyze streaming data (Apache Storm) and worked with chunked data.
- Contributed to projects to improve customer sentiment and built models in R Shiny.
- Explained the findings to the business and marketing teams in a clear, understandable way.
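A minimal sketch of customer segmentation with k-means and Ward hierarchical clustering, shown in Python with scikit-learn for illustration (the résumé describes this work in R). The feature matrix and the number of segments are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering

# Placeholder customer feature matrix (usage and plan attributes); values are illustrative only
X = np.random.RandomState(0).rand(500, 8)
X_scaled = StandardScaler().fit_transform(X)

# k-means and Ward hierarchical clustering with an assumed five customer segments
kmeans_labels = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X_scaled)
hier_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X_scaled)
print(np.bincount(kmeans_labels), np.bincount(hier_labels))
```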