Data Scientist Resume
Sunnyvale, CA
PROFESSIONAL SUMMARY:
- 7 years of IT experience as an operations research analyst and data scientist.
- Strong background and experience in data science/data analysis, machine learning, probability theory and statistics.
- Expert in mining, wrangling and analyzing complex, high-volume and high-dimensional data from various sources.
- Strong coding skills in Python and R for data cleaning and data transformation.
- Experience with Big Data ecosystems.
- Extensively involved in data preparation, exploratory data analysis and feature engineering for supervised/unsupervised modeling.
- Solid grasp of linear/non-linear regression, classification modeling and predictive algorithms in machine learning.
- Proficient in regression, classification and clustering analysis using machine learning algorithms.
- Experienced in creating and modeling machine learning systems with neural networks and convolutional neural networks using TensorFlow and Keras.
- Highly motivated self-starter with effective communication and organizational skills.
TECHNICAL SKILLS:
Programming: Python, R, SQL (MySQL, PostgreSQL)
Tools: NumPy, Pandas, SciPy, scikit-learn, PySpark, statsmodels, TensorFlow, Keras, NLTK, Seaborn, Matplotlib, Plotly, Cufflinks, Choropleth, SQLAlchemy, Hadoop, HDFS
Artificial Intelligence: Statistical Machine Learning Algorithms, Linear and Logistic Regression, Decision Tree, K-Means, k-NN, Support Vector Machine (SVM), Random Forest, XGBoost, CNN, OpenCV, YOLO, Faster R-CNN, RNN, LSTM, NLTK, spaCy
PROFESSIONAL EXPERIENCE:
Confidential
- Built an image-classification CNN from scratch using Keras and TensorFlow.
- Used batch normalization to develop a more efficient model.
- Applied data augmentation techniques to expand the training data and make the CNN model more robust.
- Used dropout and L2 regularization to overcome overfitting.
- Performed transfer learning using the Inception, VGG19, MobileNet and ResNet50 models and compared their accuracy.
- Developing object-detection models using the YOLO and Mask R-CNN algorithms.
Data Scientist
Confidential, Sunnyvale, CA
- Performed data wrangling, cleaning and preprocessing to prepare data for analysis.
- Created new features to build more robust models.
- Analyzed iCloud upload and download performance for the Photos, Drive and Backup services across countries using problem-specific metrics.
- Developed K-Modes, Gaussian Mixture Model (GMM) and Agglomerative Hierarchical Clustering models to group countries with similar iCloud performance patterns.
- Analyzed iCloud customers' behavior patterns and the relationships between iCloud performance and customer engagement, and between iCloud performance and revenue.
Confidential
- Analyzed Confidential iCloud data using big data tools such as Hadoop and PySpark.
- Performed data wrangling, cleaning and preprocessing to prepare data for analysis.
- Derived new features using feature engineering techniques.
- Applied Kruskal-Wallis, Confidential and chi-square statistical tests to identify the features that matter most for iCloud performance.
- Prepared Python scripts for the iCloud Photos, Drive and Backup services for deployment in the Hadoop environment.
- Coordinated with the DevOps team to automate the statistical analysis on a daily basis.
- Triggered daily email reports on iCloud performance for the Photos, Backup and Drive services.
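The Kruskal-Wallis test listed above compares a metric across several groups (for example, countries) without assuming the data are normally distributed. A minimal sketch of the H statistic in plain Python, assuming each group is a hypothetical list of per-country measurements; it uses average ranks for ties and the standard tie correction, and stops before the p-value, which comes from a chi-square distribution with k - 1 degrees of freedom (SciPy's `scipy.stats.kruskal` computes both).

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the usual correction for ties.

    Each group is a hypothetical list of measurements (e.g. upload times
    for one country); this is an illustration, not a production test.
    """
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum over groups of (rank sum)^2 / group size.
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Three clearly separated groups give a large H.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Ranks rather than raw values are what make the test robust to skewed performance metrics such as latency, where a few extreme measurements would otherwise dominate a mean-based comparison.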
Data Scientist
Confidential, San Jose, CA
- Developed machine learning and deep learning models to provide industrial solutions.
- Organized and ran machine learning pipelines focused on data preparation, model training and optimization, and used visualization techniques to communicate insights to stakeholders.
- Performed exploratory analysis and feature engineering to fit the best models in Python.
- Discovered patterns, formulated and tested hypotheses, and translated the results into growth strategies that increased revenue and improved customer satisfaction.
- Consulted companies and individuals on their data science problems.
Confidential
- Explored the factors that influence the prediction of whether a client subscribes to a deposit.
- Performed feature engineering: handled missing values and created new features by combining existing ones.
- Used the Python Matplotlib and Seaborn libraries for visualization and exploratory data analysis.
- Applied Confidential and chi-square statistical tests to determine the predictive power of the features and the associations among them.
- Selected the most significant features to reduce complexity and processing time and to increase the robustness of the machine learning models.
- Built models including Regression, Random Forest, XGBoost, SVM and Neural Networks; compared their performance and tuned hyperparameters to improve the results.
Confidential
- Preprocessed data by transforming features, handling missing values, engineering features and creating dummy variables.
- Explored and visualized the dataset using the Seaborn and Matplotlib libraries.
- Developed models using algorithms such as Logistic Regression, Support Vector Machine, kNN, Naïve Bayes Classifier, Decision Trees, Random Forest and Neural Network.
- Applied grid search to find the optimal parameters for each algorithm and compared the algorithms' performance.
- Ranked in the top 7% of the competition.
Confidential
- Analyzed patterns between violence and other features such as gender, ethnicity, religion and age.
- Used categorical statistical techniques such as the chi-square test, Cramér's V, Confidential test and t-test to uncover relationships between variables.
- Applied advanced visualization techniques using R Shiny.
- Performed spatial analysis of the data using mapping applications.
- When no clear pattern emerged, used a permutation test to assess significance despite the limited sample size.
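Cramér's V and a permutation test combine naturally for this kind of categorical analysis: V measures the strength of association, and shuffling one variable many times shows how large V gets by chance alone. Note that a permutation test adds no new observations; it builds the null distribution from the data at hand. A minimal sketch in plain Python, assuming two equal-length lists of hypothetical category labels; it computes the bias-uncorrected Cramér's V from the Pearson chi-square statistic and a one-sided p-value with add-one smoothing.

```python
import math
import random
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V (no bias correction) between two categorical sequences."""
    n = len(xs)
    row = Counter(xs)
    col = Counter(ys)
    cell = Counter(zip(xs, ys))
    # Pearson chi-square over every (row, column) cell, including empty ones.
    chi2 = 0.0
    for a, ra in row.items():
        for b, cb in col.items():
            expected = ra * cb / n
            chi2 += (cell[(a, b)] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def permutation_test(xs, ys, n_perm=2000, seed=0):
    """Return (observed V, one-sided p-value) by shuffling ys."""
    rng = random.Random(seed)
    observed = cramers_v(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any real link between xs and ys
        # Small tolerance so float summation order cannot hide exact ties.
        if cramers_v(xs, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one smoothing keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical labels: group membership perfectly predicts the outcome.
group = ["a"] * 10 + ["b"] * 10
outcome = ["x"] * 10 + ["y"] * 10
print(permutation_test(group, outcome))
```

Shuffling only `ys` keeps both marginal distributions fixed, so every permuted table has the same row and column totals as the observed one; only the pairing between the two variables changes.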