Data Scientist Resume
SUMMARY
- Around 5 years of experience in Data Science, focused on working with data using Python, R, and SQL
- Experienced Data Scientist with expertise in data mining, data cleaning, exploring and visualizing data, building and evaluating statistical models, preparing dashboards, and implementing the best-suited machine learning models to support sound business decisions
- Hands-on experience using Python 3.x for data analytics and visualization with core analytical libraries such as NumPy, SciPy, Pandas, and Scikit-learn
- Well-versed in data wrangling, loading data into SQL Server, and writing complex queries efficiently
- Strong Knowledge in Statistical methodologies such as Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), and Time Series Analysis
- Strong ability to apply optimization techniques such as Gradient Descent and Stochastic Gradient Descent to regression models
- Profound knowledge of various Supervised and Unsupervised machine learning algorithms such as Ensemble Methods, Clustering algorithms, Classification algorithms and Time Series models (AR, MA, ARMA, ARIMA)
- Exposure to text mining using NLP along with data cleaning using Natural Language Toolkit (NLTK)
- Expertise in data transformation techniques such as log and root transforms, skewness correction, normalization, and aggregation of data
- In-depth Knowledge of Dimensionality Reduction, Hyper-parameter tuning, Model Regularization (Ridge, Lasso) and Grid Search techniques to optimize model performance
- Proficient in creating reports and dashboards in Tableau using advanced features such as animations and interactive charts
- Expertise in developing analytical approaches to meet business requirements
- Dell EMC Data Science Associate (EMCDSA) certified
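The model regularization and Grid Search techniques mentioned above can be sketched as follows. This is an illustrative example on synthetic data, not taken from any specific project; the alpha grid is an assumption chosen for demonstration.

```python
# Illustrative sketch: tuning the Ridge regularization strength (alpha)
# with GridSearchCV on synthetic regression data.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic dataset stands in for real project data
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Search over a hypothetical grid of regularization strengths
grid = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

Swapping `Ridge` for `Lasso` applies the same tuning pattern to L1 regularization.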
TECHNICAL SKILLS
Machine Learning: Classification, Regression, Feature Engineering, One-Hot Encoding, Clustering, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine, KNN, Ensemble Methods, K-Means Clustering, Time Series Analysis, Confidence Intervals, Principal Component Analysis, Dimensionality Reduction
Statistical Analysis: Hypothesis testing, A/B testing, ANOVA, MANOVA, normal distribution, mean/median/mode, standard deviation, regression, F-test
Exploratory Data Analysis: Univariate/multivariate analysis, outlier detection, data cleansing, missing-value imputation, normalization, linearity checks, histograms/density estimation, data visualization using Python libraries such as Yellowbrick and Seaborn
Programming Languages: Python (Pandas, Seaborn, NumPy, Scikit-learn), R (dplyr, ggplot2), SQL (complex queries using window functions and aggregation)
Big Data: Hadoop Ecosystem - Hive, Pig, MapReduce, PySpark
Databases: MS SQL Server, Oracle, PostgreSQL, HDFS, HBase
Data Visualization Tools: Tableau 9.x/10.x, Seaborn/Matplotlib, Plotly, SSRS, Shiny, ggplot2
Cloud Data Systems: AWS (Redshift, S3)
IDEs: JupyterLab, RStudio, Eclipse, Spyder, Atom, Notepad++, PyCharm
Version Control Tools: Git, GitHub
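The one-hot encoding listed under Machine Learning can be illustrated with a minimal Pandas example; the `plan`/`tenure` columns are hypothetical, made up for demonstration.

```python
# Toy example of one-hot encoding a categorical column with pandas
import pandas as pd

# Hypothetical customer table with one categorical feature
df = pd.DataFrame({"plan": ["basic", "pro", "basic"], "tenure": [3, 12, 7]})

# Expand the categorical "plan" column into binary indicator columns
encoded = pd.get_dummies(df, columns=["plan"])
print(encoded.columns.tolist())
```

Scikit-learn's `OneHotEncoder` offers the same transformation as a reusable pipeline step.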
PROFESSIONAL EXPERIENCE
Confidential
Data Scientist
Responsibilities:
- Cleaned and manipulated complex datasets to create the data foundation for further analysis and the development of key insights (MS SQL server, R, Tableau, Excel)
- Applied machine learning algorithms and advanced statistical analyses such as decision trees, regression models, SVM, and clustering using the scikit-learn package in Python
- Performed data pre-processing and cleaning for feature engineering, and applied imputation techniques for missing values in the dataset using Python
- Performed exploratory data analysis (EDA) using Matplotlib and Seaborn to visualize and discover patterns in the data; examined feature correlations using heatmaps and performed hypothesis testing to check feature significance
- Developed analytical approaches to answer high-level questions and provided insightful recommendations
- Conducted statistical analyses such as linear regression, ANOVA, and classification modeling on the data
- Extracted customers' big data from various sources (Excel, flat files, Oracle, SQL Server, MongoDB, Teradata, and server log data) into Hadoop HDFS
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn in Python to develop machine learning models
- Assured data quality and data integrity, and optimized data collection procedures on a weekly and monthly basis
- Created data quality scripts using SQL and Hive to validate successful data loads
- Worked with data formats such as JSON, XML, CSV, and .dat, and exported data into visualization/ETL platforms
- Evaluated model performance using metrics such as R-squared, adjusted R-squared, confusion matrix, ROC curve, AUC, and root mean squared error (RMSE)
- Developed and applied metrics and prototypes used to drive business decisions
- Participated in ongoing research, and evaluation of new technologies and analytical solutions to optimize the model performance
- Applied problem-solving skills to find and correct data problems, and used statistical methods to adjust and project results when necessary
- Worked with cross-functional teams to understand data requirements and delivered detailed analytical reports to support business decisions
Environment: Model: Logistic Regression, Decision Trees, Random Forest, SVM
Evaluation Metrics: Confusion Matrix, Recall, Precision, ROC, Cross Validation
Technologies: Python, MySQL, Machine Learning Libraries - NumPy, Pandas, scikit-learn, statsmodels
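The evaluation workflow listed in the Environment block above (logistic regression scored with a confusion matrix and ROC/AUC) can be sketched like this; the dataset and split parameters are synthetic placeholders.

```python
# Hypothetical sketch: logistic regression evaluated with a confusion
# matrix and ROC AUC, mirroring the metrics listed above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data in place of real project data
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)

print(confusion_matrix(y_te, pred))                              # 2x2 counts
print(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))      # AUC in [0, 1]
```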
Confidential
Data Scientist
Responsibilities:
- Participated in all phases of data mining, data cleaning, data collection, developing models, validation, and visualization
- Built the data pipeline: collected required data from different sources and converted it into structured form using SQL
- Worked with large data volumes: more than 10M customer records with 20+ features
- Performed exploratory data analysis using NumPy and Pandas to identify correlations between variables, multicollinearity, and hidden patterns, trends, and seasonality
- Analyzed customer churn distribution over the different attributes such as tenure, different services a customer has signed up for and demographic data using visualization libraries matplotlib and seaborn in python
- Performed PCA, backward feature selection, and correlation analysis for dimensionality reduction and improved model accuracy
- Applied SMOTE to address the imbalanced class distribution in the training data
- Implemented classification models such as Logistic Regression, Decision Trees, Random Forest, KNN, XGBoost, and SVM, and selected the best-performing model for prediction
- Performed K-Fold cross-validation to test models with different batches to optimize the model and prevent overfitting
- Performed feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing
- Achieved 90% monthly customer retention by predicting the likelihood of returning customers using a logistic regression model in R
- Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI
- Improved the R- and SQL-based data cleansing and mining process, reducing processing time by 50%
- Analyzed patterns in customer needs across locations, categories, and months using time series modeling techniques
Environment: Python 3.x, (Scikit-Learn/Scipy/Numpy/Pandas/Matplotlib/Seaborn), Tableau, Machine Learning algorithms (KNN, Decision Tree, Random Forest, Logistic Regression, Support Vector Machines, XGBoost), Microsoft Excel, MySQL
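The K-fold cross-validation described above can be sketched as follows; the data is synthetic and deliberately class-imbalanced (as in the churn setting), and the fold count and model choice are illustrative assumptions.

```python
# Illustrative sketch: stratified 5-fold cross-validation of a
# churn-style classifier, as described in the bullets above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced dataset (80/20 class split) stands in for churn data
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.8, 0.2], random_state=42
)

# Stratified folds keep the class ratio stable across batches,
# helping detect overfitting across different data splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=cv)
print(scores.mean())
```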
Confidential, MA
Data Scientist
Responsibilities:
- Performed statistical analyses such as ANOVA, linear regression, and logistic regression on data using Python and advanced MS Excel
- Collected data from end client, performed ETL using various platforms and defined the uniform standard format
- Wrote queries to retrieve data from SQL Server database to get the sample dataset containing basic fields
- Used ensemble learning methods such as Random Forest, Bagging, and Gradient Boosting; selected the final model based on confusion matrix, ROC, and AUC; and predicted the probability of customer enrollment
- Tuned the hyperparameters of the above models using Grid Search to find the optimal model
- Designed and implemented K-fold cross-validation to test and verify model significance
- Developed a dashboard and story in Tableau presenting benchmarks and a summary of model metrics
- Created database tables, constraints, views, synonyms, sequences, indexes, and object types
- Created entity-relationship diagrams for database integrations using Microsoft Visio
- Explored sales across different customer categories with graphs built in Matplotlib and Seaborn, and analyzed whether sales showed trend or seasonality patterns
- Built predictive models such as Linear Regression, AR, MA, ARMA, and ARIMA to forecast sales using Scikit-learn and the statsmodels library
- Developed and modified database procedures and triggers to enhance and improve functionality using T-SQL