Data Scientist Resume
SUMMARY
- Data scientist with strong quantitative and modeling skills, passionate about extracting actionable insights from data to derive value-adding technical recommendations for complex problems.
- Strong communication skills with the ability to relay insights to diverse audiences, a good eye for detail, and willingness to undertake challenging roles.
- Adaptive to new technologies, with an affinity for critical thinking and expertise in Statistical Analysis, Social Network Analysis, Machine Learning, and Data Analysis.
- 4 years of professional experience as a proactive Data Scientist, developing end-to-end data ecosystems and producing data-driven solutions to business problems.
- Proficient in statistical tools and programming languages (Python, R, MATLAB, Java, C++, SQL), and in Hadoop for big data processing.
- Extensively involved in data preparation, exploratory analysis, and feature engineering for supervised and unsupervised modeling.
- Well-versed in linear and non-linear predictive algorithms for regression and classification.
- Experience in working on cloud environments (Google Cloud Platform).
- Experience in building models with deep learning frameworks like TensorFlow and Keras.
- Performed preliminary data analysis using descriptive statistics and handled anomalies by removing duplicates and imputing missing values.
- Performed dimensionality reduction using principal component analysis and applied multiple data mining techniques to derive new insights from the data.
TECHNICAL SKILLS
Database: PostgreSQL, SQL Server, HDFS
Data Analytics: Google Analytics
Product Management: Qualtrics, SurveyMonkey
Project Management: JIRA, MS Project, Trello, SDLC
Programming: C++, Java
Testing: HP Quality Center, UserTesting.com
Scripting Languages: MATLAB, R, Python, HTML, CSS, JavaScript
Data Science and Modeling: Predictive / Prescriptive Analytics, Supervised / Unsupervised models, Statistical Methods (Bayesian and Frequentist), Random Forests, Decision Trees, Boosting, Ensemble learners, Logistic Regression, Linear Regression, Fully Connected Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Unsupervised Deep Learning, XGBoost, TensorFlow, NumPy, SciPy, Matplotlib, pandas, Flask, Django, BI, data analysis, manipulation, A/B testing, stakeholder management, big data
PROFESSIONAL EXPERIENCE
Confidential
Data Scientist
Responsibilities:
- Explored and built predictive models using ensemble methods such as gradient-boosted trees (XGBoost) and neural networks built in Keras.
- Performed analysis on customer data already stored by Confidential, supplemented with data acquired from various public and private databases, to produce actionable results.
- Applied machine learning and statistical analysis methods such as clustering, classification, collaborative filtering, association rule mining, sentiment analysis, topic modeling, time-series analysis, multivariate regression analysis, statistical inference, and data validation.
- Outlined plan of execution to leverage predictive models and other machine learning algorithms as part of the data science support of corporate strategy.
- Addressed business questions by discovering relevant data, structuring it for analysis and database integration, and communicating the new opportunities and potential ROI to leadership.
- Worked on outlier detection using boxplot visualizations and on feature engineering using Gaussian Mixture Models and k-NN distances, built with pandas and NumPy.
- Applied feature selection techniques such as sequential feature selection to 200+ predictors to find the most important features for the models.
- Defined data pipeline (ingest / clean / munge / transform) for feature extraction toward downstream classification.
- Assessed customer consumption behavior and customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means and hierarchical clustering.
- Developed the connectors needed to plug machine learning software into wider data pipeline architectures.
- Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
- Discovered actionable insights using time series analysis and linear regression models in conjunction with unsupervised customer segmentation methods such as K-Means.
- Directed analysis of data and translated the derived insights into pricing strategies and actions.
- Developed a variety of models and algorithms for analytic purposes, built in Python using the scikit-learn package and deployed in a Docker container.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models.
- Used R and Python for exploratory data analysis, A/B testing, ANOVA, and hypothesis testing to compare and assess effectiveness.
- Designed and developed various machine learning frameworks using Python, R, and JavaScript.
Confidential
Data Scientist
Responsibilities:
- Designed surveys and analyzed trends to gain insight into housing market behavior, increasing leasing rates by 5% for a client.
- Designed a device to collect data from real estate assets through sensors and constructed a data pipeline using Dataflow to load the data into BigQuery on Google Cloud Platform (GCP).
- Prepared SQL queries in BigQuery to retrieve collected time-series data for analysis and validation.
- Developed a model to forecast future asset consumption using ARIMA and deep learning techniques, and deployed it to production to detect anomalies in that consumption.
- Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, for product recommendation and allocation planning.
- Implemented a dynamic programming approach in Python for in-house local search algorithms, improving their speed by 23%.
- Designed and conducted A/B tests for the pricing strategy of 2 products, which increased the conversion rate by 20%.
- Used Python to build statistical models including multivariate regression, linear regression, logistic regression, PCA, and random forests.
- Tested classification methods such as Random Forest, Logistic Regression and Gradient Boosting; performed Cross Validation for hyper-parameter tuning to optimize the models for unseen data.
- Queried large relational databases using SQL to analyze customer behavior.
- Performed data cleaning, feature scaling, and feature engineering using the pandas, NumPy, seaborn, SciPy, Matplotlib, and scikit-learn packages in Python.
- Participated in product redesigns and enhancements to determine how changes would be tracked and to suggest product direction based on data patterns.
- Worked with millions of rows of data using SQL, performed exploratory data analysis, and defined KPIs for products with over 10,000 users.
- Involved in defining data collection rules, target data mappings, and data definitions.
- Collaborated with cross-functional engineering, data, and marketing teams and 2 sales teams to organize the workflow process.
Confidential
Data Science Research Assistant
Responsibilities:
- Built a Variational EM algorithm to fit the BHM on real social networks to observe the community structure and analyze how communities interact with each other.
- Analyzed evaluation techniques for dynamic link prediction and proposed a metric that overcomes the shortcomings of current metrics.
- Built Neural Networks, Random forests, SVM and Kalman Filters in Python and R for GPS/INS integration.
- Built predictive models for radon concentrations in Ohio using quantile regression models in R.
- Built MLP neural networks trained with conjugate gradient and quasi-Newton methods in MATLAB to estimate Fourier transform coefficients for modeling a compact impedance-varying transmission line.