
Senior Data Scientist Resume


San Francisco, CA

SUMMARY:

  • Data scientist with strong quantitative and modeling skills, passionate about turning data into actionable insights and value-adding technical recommendations for complex problems.
  • Strong communication skills with the ability to relay insights to diverse audiences, a good eye for detail, and a willingness to undertake challenging roles.
  • Adaptable to new technologies, with an affinity for critical thinking and expertise in Statistical Analysis, Social Network Analysis, Machine Learning, and Data Analysis.
  • Proactive Data Scientist with 8 years of professional experience developing end-to-end data ecosystems and producing data-driven solutions to business problems.
  • Proficient in statistical tools and programming languages (Python, R, MATLAB, Java, C++, SQL).
  • Extensively involved in data preparation, exploratory analysis, and feature engineering using supervised and unsupervised modeling.
  • Well-versed in linear and non-linear regression and classification predictive algorithms.
  • Experience working in cloud environments (GCP, Azure, AWS).
  • Extracted and analyzed Big Data using Hadoop, Hive, Pig, and Spark.
  • Experience in building models with deep learning frameworks like TensorFlow, PyTorch, Caffe, MXNet, and Keras.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Performed dimensionality reduction using Principal Component Analysis, Multi-Dimensional Scaling, and other feature selection techniques to derive new insights from the data (see the sketch after this list).
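A minimal sketch, assuming scikit-learn and Pandas, of the preprocessing summarized above: duplicate removal, imputation of missing values, scaling, and PCA. The function name, imputation strategy, and component count are illustrative assumptions, not a record of the original work.

    import pandas as pd
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    def reduce_dimensions(df: pd.DataFrame, n_components: int = 10) -> pd.DataFrame:
        # Assumes a numeric feature DataFrame: drop exact duplicates, impute
        # missing values, scale, then project onto leading principal components.
        df = df.drop_duplicates()
        imputed = SimpleImputer(strategy="median").fit_transform(df)
        scaled = StandardScaler().fit_transform(imputed)
        components = PCA(n_components=n_components).fit_transform(scaled)
        return pd.DataFrame(components, columns=[f"pc{i + 1}" for i in range(n_components)])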

TECHNICAL SKILLS:

Database: PostgreSQL, SQL Server, Hadoop, Spark, HBase, Hive, Pig, MapReduce

Data Analytics: Google Analytics, Tableau

Product Management: Qualtrics, SurveyMonkey

Project Management: JIRA, MS Project, Trello, SDLC

Programming: C++, Java

Platforms: Windows, Ubuntu, GCP, AWS, Azure

Scripting Languages: MATLAB, R, Python, HTML, CSS, JavaScript

Data Science and Modeling: Predictive / Prescriptive Analytics, Supervised / Unsupervised models, Statistical Methods (Bayesian and Frequentist), Random Forests, Decision Trees, Boosting, Ensemble learners, Logistic Regression, Linear Regression, Fully Connected Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Unsupervised Deep Learning, XGBoost, Jinja2, D3, TensorFlow, MXNet, PyTorch, Caffe, NumPy, SciPy, Matplotlib, Pandas, Flask, Django, Power BI, data analysis, data manipulation, A/B testing, stakeholder management, big data

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco

Senior Data Scientist

  • Explored and built predictive models for a recommendation engine using ensemble methods such as gradient-boosted trees with LGBoost.
  • Used PostgreSQL to handle customer data already stored in Macy's databases, and used Hadoop-based Hive and Pig to acquire data from various public and private sources, which was then ingested into the data cleaning and exploratory models.
  • Implemented CNNs in TensorFlow and MXNet for automatic content labeling and identifying brands from logos.
  • Outlined plan of execution to leverage predictive models and other machine learning algorithms as part of the data science support of corporate strategy.
  • Addressed business questions by discovering relevant data, structuring it for analysis and database integration, and communicating the new opportunities and potential ROI to leadership.
  • Used the Twitter API to download tweets from shoppers and applied sentiment analysis to identify the drivers of likes and shares to improve the recommendation model.
  • Analyzed large data sets using Spark and applied machine learning techniques to develop predictive and statistical models.
  • Worked on outlier detection with data visualizations using box plots, and on feature engineering using Gaussian Mixture Models and k-NN distances, built with Pandas and NumPy.
  • Applied feature engineering techniques such as sequential feature selection across 200+ predictors to identify the most important features for the models.
  • Analyzed customer purchasing behavior and quantified customer value with RFM analysis; segmented customers using clustering algorithms such as K-Means and hierarchical clustering (see the sketch after this list).
  • Developed a predictive model for future product popularity by applying ARIMA models, Elastic Nets, variational autoencoders, and GANs, and deployed the models into production to surface trends in product popularity among customers.
  • Directed analysis of data and translated the derived insights into pricing strategies and actions.
  • Built these models in Python using the scikit-learn package and deployed them in Docker containers.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
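A minimal sketch of RFM scoring followed by K-Means segmentation as referenced above, assuming scikit-learn and Pandas; the column names (customer_id, order_date, amount) and the number of segments are illustrative assumptions.

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    def rfm_segments(orders: pd.DataFrame, snapshot_date: pd.Timestamp, n_segments: int = 4) -> pd.DataFrame:
        # Compute Recency, Frequency, and Monetary value per customer, then cluster.
        rfm = orders.groupby("customer_id").agg(
            recency=("order_date", lambda d: (snapshot_date - d.max()).days),
            frequency=("order_date", "count"),
            monetary=("amount", "sum"),
        )
        scaled = StandardScaler().fit_transform(rfm)
        rfm["segment"] = KMeans(n_clusters=n_segments, random_state=0).fit_predict(scaled)
        return rfm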

Confidential, San Francisco

Data Scientist

  • Used R to analyze 20 million consumer data points to create a model that delivered 90K reward card upgrades.
  • Built predictive models including Support Vector Machines, Decision Trees, Naive Bayes classifiers, and Neural Networks, plus ensembles of these models, to evaluate the likelihood of customer segments being open to additional purchase opportunities.
  • Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment and product recommendation and allocation planning.
  • Used Python to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, and Random Forest models.
  • Tested classification methods such as Random Forest, Logistic Regression, and Gradient Boosting; performed cross-validation for hyper-parameter tuning to optimize the models for unseen data (see the sketch after this list).
  • Interacted with large relational databases using SQL to analyze customer behavior.
  • Performed data cleaning, feature scaling, and feature engineering using the Pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn packages in Python.
  • Participated in product redesigns and enhancements to understand how changes would be tracked, and suggested product direction based on data patterns.
  • Involved in defining Data collection rules, Target data mappings, and data definitions.
  • Built Neural Networks, Random forests, SVM and Kalman Filters in Python and R for GPS/INS integration.
  • Built predictive models for radon concentrations in Ohio using Quantile Regression models in R.
  • Analyzed and found patterns and insights within structured and unstructured data using machine learning algorithms.
  • Implemented the training process using cross-validation and test sets, evaluated results against different performance metrics, collected feedback, and retrained the model to improve performance.
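A minimal sketch of the cross-validated hyper-parameter tuning described above, assuming scikit-learn; the candidate models and search grids are illustrative, not the original settings.

    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    CANDIDATES = [
        (RandomForestClassifier(random_state=0), {"n_estimators": [200, 500], "max_depth": [5, 10, None]}),
        (GradientBoostingClassifier(random_state=0), {"n_estimators": [200, 500], "learning_rate": [0.05, 0.1]}),
        (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    ]

    def tune(X, y):
        # Grid-search each model family with 5-fold CV and keep the best estimator by AUC.
        best = None
        for model, grid in CANDIDATES:
            search = GridSearchCV(model, grid, cv=5, scoring="roc_auc").fit(X, y)
            if best is None or search.best_score_ > best.best_score_:
                best = search
        return best.best_estimator_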

Confidential, San Antonio

Data Scientist

  • Queried data from databases using SQL to support model validation.
  • Built analytic tools to prepare data and graphs using Tableau.
  • Drafted the procedures for reporting.
  • Wrote Python scripts using spaCy to parse documents (see the sketch after this list).
  • Maintained a web application built with Python libraries (Flask, Jinja2, D3, etc.).
  • Performed exploratory data analysis using the Pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn packages in Python.
  • Implemented a word2vec model in TensorFlow to identify the topics that were prompting users to contact customer service.
  • Explored time series models such as ARIMA and RNNs to identify trends in customer care traffic and better allocate resources.
  • Generated reports using Tableau for managers to improve customer service and the website.
  • Participated in product redesigns and enhancements to understand how changes would be tracked, and suggested product direction based on data patterns.
  • Automated and deployed the end-to-end pipeline, from data collection to making NLU models available via a REST API in the Azure cloud environment.
  • Automated generation of JSON-format logs for a machine learning application to support real-time debugging.
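A minimal sketch of the spaCy-based document parsing mentioned above, assuming the en_core_web_sm model is installed; the extracted fields are illustrative assumptions.

    import spacy

    nlp = spacy.load("en_core_web_sm")  # small English pipeline, assumed installed

    def parse_document(text: str) -> dict:
        # Return sentences, named entities, and noun chunks from raw text.
        doc = nlp(text)
        return {
            "sentences": [sent.text for sent in doc.sents],
            "entities": [(ent.text, ent.label_) for ent in doc.ents],
            "noun_chunks": [chunk.text for chunk in doc.noun_chunks],
        }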

Confidential, Chicago

Data Engineer

  • Used Hadoop to process data relating to people on the web.
  • Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Designed and built a real-time RESTful API that serves demographics and other email/postal intelligence metrics to drive marketing solutions; the system has processed as many as 5 billion requests per month (see the sketch after this list).
  • Developed necessary connectors to plug machine learning software into wider data pipeline architectures.
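A minimal Flask sketch of the kind of RESTful metrics endpoint described above; the route, in-memory lookup store, and returned fields are hypothetical stand-ins for illustration, not the production system.

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Hypothetical in-memory store standing in for the real metrics backend.
    METRICS = {"ab12cd": {"age_band": "25-34", "region": "IL", "open_rate": 0.42}}

    @app.route("/v1/metrics/<record_id>")
    def get_metrics(record_id: str):
        # Look up demographics/engagement metrics for a hashed identifier.
        record = METRICS.get(record_id)
        if record is None:
            return jsonify({"error": "not found"}), 404
        return jsonify(record)

    if __name__ == "__main__":
        app.run()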
