Data Scientist Resume

Cincinnati, OH

SUMMARY

  • 5+ years of data science and analytics experience across the entire data science project life cycle, including data extraction, data cleaning, statistical modeling and data visualization with large structured and unstructured datasets.
  • Hands-on experience with Machine Learning algorithms such as Linear Regression, Logistic Regression, CART, SVM, LDA/QDA, Naive Bayes, Random Forest, Boosting, K-means Clustering, Hierarchical Clustering, PCA, Feature Selection, Collaborative Filtering, Content-based Filtering and Neural Networks.
  • Professional working experience with Python 3.X libraries including Matplotlib, NumPy, SciPy, Pandas, Beautiful Soup, Scikit-learn and NLTK for analysis.
  • Experience implementing data analysis with various analytic tools, such as Anaconda 4.0 / 2.X (Jupyter Notebook, Spyder), R 2.15 / 3.0 (Reshape, ggplot2), SAS 9.3, Matlab 8.0 and Excel 2010 / 2013.
  • Experience with data visualization using Python 2.X / 3.X and R 2.15 / 3.0, and generating dashboards with Tableau 8.0 / 9.2 / 10.0, Cognos Insight, Oracle and Spark.
  • Working experience in big data environments such as Hadoop Ecosystem 1.X / 2.X, including HDFS, MapReduce, Hive 0.11 and HBase 0.9.
  • Excellent analytical, problem solving and interpersonal skills. Ability to learn new concepts fast. Consistent team player with excellent communication skills.

TECHNICAL SKILLS

Machine Learning Algorithms: Linear regression, Logistic regression, LDA/QDA, SVM, Naive Bayes, CART, Random Forest, Boosting, K-means clustering, Hierarchical clustering, Collaborative Filtering, Neural Network

Analytic Tools: Anaconda 4.0 / 2.X (Jupyter Notebook, Spyder), R 2.15 / 3.0 (Reshape, ggplot2), SAS 9.3, Matlab and Excel

Statistical Analysis: Hypothesis Test, ANOVA, Survival Analysis, Longitudinal Analysis

Programming Language: Python 2.X & 3.X (numpy, scipy, pandas, beautiful soup, scikit-learn, NLTK)

Hadoop Ecosystem (1.X & 2.X): HDFS, MapReduce, Hive 0.11, HBase 0.9

Spark Framework (1.4 & 1.6 & 2.0): Spark SQL, PySpark, MLlib

Relational Database: MySQL 5.0, Oracle 11g / 12c, MS SQL Server 2012

Data Visualization: Tableau 8.0 / 9.2 / 10.0, R-ggplot2, Python-Matplotlib

NoSQL: MongoDB 3.3 / 3.4

Version Control: Git 2.X

Operating System: Windows 7 / 10, Mac OS

PROFESSIONAL EXPERIENCE

Data Scientist

Confidential, Cincinnati, OH

Responsibilities:

  • Patient readmission forecast: Using Python and Matlab, predicted whether a patient would be readmitted to the hospital. Created test and validation data sets from the given dataset. Built models and compared prediction results using Logistic Regression, Neural Network, Decision Tree and Random Forest.
  • Customer base prediction for organic products: Using Python, predicted the customer base for newly launched organic products at a supermarket. Built models and compared prediction results using Logistic Regression, Neural Network and Random Forest.
  • Book recommendation system: Using Python, implemented a simple recommender system on a book rating data set.
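
As an illustration of the kind of simple recommender described in the last item, the following is a minimal sketch of user-based collaborative filtering with cosine similarity. The toy ratings, the function names `cosine_similarity` and `recommend`, and the choice of user-based filtering are all assumptions made for illustration; the original project's data set and method are not specified.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {book: rating on a 1-5 scale}).
# Illustrative only; this is not the data set used in the project.
ratings = {
    "ann": {"Dune": 5, "Emma": 1, "Ulysses": 2},
    "bob": {"Dune": 4, "Emma": 2, "Neuromancer": 5},
    "cat": {"Emma": 5, "Ulysses": 4, "Persuasion": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users over the books both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank books the user has not rated by similarity-weighted mean rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap
        for book, rating in other_ratings.items():
            if book in ratings[user]:
                continue  # only recommend unseen books
            scores[book] = scores.get(book, 0.0) + sim * rating
            weights[book] = weights.get(book, 0.0) + sim
    ranked = sorted(scores, key=lambda b: scores[b] / weights[b], reverse=True)
    return ranked[:top_n]


print(recommend("ann", ratings))
```

On a real rating data set, the same idea would usually be implemented with sparse matrices and evaluated on held-out ratings rather than with plain dictionaries.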

Environment: Python 3.X, Oracle 12c, MongoDB 3.3, Spark 1.6 / 2.0 (PySpark, MLlib, Spark SQL), Excel 2010 / 2013, MS SQL Server 2012, R 2.15 / 3.X, Tableau 8.1, PowerPoint 2010, Outlook, Cognos Insight, SAS and SAS Miner

Data Analytics

Confidential

Responsibilities:

  • Collaborated with data engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle 12c.
  • Collected unstructured data from MongoDB 3.3 and completed data aggregation.
  • Performed data integrity checks, data cleaning, exploratory analysis and feature engineering using Python 3.5.
  • Analyzed customer consumption behavior and assessed customer value with RFM (recency, frequency, monetary) analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
  • Developed personalized program recommendations with Machine Learning algorithms, including Collaborative Filtering and Gradient Boosting Tree, to better meet the needs of existing customers and acquire new customers.
  • Used Python 3.X (numpy, scipy, pandas, scikit-learn, NLTK) and Spark 1.6 / 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consuming behavior.
  • Used Python 3.X and Spark 1.4 (PySpark, MLlib) to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network.
  • Evaluated and optimized performance of models, tuned parameters with K-Fold Cross Validation.
  • Utilized SQL to extract data from SQL Server 11.0 and prepared data for analysis.
  • Helped improve the liquidity of the ads model.
  • Designed comprehensive analyses of client and traffic data to optimize programs and identify their strengths and weaknesses.
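
The recency, frequency and monetary valuation of customers mentioned in the responsibilities above can be sketched as follows. This is a minimal illustration under stated assumptions: the transaction records, the `rfm_table` helper and the reference date are hypothetical and are not taken from the original work.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
# Illustrative only; not data from the project.
transactions = [
    ("c1", date(2017, 1, 5), 40.0),
    ("c1", date(2017, 3, 1), 60.0),
    ("c2", date(2016, 11, 20), 15.0),
    ("c3", date(2017, 2, 14), 200.0),
    ("c3", date(2017, 2, 28), 120.0),
    ("c3", date(2017, 3, 10), 80.0),
]


def rfm_table(transactions, as_of):
    """Return {customer: (recency_days, frequency, monetary)}.

    recency_days: days between the latest purchase and `as_of`
    frequency:    number of purchases
    monetary:     total amount spent
    """
    table = {}
    for customer, day, amount in transactions:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_ago = (as_of - day).days
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table


rfm = rfm_table(transactions, as_of=date(2017, 3, 15))
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

The three values per customer would then typically be scaled and passed as features to a clustering algorithm such as K-Means to form customer segments.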

Environment: Python 3.X, Oracle 12c, MongoDB 3.3, Tableau 8.1, R 2.15 / 3.0, Spark 1.6 / 2.0 (PySpark, MLlib, Spark SQL)

Data Analyst

Confidential

Responsibilities:

  • Used Cognos Insight and Excel 2010 to analyze arbitrage strategies, including identifying trends and patterns in the data and calculating the no-arbitrage interval.
  • Attended seminars to gain a deeper understanding of the multi-level marketing (MLM) industry.
  • Analyzed the size of the MLM market, customer recognition and competitors' key operating data.
  • Utilized SQL to extract data from SQL Server 11.0 and prepared data for analysis.
  • Acquired and tracked users' viewing and purchasing activity in the MLM environment to improve recommendation efficiency.
  • Designed dashboards with Tableau 9.2 and provided complex reports, including summaries, charts, and graphs to interpret findings to team and stakeholders.
  • Used SAS and SAS Miner to implement different machine learning algorithms including Logistic Regression, Decision Tree, SVM, Random Forest, Boosting and Neural Network.

Environment: Excel 2013, MS SQL Server 2012, Tableau 8.1, Excel 2010, PowerPoint 2010, Outlook, Cognos Insight, SAS and SAS Miner
