Data Scientist Resume

Overland Park, KS

SUMMARY:

  • Professionally qualified Data Scientist with over 5.5 years of experience in data science and analytics in the banking, insurance and telecom domains.
  • Designed and developed various machine learning frameworks using Python, R, and MATLAB.
  • Collaborated with data engineers to implement ETL processes; wrote and optimized SQL queries to extract data from the cloud and merge it with data from Oracle 12c.
  • Rich experience in managing the entire data science project life cycle, with involvement in all phases, including data extraction, data cleaning, statistical modeling and data visualization, on large datasets of structured and unstructured data.
  • Hands-on experience in machine learning algorithms such as Linear Regression, GLM, CART, SVM, KNN, LDA/QDA, Naive Bayes, Random Forest, Boosting, K-means Clustering, Hierarchical Clustering, PCA, Feature Selection, Collaborative Filtering, Neural Networks and NLP.
  • Professional working experience with Python 2.X / 3.X libraries including Matplotlib, NumPy, SciPy, Pandas, Beautiful Soup, Seaborn, Scikit-learn and NLTK for analysis.
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0 / 2.X (Jupyter Notebook, Spyder), R 2.15 / 3.0 (reshape, ggplot2, dplyr, car, MASS and lme4), SAS 9.3, MATLAB 8.0 and Excel 2010 / 2013.
  • Experience with data visualization using Python 2.X / 3.X and R 2.15 / 3.0, and with generating dashboards in Tableau 8.0 / 9.2 / 10.0.
  • Working experience in statistical analysis and testing, including Hypothesis Testing, ANOVA, Survival Analysis, Longitudinal Analysis, Experiment Design and Sample Determination, and A/B Testing.
  • Hands-on experience in importing and exporting data using relational databases, including Oracle 11g / 12c, MySQL 5.0 and MS SQL Server 2008 / 2012, and NoSQL databases such as MongoDB 3.3 / 3.4.
  • Working experience in big data environments such as the Hadoop ecosystem 1.X / 2.X (HDFS, MapReduce, Hive 0.11, HBase 0.9) and the Spark framework 1.4 / 1.6 / 2.0 (PySpark, MLlib and Spark SQL).
  • Working experience with version control tools such as Git 2.X to coordinate work on files among multiple team members.
  • Experienced with SDLC methodologies such as Agile and Scrum.
  • Good team player and quick-learner; highly self-motivated person with good communication and interpersonal skills.

TECHNICAL PROFICIENCY:

Machine Learning Algorithms: Linear regression, SVM, KNN, Naive Bayes, Logistic regression, LDA/QDA, CART, Random Forest, Boosting, K-means clustering, Hierarchical clustering, Collaborative filtering, Neural Network, NLP.

Analytic Tools: Anaconda 4.0 / 2.X (Jupyter Notebook, Spyder), R 2.15 / 3.0 (reshape, ggplot2, dplyr, car, MASS and lme4), SAS 9.3, MATLAB 8.0, Mathematica 9.0, Excel 2010 / 2013

Statistical Analysis: Hypothesis Test, ANOVA, Survival Analysis, Longitudinal Analysis, Experiment Design and Sample Determination, A/B Test

Programming Language: Python 2.X & 3.X (numpy, scipy, pandas, seaborn, beautiful soup, scikit-learn, NLTK), SQL, C

Hadoop Ecosystem (1.X & 2.X): HDFS, MapReduce, Hive 0.11, HBase 0.9

Spark Framework (1.4 & 1.6 & 2.0): Spark SQL, PySpark, MLlib

Relational Database: MySQL 5.0, Oracle 11g / 12c, MS SQL Server 2008 / 2012

Data Visualization: Tableau 8.0 / 9.2 / 10.0, D3.js 3.X / 4.X, R-ggplot2, Python-Matplotlib

NoSQL: MongoDB 3.3 / 3.4

Version Control: Git 2.X

Operating Systems: Windows 7 / 10, Mac OS

Programming Languages and Tools: Python, R, MATLAB, SQL, UNIX, MongoDB, Spark, Hadoop, Lua, Torch, TensorFlow.

Machine Learning and Deep Learning Techniques: Trees, Bayes Model, SVM, Ensemble Methods, Neural Networks, RNN, KNN, CNN, MLP, Ensemble SVM, Majority voting, Linear models, Classification, Regression, Logistic Regression, Clustering, Kernel methods, Memory Networks, LSTMs, Dimension reduction, Deep belief networks (DBN), Statistical tests, Gaussian mixtures, DCN.

Python Libraries: scikit-learn, pandas, NumPy, SciPy, Theano, Keras, Matplotlib, pymongo.

R Libraries: dplyr, ggplot2, jsonlite, plyr, rvest, rjson, httr, xml2, curl

PROFESSIONAL EXPERIENCE:

Confidential, Overland Park, KS

Data Scientist

Responsibilities:

  • Extracted data from various databases by implementing ETL processes; wrote and optimized SQL queries for data extraction.
  • Handled over 120 million records during data preparation and model training.
  • Designed and developed various machine learning frameworks using R.
  • Provided bi-weekly reviews to the director and VP of Confidential.
  • Implemented dynamic time warping for time series classification.
  • Implemented one-class SVM for novelty detection.
  • Handled imbalanced datasets in multiclass classification.
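
For readers unfamiliar with dynamic time warping (DTW), the technique named in the list above, here is a minimal sketch in plain Python. It is an illustration only, not the implementation used in this role (which was built in R); the function names `dtw_distance` and `classify_1nn` and the toy series are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Return the DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def classify_1nn(query, labeled_series):
    """Label a query with the label of its DTW-nearest training series."""
    nearest = min(labeled_series, key=lambda pair: dtw_distance(query, pair[0]))
    return nearest[1]


# Hypothetical toy training set: (series, label) pairs
train = [
    ([0, 0, 1, 2, 1, 0], "bump"),
    ([0, 1, 0, 1, 0, 1], "zigzag"),
]
label = classify_1nn([0, 1, 2, 2, 1, 0, 0], train)
```

Because DTW allows one point to align with several points in the other series, the stretched query still matches the "bump" pattern even though the two series differ in length.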

Environment: R 3.X, Oracle 12c, Spark 1.6 / 2.0 (Pyspark, MLlib, Spark SQL), 3.X / 4.X, Git 2.X

Confidential, Wichita, KS

Data Scientist

Responsibilities:

  • Designed and developed various machine learning frameworks using Python, R, and MATLAB.
  • Collaborated with data engineers to implement ETL processes; wrote and optimized SQL queries to extract data from the cloud and merge it with data from Oracle 12c.
  • Collected unstructured data from MongoDB 3.3 and completed data aggregation.
  • Performed data integrity checks, data cleaning, exploratory analysis and feature engineering using R 3.4.0.
  • Analyzed customer consumption behavior and assessed customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
  • Developed personalized product recommendations with machine learning algorithms, including collaborative filtering and gradient boosted trees, to better meet the needs of existing customers and acquire new customers.
  • Used Python 3.X (numpy, scipy, pandas, scikit-learn, seaborn, NLTK) and Spark 1.6 / 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Coordinated the execution of A/B tests to measure the effectiveness of the personalized recommendation system.
  • Performed data visualization with Tableau 10.0 and MeteorJS and generated dashboards to present the findings.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.
  • Determined customer satisfaction and helped enhance customer experience using NLP.
  • Used Git 2.X to apply version control. Tracked changes in files and coordinated work on the files among multiple team members.
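
As an illustration of the RFM (recency, frequency, monetary) analysis mentioned in the list above, the sketch below aggregates raw transactions into per-customer RFM values in plain Python. It is not the code used in this role; the function name `rfm_table`, the row layout, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Aggregate (customer_id, purchase_date, amount) rows into RFM values.

    Returns {customer_id: (recency_days, frequency, monetary)}.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, when, amount in transactions:
        # Recency is measured from the most recent purchase
        if customer not in last_seen or when > last_seen[customer]:
            last_seen[customer] = when
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((today - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


# Hypothetical toy data
transactions = [
    ("alice", date(2017, 1, 5), 40.0),
    ("alice", date(2017, 3, 1), 60.0),
    ("bob", date(2016, 11, 20), 15.0),
]
table = rfm_table(transactions, today=date(2017, 3, 11))
```

The resulting three-value vectors are the kind of features that could then be scaled and passed to a clustering algorithm for segmentation.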

Environment: R 3.X, Oracle 12c, MongoDB 3.3, Spark 1.6 / 2.0 (Pyspark, MLlib, Spark SQL), Tableau 10.0, D3.js 3.X / 4.X, Git 2.X

Confidential, Wichita, Kansas

Data Scientist

Responsibilities:

  • Identified risk level and eligibility of new insurance applicants with Machine Learning algorithms.
  • Predicted the claim severity to understand future loss and ranked importance of features.
  • Used R 3.X, R 2.X and Spark 1.4 (PySpark, MLlib) to implement different machine learning algorithms, including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network.
  • Evaluated and optimized model performance; tuned parameters with k-fold cross-validation.
  • Provided analytical support to underwriting and pricing by preparing and analyzing data to be used in actuarial calculations.
  • Designed dashboards with Tableau 9.2 and MeteorJS and provided complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders.
  • Identified process improvements that significantly reduce workloads or improve quality.
  • Utilized SQL and HiveQL to query and manipulate data from a variety of data sources, including Oracle 10g and HDFS, while maintaining data integrity.
  • Worked on data cleaning, data preparation and feature engineering with Python 3.X including Numpy, Scipy, Pandas, Matplotlib, Seaborn and Scikit-learn.
  • Worked with version control tools, such as Git 2.X, to track contributions from different people and record the state of the project at different points in time.
  • Publication: "A Novel Machine Learning Framework for Phenotype Prediction Based on Genome-Wide DNA Methylation Data," published at IJCNN 2017.
  • Project, recognizing handwritten digits using artificial neural networks: implemented with Python, scikit-learn, NumPy, SciPy and UNIX, using artificial neural networks and convolutional neural networks.
  • Project, library management system: developed a database using Java, SQL, and UNIX for managing a library, with one user as a librarian with limited access and the other as a database administrator.

Environment: R 3.X, R 2.X, Oracle 10g, Hive 0.11, HDFS, Spark 1.4, Tableau 9.2, Git 2.X

Confidential, Wichita, Kansas

Data Scientist

Responsibilities:

  • Built predictive models using multiple regression analysis.
  • Designed and developed an advanced recommendation system for real estate and finance customers utilizing R and Python.
  • Extracted data from SQL Server 11.0 and MongoDB to prepare it for analysis.
  • Classified customers using RFM analysis, clustering and regression models with R 3.0; selected high-value customers and improved their retention rate by sending ads and coupons.

Environment: Excel 2013, R 3.0, Hadoop 1.X, MS SQL Server 2012, Tableau 8.1

Confidential

Data Scientist

Responsibilities:

  • Used Python 2.7 to apply time series models, clustering algorithms and other data mining methods to explore fast-growth opportunities for clients.
  • Analyzed traffic queries of the Baidu search engine using classification algorithms.
  • Helped improve the liquidity of the ads model.
  • Designed comprehensive analyses based on client and traffic data to optimize products and identify their strengths and weaknesses.
  • Team Member, Rating Engine Application: boosted marketing for the client by designing a cell phone data rating engine in Python, PL/SQL, and UNIX/Linux to suggest suitable cell phone plans based on data usage.
  • Team Lead, Call Center Application: increased the efficiency of client customer support by implementing a call center application using R, Python and SQL that tracked customer complaint history and assigned defects accordingly.
  • Team Member, Open Reach: enhanced the customer buying experience using machine learning, R, Python and SQL.
  • Team Member/SPOC R50 Release, One Siebel: ensured smooth access for customers by developing scripts in Python and Unix for managing the crash reports and log levels of Unix and Windows servers; maintained server network configuration components using Siebel 8.1 and Oracle; analyzed server defects and provided root cause analysis (RCA); developed and published documents for avoiding defects in future releases; set up daily Scrum meetings among Delivery Managers, Dev Team, Test Team, and Admin Team.

Environment: Excel 2010, Python 2.7, Hadoop 1.X, MapReduce, MS SQL Server 2008, Tableau 8.0
