Data Scientist Resume

New York

SUMMARY

  • Data scientist with 6 years of experience transforming business requirements into analytical models, designing algorithms, and building strategic solutions that scale across massive volumes of data.
  • Sound financial domain knowledge of Fixed Income, Bonds, Equities, Trade Cycle, Derivatives (Options and Futures), Portfolio Management, Sales and Marketing, CCAR and risk management.
  • Involved in the entire data science project life cycle, including Data Acquisition, Data Cleaning, Data Manipulation, Data Mining, Machine Learning Algorithms, Data Validation, and Data Visualization from structured and unstructured Data Sources.
  • In-depth knowledge of supervised, unsupervised, and semi-supervised machine learning algorithms such as Linear Regression, Logistic Regression, Decision Trees (CART), Random Forest, AdaBoost, XGBoost, SVM, KNN, Naïve Bayes, Bayesian Networks, Clustering, PCA, Neural Networks, Recommendation Systems and more.
  • Expertise in Statistics methodologies such as hypothesis testing, A/B testing, ANOVA, Monte Carlo simulation, time series analysis.
  • In-depth knowledge of the latest deep learning trends in NLP and experience applying them to projects. Techniques include both traditional approaches (RNN, LSTM, word2vec, SVD, etc.) and newer methods (GRU, GloVe).
  • Worked with model validation and optimization using k-fold cross validation and regularization.
  • Proven ability in text mining and sentiment analysis using machine learning algorithms.
  • Proficient in Python 3.6 with the NumPy, Pandas, SciPy, Scikit-learn and Matplotlib packages.
  • Hands-on experience in R 3, SAS 9.4, JavaScript, D3.js, HTML and CSS.
  • Experienced in SAS/BASE, SAS/MACRO, SAS/SQL in Windows and Unix environments.
  • Skilled in using SAS statistical procedures like PROC REPORT, PROC TABULATE, PROC CORR, PROC ANOVA, PROC LOGISTIC, PROC FREQ, PROC MEANS, PROC UNIVARIATE.
  • Strong skills in web analytics tools like Google Analytics and Google AdWords.
  • Solid ability to write and optimize diverse SQL queries, with deep knowledge of relational databases such as PostgreSQL, MySQL 5.5 and SQL Server 2014, and NoSQL databases such as HBase 1.2, MongoDB 3.2 and Cassandra.
  • Working experience in Hadoop 2.0, Hive and HBase.
  • Hands-on experience with Data Science libraries in Python such as Pandas, NumPy, SciPy, scikit-learn, Matplotlib, Seaborn, Beautiful Soup, Orange, Rpy2, LibSVM, neurolab, NLTK.
  • Familiar with packages in R such as ggplot2, Caret, Glmnet, Dplyr, Tidyr, Wordcloud, Stringr, e1071, MASS, Rjson, Plyr, FactoMineR, MDP.
  • Working knowledge of NLP based deep learning models in Python 3.
  • Experience in data visualizations using Python, R, D3.js and Tableau 9.4/9.2.
  • Deep understanding of building, publishing customized interactive reports and dashboards with customized parameters and user-filters using Tableau 9.4/9.2.
  • Employed machine learning approaches in different topics such as anomaly detection, text analysis, recommendation systems and search ads click-through rate (CTR) prediction.
  • Excellent understanding of SDLC (systems development life cycle), methodologies such as Agile and Scrum.
  • Extensive experience with version control tool Git.
  • Effective team player with strong communication and interpersonal skills, possess a strong ability to adapt and learn new technologies and new business lines rapidly.
  • Extensive experience in handling multiple tasks to meet deadlines and creating deliverables in fast-paced environments and interacting with business and end users.
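The k-fold cross-validation and regularization workflow mentioned in the summary can be sketched as follows. This is a minimal illustration on synthetic data (the dataset, model choice and parameters here are stand-ins, not project code):

```python
# Minimal sketch: 5-fold cross-validation of an L2-regularized
# logistic regression, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for a real project dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# C is the inverse regularization strength; smaller C = stronger penalty.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)

# 5-fold cross-validation gives a more stable estimate of accuracy
# than a single train/test split.
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

Sweeping `C` over a grid inside the cross-validation loop is the usual way the regularization strength itself is tuned.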

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop 2.X, Spark 2.1 (Spark SQL, Spark MLlib), Hive 2.1, HBase 1.0+, PySpark, Scala 2.10.X

Data Analysis/Visualization: Tableau 9.4/9.2, Matplotlib, D3.js, RShiny, H2O

Languages: Python 3.6/2.7, R 3, Java 9, Scala 2.12, SAS 9.4, VBA, SQL, HiveQL

Databases: MySQL 5.X, PostgreSQL 5.5, Oracle 11g, Access, MongoDB 3.2, Cassandra 3.0, HBase 1.0+

BI Tools: Tableau 9.4/9.2, Power BI 2.X, SharePoint 2016/2013

Operating Systems: Mac OS, Windows 10/8/7, UNIX, Linux, Ubuntu

Packages: Python (NumPy, Pandas, Scikit-learn, SciPy, Matplotlib, NLTK), H2O, R (caret, dplyr, glmnet, lavaan, bnlearn, ggplot2, googleVis, Shiny)

Machine Learning: Linear Regression, Logistic Regression, Decision Trees, LDA, Naive Bayes, Neural Networks, SVM, XGBoost, Random Forests, Bagging, GBM, PCA

Business Analysis: Requirements Engineering, Business Process Modeling & Improvement, Gap Analysis, Cause and Effect Analysis, UI Design, UML Modeling, User Acceptance Testing (UAT), RACI Chart, Financial Modeling

Documentation/Modeling Tools: MS Office 2010, MS Project, MS Visio, Rational Rose, Excel (Pivot Tables, VLookups), SharePoint, Rational Requisite Pro, MS Word, PowerPoint, Outlook

Version Control: Git, TFVC

PROFESSIONAL EXPERIENCE

Confidential, New York

Data Scientist

Responsibilities:

  • Extracted data from Amazon Redshift database using SparkSQL
  • Participated in all phases of data mining, data collection, data matching, data cleaning, developing models, validation and visualization in Python
  • Built ads and digital campaign Click-Through Rate (CTR) predictive models using a stacked ensemble method including random forest, XGBoost and factorization machines
  • Utilized field-aware factorization machines (FFM), hash functions and NLP for feature engineering
  • Measured and reported features (A/B testing) based on propensity score matching
  • Implemented cross validation to improve model performance
  • Utilized reporting tools to deliver bi-weekly analytical reports on social media campaigns through Google Analytics
  • Performed touchpoints-based customer journey with DMP’s e-commerce and web browsing data using PostgreSQL
  • Developed clustering models based on demographics, purchase behavior and psychographics to do customer segments with Confidential DMP using PostgreSQL and Python
  • Drew dynamic maps, cross-visitation tables and customer pathways in Tableau, Python Matplotlib and D3.js
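A stacked-ensemble CTR model of the kind described above can be sketched as follows, substituting scikit-learn's gradient boosting for XGBoost/FFM; the data and parameters here are illustrative, not from the project:

```python
# Sketch of a stacked ensemble classifier: base learners feed their
# out-of-fold predictions into a logistic-regression meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

# Synthetic binary click/no-click data stands in for real ad logs.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,  # out-of-fold predictions for the meta-learner
)
stack.fit(X, y)
print(f"Train accuracy: {stack.score(X, y):.3f}")
```

In production CTR work the predicted probabilities (`stack.predict_proba`) rather than hard labels are what feed bidding and ranking.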

Environment: PostgreSQL 5.5, Python 3.6 (Pandas, pandas.io.sql, NumPy, Matplotlib, Psycopg2, re, NLTK, Scikit-learn, PySpark), Supervised Learning Models (Logistic Regression, Tree Methods, SVM, XGBoost, NLP), Unsupervised Methods (Clustering, PCA), Microsoft Office 2017 (PowerPoint/Word/Excel), Tableau 9.4, D3.js

Confidential, New York

Data Scientist

Responsibilities:

  • Built pipeline for data processing, feature engineering, and model training of a gradient boosted machine using Python to predict the likelihood of a customer changing his/her address based on existing transaction and account activity
  • Used NLP techniques to create features that leveraged merchant names
  • Analyzed customers' digital engagement patterns (web, mobile, mobile web) to identify reasons for digital engagement and potential pain points for customers while performing account management tasks
  • Explored various target definitions to determine the optimal window to identify address changes
  • Built pipeline for data processing, feature engineering, and model training of a gradient boosted machine using Python to predict the likelihood of a customer going over limit on their next transaction
  • Built features and model to detect expressions of dissatisfaction in customer call transcripts using TF-IDF vectorization and time position features
  • Tested a variety of algorithms including logistic regression and tree-based methods
  • Produced production-quality feature generation code that is currently being used on incoming calls
  • Delivered presentations and prepared walk-around decks explaining model-build methodology to non-technical audiences.
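The TF-IDF dissatisfaction classifier described above can be sketched roughly as follows; the phrases and labels are invented stand-ins, not real call-transcript data:

```python
# Sketch: raw text -> TF-IDF vectors -> logistic-regression classifier
# for detecting expressions of dissatisfaction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented example snippets; 1 = dissatisfied, 0 = satisfied.
texts = [
    "I am very unhappy with this charge",
    "this fee is unacceptable and frustrating",
    "thanks, that resolved my question",
    "great, the payment went through fine",
]
labels = [1, 1, 0, 0]

# Unigram + bigram TF-IDF features feed the classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["this charge is unacceptable"]))
```

The time-position features mentioned above would be concatenated alongside the TF-IDF matrix before fitting (e.g. via `FeatureUnion` or manual stacking).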

Environment: Python 2.6 (pandas, numpy, scikit-learn, xgboost, gbm, nltk, string, seaborn, TfidfVectorizer, CountVectorizer), Hadoop, Microsoft Office 2013 (Word/PowerPoint/Excel), Power BI

Confidential

Data Analyst

Responsibilities:

  • Pulled data from both SQL databases (MySQL) and NoSQL databases (MongoDB)
  • Preprocessed structured and unstructured e-commerce purchase, customer review, and call data using R and Python
  • Built reports and dashboards based on digital touch point KPI metrics
  • Conducted linear/logistic regression, decision trees with Python to predict customers’ purchase, rating behavior
  • Interacted with department heads to finalize business and functional requirements
  • Gathered data from multiple sources using analytical techniques and presented it visually to supervisors, highlighting the important aspects of the data to optimize the flow of information
  • Worked to establish best practices for evaluating and improving data quality for analytical use
  • Built advanced analytics models to identify significant features from the finalized datasets
  • Liaised with the other client-facing teams to help the client understand nuances of the model details
  • Utilized data mining and statistical analysis skills for proposing strategy to acquire future projects.
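The decision-tree purchase prediction mentioned above can be sketched as follows; the features and data are synthetic stand-ins for real purchase and rating records:

```python
# Sketch: a shallow decision tree predicting a binary purchase outcome.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic "will the customer purchase?" data.
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Limiting depth keeps the tree interpretable and curbs overfitting.
tree = DecisionTreeClassifier(max_depth=4, random_state=1)
tree.fit(X_train, y_train)
print(f"Test accuracy: {tree.score(X_test, y_test):.3f}")
```

A shallow tree like this doubles as a communication tool: its split rules can be read directly to non-technical stakeholders.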

Environment: MySQL 5.5, MongoDB, Python 2.6, R 3, MS Excel 2013, MS PowerPoint 2013

Confidential

Consultant

Responsibilities:

  • Developed successful momentum and mean-reversion alphas on the 3000 largest U.S.-traded stocks by using price volume data in Python
  • Awarded Confidential Websim Contest 2014 Silver Medal and $400 stipend and received an offer to join the WQ Beijing Office as a part-time research consultant for one year
  • As a member of a small team, designed, developed and implemented an innovative quantitative methodology for firm-wide portfolio optimization
  • Helped preprocess messy unstructured stock data and other transaction data in SQL and updated the stock info database weekly
  • Helped senior quants develop methods and tools to evaluate and optimize the firm's trading strategies and trading signals, and reported results to them
  • Performed analyses on the firm's historical trading to improve profitability
  • Took new ideas, methods, or models and implemented them efficiently in code.
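A price-momentum alpha like those described above can be sketched as: rank stocks by trailing return, then hold a dollar-neutral long/short book. The prices below are randomly generated placeholders, not market data:

```python
# Sketch of a simple cross-sectional momentum signal.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# Fake daily close prices for 5 tickers over 60 days.
prices = pd.DataFrame(
    100 * np.cumprod(1 + rng.normal(0, 0.01, size=(60, 5)), axis=0),
    columns=["A", "B", "C", "D", "E"],
)

# 20-day trailing return as the momentum signal.
momentum = prices.pct_change(20).iloc[-1]

# Demean so long and short weights sum to zero (dollar-neutral),
# then scale so gross exposure is 1.
weights = momentum - momentum.mean()
weights /= weights.abs().sum()
print(weights.round(3))
```

A mean-reversion alpha is the same construction with the sign of the signal flipped over a shorter lookback.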

Environment: Python, MySQL 5.5, MS Office 2013 (PowerPoint/Word/Excel)

Confidential

Data Analyst

Responsibilities:

  • Analyzed B2B/B2C platforms to collect potential customer data and stored them in MySQL database
  • Filtered raw data from providers to position potential customers, and improved the filtering process by using search engines, greatly raising work efficiency
  • Coordinated activities between the business house and technical staff, in developing new methods, policies, and procedures to meet the business needs
  • Dived into key business, product, financial, merchant and consumer metrics to develop insights and strategy
  • Created data for the most likely events (e.g. bill surcharge, late fee etc.) that could affect predictions
  • Successfully helped managers develop accurate strategic plans and implement marketing and business decisions; utilized software applications such as SAS Enterprise Miner/Guide to help managers validate strategy using decision tree analysis, and analyzed surplus and sales data for the industry

Environment: MySQL 5.5, Excel, SAS Enterprise Miner 14.1, SAS Enterprise Guide 6.1, MS Office 2013 (PowerPoint/Word), SPSS 19, R 3

Confidential

Analyst

Responsibilities:

  • Assisted Compliance Manager on the supervision and guidance of the relevant departments to establish anti-money laundering systems and identify suspicious transactions
  • Identified and assessed Chinese shadow banking’s compliance issues that require follow-up or supervision and formulated written policies and procedures related to supervising those activities
  • Collected and analyzed Chinese Shadow banking system data, gathered market and internal information to define policies and strategies
  • Cross-referenced data from both internal and external shadow banking system studies to have a comprehensive overall understanding of the worldwide shadow banking compliance policies
  • Managed complex loan, interbank lending and P2P platform lending data, and performed analyses such as regression using Excel, SPSS and R
  • Reported on the analysis of implementation of Anti-Money laundering(AML) policies in mainland China
  • Researched on policies that required banking agencies to review and enhance training, and develop anti-money laundering examination procedures.

Environment: MS Office 2010 (Excel/PowerPoint/Visio/Word), SPSS 19, R 2
