
Data Scientist Resume


New York City, NY

SUMMARY

  • Over 6 years of experience in Machine Learning, Data Mining, Predictive Modeling, and Visualization with large structured and unstructured data sets in the IT and banking domains.
  • Deep understanding of Python 3.3 with NumPy, Pandas, SciPy, scikit-learn, Matplotlib, and NLTK.
  • Proficient in SQL and NoSQL databases such as MySQL 5.x, MongoDB 3.x, Cassandra 3.x, and HBase 1.2.x.
  • Experience in Big Data technologies such as the Hadoop ecosystem, Spark 2.x, and MapR Streams.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, and Lasso/Ridge Regression, with testing and validation via ROC plots and K-fold cross-validation.
  • Worked with machine learning algorithms such as AdaBoost, GBDT, XGBoost, Gaussian mixture models, structural equation models, and Kalman filters.
  • Strong skills in statistical methodologies such as Hypothesis Testing, Correspondence Analysis, Principal Component Analysis, ARIMA and GARCH time series analysis, and A/B testing.
  • Proficient in building and publishing customized interactive reports and dashboards using Tableau 9.4 and D3.js.
  • Good knowledge of Recommender Systems, Natural Language Processing, and Data Visualization.
  • Skilled in data parsing, manipulation, and preparation, including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, re-index, melt, and reshape.
  • Working experience with cloud computing technologies such as AWS EC2 and Google Cloud.
  • Hands-on experience with a large parallel and integrated GPU computation platform using PyCuda 1.2 and OpenCL R3.
  • Experience with Agile methodologies, the Scrum process, and Git for version control.
  • Expert at handling multiple tasks and meeting deadlines in fast-paced environments; comfortable interacting with business and end users.
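As a minimal illustration of the K-fold cross-validation and ROC-based validation workflow mentioned above (synthetic data; the model and parameters are illustrative assumptions, not code from any of the projects below):

```python
# Sketch of K-fold cross-validation scored by ROC AUC, using scikit-learn.
# The dataset is synthetic and the classifier choice is illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One ROC AUC score per fold; the mean summarizes out-of-sample performance.
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(scores.mean())
```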

TECHNICAL SKILLS

Programming Languages: Python 2.x/3.x (NumPy, pandas, NLTK, scikit-learn, Matplotlib), SQL, JavaScript, R 3.x, SAS 9.x

Statistical Methods: Time Series, ANOVA, Bayes' Law, PCA, A/B testing

Regression: Linear/Non-Linear, Logistic, SVM, Regression tree

Classification: KNN, Naive Bayes, SVM, Decision Tree, Random Forest, Boosting

Clustering: K-means, Hierarchical clustering

Others: Collaborative Filtering, Neural Network, NLP, Deep Learning

Database Techniques: MySQL 5.x, SQL Server 2010+, MongoDB 3.x, Cassandra 3.x, HBase 0.98

Big Data Techniques: Hadoop 2.x, Spark 2.x, HDFS 2.x, Hive 2.x, HBase 1.x, MapR Streams

Cloud Platforms/GPU: AWS, Google Cloud, PyCuda 1.2, OpenCL R3

Data Visualization: Tableau 9.4/9.2, D3.js, Python-Matplotlib

Operating Systems: Mac OS, Windows, Linux (Ubuntu)

IDEs: PyCharm 2017, Spyder 2.1, Jupyter Notebook 4.1, Sublime Text 2.0

Other Skills: XML 2.x, CSS, HTML 5.2, AngularJS 1.x, Django 1.11

PROFESSIONAL EXPERIENCE

Confidential, New York City, NY

Data Scientist

Responsibilities:

  • Deployed AdaBoost, GBDT, XGBoost, and other machine learning algorithms to analyze the behavior of millions of customers.
  • Parsed data, producing concise conclusions from raw data in a clean, well-structured, and easily maintainable format.
  • Used Pandas, NumPy, PyCuda, OpenCL, and scikit-learn in Python to develop on Confidential's parallel and integrated GPU computation platform.
  • Worked on driver profiles and historical data to improve both the driver and user experience, and developed data-driven approaches to understanding user profiles.
  • Applied linear (and nonlinear) regression, logistic regression, SVM, and Random Forest models to tag/classify users.
  • Performed K-means clustering and multivariate analysis in Python, and developed clustering and KNN algorithms that improved customer segmentation and market expansion.
  • Worked on regional fragmentation analysis based on geo-location to optimize driver distribution on the map.
  • Extended hexagonal regional fragmentation analysis to irregular regional fragmentation analysis.
  • Reduced long-term prediction error of key Uber metrics from 35% to 10%; conducted experimentation and optimization on lifetime valuation of Uber users.
  • Designed and managed A/B experiments and derived business insights from post-hoc analysis.
  • Built high-performance MySQL/Hive/MongoDB queries and intuitive dashboards for management, engineering, and internal collaborators.
  • Provided data science support to data-driven decision making in product development cycles.
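The K-means customer-segmentation step described above can be sketched as follows; the two-column feature matrix stands in for real behavioral features (e.g. trips per week, average fare) and is an invented example, not project data:

```python
# Minimal sketch of K-means customer segmentation with scikit-learn.
# The features are synthetic stand-ins for user behavior metrics.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 2)) * [3.0, 20.0] + [10.0, 50.0]

# Standardize so both features contribute comparably to distances.
scaled = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)
print(np.bincount(kmeans.labels_))  # size of each customer segment
```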

Environment: Python 3.3, scikit-learn, PyCuda 1.2, OpenCL 2.1, MySQL 5.7, HDFS 2.7, Hive 2.1, Spark 2.1, MongoDB 3.4.

Confidential, New York City, NY

Data Scientist

Responsibilities:

  • Designed and implemented a data-driven debit/credit card fraud risk model in Python and developed fraud risk rules/strategies in SQL Server 2016, achieving an Account Takeover Scenario loss reduction of 10% ($3.4MM) per year.
  • Obtained and transformed principal component features with PCA on a highly imbalanced dataset, measuring accuracy using the AUPRC.
  • Built real-time fraud prediction using Spark Streaming and batch processing; modularized Spark functions written for offline machine learning so they could be reused for real-time machine learning.
  • Used MapR Streams, MapR-DB (HBase API), and MapR-FS.
  • Performed Market-Basket Analysis and implemented Decision Trees, Random Forests, and K-fold cross-validation.
  • Devised a credit card fraud classification system using SVM in Python, TACL, and a relational database on HP NonStop systems to identify payment transaction risk and classify normal versus fraudulent transactions, improving the F-score of the existing system from 0.65 to 0.94.
  • Responsible for data identification, collection, exploration, and cleaning for modeling; participated in model development.
  • Effectively prevented fraud activity in large compromise events through timely ad-hoc analysis of event data and cooperation with vendors, using efficient SQL/Python programs.
  • Modeled probability distributions of various business activities, in terms of either parameters or probability distributions, and performed time-series analysis of time-dependent data.
  • Designed rich data visualizations to present data in human-readable form using ROC curves, heat maps, D3 visualizations, Tableau, etc.
  • Performed ARIMA and GARCH time series analysis and fit Gaussian mixture models.
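The PCA-plus-AUPRC evaluation on a highly imbalanced dataset, as described above, can be sketched like this (synthetic data; the 98/2 class split, component count, and classifier are illustrative assumptions):

```python
# Sketch of PCA feature extraction on an imbalanced dataset, evaluated
# with average precision (AUPRC). All parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=30,
                           weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Project onto the top principal components before classification.
pca = PCA(n_components=10).fit(X_tr)
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_tr), y_tr)

# AUPRC is more informative than accuracy when the positive
# (fraud) class is rare.
scores = clf.predict_proba(pca.transform(X_te))[:, 1]
print(average_precision_score(y_te, scores))
```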

Environment: Python 3.2, Hadoop 2, Spark 1.6, Spark Streaming, HBase, HDFS, Hive, Cassandra 3.9, D3.js, Matplotlib, Tableau 9.4, SQL Server 2016.

Confidential

Data Scientist

Responsibilities:

  • Performed Logistic Regression, Classification, Random Forests and Clustering in Python.
  • Developed the first hybrid recommender combining content-based and collaborative filtering algorithms.
  • Web-scraped over 310,000 reviews and over 19,000 users' ratings using Python, including Requests, BeautifulSoup, lxml, CSS/XPath selectors, and anti-scraping countermeasures.
  • Built a text processing pipeline covering tokenization, lemmatization, TF-IDF, sentiment analysis, Latent Semantic Analysis, and Singular Value Decomposition.
  • Designed and built anti-bot and spam-targeting systems hands-on.
  • Utilized MySQL/MongoDB to store user preferences and information, and deployed the application to Confidential's cloud computing platform for better performance.
  • Drew on experience in all aspects of analytics/data warehousing solutions (database issues, data modeling, data mapping, ETL development, metadata management, data migration, and reporting solutions).
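The TF-IDF-to-SVD portion of the text pipeline above can be sketched in a few lines; the toy review corpus is invented for illustration:

```python
# Sketch of TF-IDF followed by truncated SVD (Latent Semantic Analysis).
# The corpus below is a toy example, not the scraped review data.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

corpus = [
    "great food and friendly service",
    "terrible service, cold food",
    "friendly staff, great atmosphere",
    "cold atmosphere and terrible staff",
]

# TF-IDF weights the terms; SVD compresses them into latent topics.
lsa = make_pipeline(TfidfVectorizer(),
                    TruncatedSVD(n_components=2, random_state=0))
doc_topics = lsa.fit_transform(corpus)
print(doc_topics.shape)  # one 2-dimensional topic vector per review
```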

Environment: Python 2.7, HTML5, CSS3, JavaScript, scikit-learn, MongoDB, Cloud Computing

Confidential

Data Analyst

Responsibilities:

  • Acquired data from Taobao (Chinese eBay) reviews using a Python web crawler and performed sentiment analysis.
  • Performed text analysis using signal systems to find patterns in customer behavior, along with Weibo (Chinese Twitter) analytics.
  • Developed the required XML Schema documents and implemented the framework for parsing XML documents.
  • Built a sentiment analysis model to classify and predict reviews using NLTK.
  • Created dashboards (Tableau/PPT) for stakeholders to monitor KPIs.
  • Analyzed trends and rankings with linear and multivariable regression to support more effective prediction and product development decisions.
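The review-sentiment classifier described above used NLTK; the same idea can be sketched with a TF-IDF plus Naive Bayes pipeline on toy data (the reviews and labels below are invented for illustration):

```python
# Toy sentiment classifier: TF-IDF features feeding a Naive Bayes model.
# Training data is invented; a real model would need a labeled corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_reviews = [
    "love this product",
    "excellent quality, very happy",
    "terrible, broke immediately",
    "awful experience, do not buy",
]
train_labels = ["pos", "pos", "neg", "neg"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_reviews, train_labels)

print(clf.predict(["very happy with the quality"]))
```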

Environment: Python2.7, NLTK, Tableau 9.1, PowerPoint, MySQL 4.1

Confidential

Business Analyst

Responsibilities:

  • Collected, understood, and communicated the business requirements for the project, translating them into functional specifications along with customization of telecom software products.
  • Gathered business requirements through interviews, surveys, and observation of account managers; conducted controlled brainstorming sessions with project focus groups and documented the results in the Business Requirements Document.
  • Created Use Case Diagrams, Activity Diagrams, and Sequence Diagrams using MS Visio/Excel.
  • Coordinated with the QA team to create the test approach and determine test needs, test environment, test data, resources, and limitations.
  • Assisted QA by writing simple SQL queries for QA testing and data validation.

Environment: MS Visio, MS Office(Excel/PowerPoint/Word), SQL-Server 2010
