Data Scientist Resume
New York City, NY
SUMMARY
- 6+ years of experience in Machine Learning, Data Mining, Predictive Modeling, and Visualization with large structured and unstructured data sets in the IT and Banking domains.
- Deep understanding of Python 3.x with NumPy, Pandas, SciPy, scikit-learn, Matplotlib, and NLTK.
- Proficient with SQL and NoSQL databases such as MySQL 5.x, MongoDB 3.x, Cassandra 3.x, and HBase 1.2.x.
- Experience with Big Data technologies such as the Hadoop ecosystem, Spark 2.x, and MapR Streams.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, and Lasso/Ridge Regression; testing and validation using ROC curves and K-fold cross-validation.
- Worked with machine learning algorithms such as AdaBoost, GBDT, XGBoost, Gaussian mixture models, structural equation models, and Kalman filters.
- Strong skills in statistical methodologies such as Hypothesis Testing, Correspondence Analysis, Principal Component Analysis, ARIMA/GARCH time series analysis, and A/B testing.
- Proficient in building and publishing customized interactive reports and dashboards using Tableau 9.4 and D3.js.
- Good knowledge of Recommender Systems, Natural Language Processing, and data visualization.
- Skilled in data parsing, manipulation, and preparation, including describing data contents, computing descriptive statistics, regex, split/combine, remap, merge, subset, re-index, melt, and reshape operations.
- Working experience with Cloud Computing technologies such as AWS EC2 and Google Cloud.
- Hands-on experience with a large parallel and integrated GPU computation platform using PyCUDA 1.2 and OpenCL R3.
- Experience with Agile methodologies, the Scrum process, and Git for version control.
- Expertise in handling multiple tasks with a proactive approach to meeting deadlines and creating deliverables in fast-paced environments; comfortable interacting with business stakeholders and end users.
TECHNICAL SKILLS
Programming Languages: Python 2.x/3.x (NumPy, Pandas, NLTK, scikit-learn, Matplotlib), SQL, JavaScript, R 3.x, SAS 9.x
Statistical Methods: Time Series, ANOVA, Bayes' Law, PCA, A/B testing
Regression: Linear/Non-Linear, Logistic, SVM, Regression Trees
Classification: KNN, Naive Bayes, SVM, Decision Trees, Random Forests, Boosting
Clustering: K-means, Hierarchical Clustering
Others: Collaborative Filtering, Neural Networks, NLP, Deep Learning
Database Techniques: MySQL 5.x, SQL Server 2010+, MongoDB 3.x, Cassandra 3.x, HBase 0.98
Big Data Techniques: Hadoop 2.x, Spark 2.x, HDFS 2.x, Hive 2.x, HBase 1.x, MapR Streams
Cloud Platforms/GPU: AWS, Google Cloud, PyCUDA 1.2, OpenCL R3
Data Visualization: Tableau 9.4/9.2, D3.js, Matplotlib
Operating Systems: macOS, Windows, Linux (Ubuntu)
IDEs: PyCharm 2017, Spyder 2.1, Jupyter Notebook 4.1, Sublime Text 2.0
Other Skills: XML, CSS, HTML 5.2, AngularJS 1.x, Django 1.11
PROFESSIONAL EXPERIENCE
Confidential, New York City, NY
Data Scientist
Responsibilities:
- Deployed AdaBoost, GBDT, XGBoost, and other machine learning algorithms to analyze the behavior of millions of customers.
- Parsed data, producing concise conclusions from raw data in a clean, well-structured, and easily maintainable format.
- Used Pandas, NumPy, PyCUDA, OpenCL, and scikit-learn in Python to develop on Confidential's parallel and integrated GPU computation platform.
- Worked with driver profiles and historical data to improve both the driver and user experience, and developed data-driven approaches to understanding user profiles.
- Performed Linear/Nonlinear and Logistic Regression (SVM, Random Forest) to tag and classify users.
- Performed K-means clustering and multivariate analysis in Python; developed clustering and KNN algorithms that improved customer segmentation and market expansion.
- Worked on regional fragmentation analysis based on geo-location to optimize the distribution of drivers on the map.
- Extended hexagonal regional fragmentation analysis to irregular regional fragmentation analysis.
- Reduced long-term prediction error of key Uber metrics from 35% to 10%; conducted experimentation and optimization on the lifetime valuation of Uber users.
- Designed and managed A/B experiments and derived business insights from post-hoc analysis.
- Built high-performance MySQL/Hive/MongoDB queries and intuitive dashboards for management, engineering, and internal collaborators.
- Provided data science support for data-driven decision making in product development cycles.
Environment: Python 3.3, scikit-learn, PyCUDA 1.2, OpenCL 2.1, MySQL 5.7, HDFS 2.7, Hive 2.1, Spark 2.1, MongoDB 3.4.
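The segmentation work above can be sketched in a few lines; this is an illustrative example on synthetic data (feature names and cluster counts are hypothetical, not the original model):

```python
# Illustrative sketch: customer segmentation with K-means on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features: trips per week, average fare (two synthetic segments).
X = np.vstack([
    rng.normal([2, 10], 1.0, size=(100, 2)),   # occasional riders
    rng.normal([15, 25], 2.0, size=(100, 2)),  # frequent riders
])

# Standardize before clustering so both features carry equal weight.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_  # one segment label per customer
```

New customers can then be assigned to the nearest segment with `kmeans.predict` on their standardized features.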
Confidential, New York City, NY
Data Scientist
Responsibilities:
- Designed and implemented a data-driven debit/credit card fraud risk model in Python and developed fraud risk rules and strategies in SQL Server 2016, achieving an Account Takeover scenario loss reduction of 10% ($3.4MM) per year.
- Obtained and transformed principal component features with PCA on a highly unbalanced dataset, measuring accuracy using AUPRC (area under the precision-recall curve).
- Built real-time fraud prediction using Spark Streaming and batch processing; modularized Spark functions written for offline machine learning so they could be reused for real-time machine learning.
- Used MapR Streams, MapR-DB (HBase API), and MapR-FS.
- Performed Market-Basket Analysis and implemented Decision Trees, Random Forests, and K-fold cross-validation.
- Devised a credit card fraud classification system using SVM in Python, TACL, and a relational database on HP NonStop systems to identify the risk of payment transactions and classify normal versus fraudulent transactions, improving the F-score of the existing system from 0.65 to 0.94.
- Responsible for data identification, collection, exploration, and cleaning for modeling; participated in model development.
- Effectively prevented fraud activity in large compromise events through timely ad-hoc analysis of event data and cooperation with vendors, using efficient SQL/Python programs.
- Modeled probability distributions of various business activities in terms of parameters or distributions, and performed time-series analysis of time-dependent data.
- Designed rich data visualizations to present data in human-readable form: ROC curves, heat maps, D3 visualizations, Tableau, etc.
- Performed ARIMA and GARCH time series analysis and fit Gaussian mixture models.
Environment: Python 3.2, Hadoop 2, Spark 1.6, Spark Streaming, HBase, HDFS, Hive, Cassandra 3.9, D3.js, Matplotlib, Tableau 9.4, SQL Server 2016.
Confidential
Data Scientist
Responsibilities:
- Performed Logistic Regression, Classification, Random Forests and Clustering in Python.
- Developed the first hybrid recommender combining content-based and collaborative filtering algorithms.
- Web-scraped over 310,000 reviews and over 19,000 users' ratings using Python, including Requests, BeautifulSoup, lxml, CSS/XPath selectors, and anti-scraping countermeasures.
- Built a text processing pipeline comprising Tokenization, Lemmatization, TF-IDF, Sentiment Analysis, Latent Semantic Analysis, and Singular Value Decomposition.
- Designed and built hands-on anti-bot and spam-targeting systems.
- Used MySQL/MongoDB to store user preferences and information, and deployed the application to Confidential's cloud computing platform for better performance.
- Experienced in all aspects of analytics/data warehousing solutions: database issues, data modeling, data mapping, ETL development, metadata management, data migration, and reporting solutions.
Environment: Python 2.7, HTML5, CSS3, JavaScript, scikit-learn, MongoDB, Cloud Computing
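The TF-IDF and SVD stages of the text pipeline above can be sketched briefly; this is an illustrative example with toy documents (the corpus and component count are hypothetical):

```python
# Illustrative sketch: TF-IDF followed by truncated SVD
# (Latent Semantic Analysis).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "great movie, loved the plot",
    "terrible movie, boring plot",
    "the soundtrack was great",
    "boring and terrible soundtrack",
]

# Sparse term-document matrix of TF-IDF weights.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# Truncated SVD projects documents into a low-rank latent-topic space.
lsa = TruncatedSVD(n_components=2, random_state=0)
topics = lsa.fit_transform(X)  # one dense 2-d embedding per document
```

The dense `topics` embeddings can then feed a content-based recommender or a downstream sentiment classifier.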
Confidential
Data Analyst
Responsibilities:
- Acquired data from Taobao (Chinese eBay) reviews using a Python web crawler and performed sentiment analysis.
- Performed text analysis using signal-processing methods to find patterns in customer behavior, along with Weibo (Chinese Twitter) analytics.
- Developed the required XML Schema documents and implemented the framework for parsing XML documents.
- Built a sentiment analysis model to classify and predict reviews using NLTK.
- Created dashboards (Tableau/PPT) for stakeholders to monitor KPIs.
- Analyzed trends and rankings with linear and multivariable regression to support more effective prediction and product development decisions.
Environment: Python 2.7, NLTK, Tableau 9.1, PowerPoint, MySQL 4.1
Confidential
Business Analyst
Responsibilities:
- Collected, understood, and transmitted the business requirements for the project, translating them into functional specifications along with customization of telecom software products.
- Gathered business requirements through interviews, surveys, and observation of account managers; conducted controlled brainstorming sessions with project focus groups and documented the results in the Business Requirements Document.
- Created Use Case Diagrams, Activity Diagrams, and Sequence Diagrams using MS Visio/Excel.
- Coordinated with QA team to create the test approach and determine test needs, test environment, test data, resources and limitations.
- Assisted the QA in performing simple SQL queries for QA testing and data validation.
Environment: MS Visio, MS Office(Excel/PowerPoint/Word), SQL-Server 2010