Data Scientist Resume
Minneapolis, Minnesota
PROFESSIONAL SUMMARY:
- Seasoned data scientist/analyst/modeler with extensive knowledge of machine learning techniques; recent work has emphasized deep learning and anomaly detection. Possesses strong programming skills to implement supervised, semi-supervised, and unsupervised machine learning techniques. Skilled in data preprocessing, modeling, and model validation and testing. Able to effectively communicate findings to management.
- Experience working as a Data Scientist/Data Analyst/Data Modeler, with an emphasis on data mapping and data validation in data warehousing environments.
- Extensive experience with business intelligence (BI) tools and technologies such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.
- Worked with a variety of Python modules, such as TensorFlow, Theano, PyTorch, SciPy, NumPy, Matplotlib, and Seaborn
- Efficient in developing logical and physical data models and organizing data per business requirements.
- Strong understanding of when to apply standard statistical models versus deep learning.
- Experienced in employing R Programming, MATLAB, SAS, Tableau, and SQL for data cleaning, data visualization, risk analysis, and predictive analytics.
- Experience in univariate and multivariate analysis, model testing, problem analysis, model comparison and validation, ANOVA, and regression analysis.
- Expertise in writing complex SQL queries to obtain filtered data for analysis purposes
- Working knowledge of implementing tree-based ensemble models such as boosting, bagging, and random forests, including gradient boosting with XGBoost.
- Experience in using model pipelines to automate preprocessing and training tasks and to move models into production quickly (see the pipeline sketch after this list)
- Worked with various Python libraries, such as NumPy and SciPy for mathematical calculations, Pandas for data preprocessing and wrangling, Matplotlib and Seaborn for data visualization, scikit-learn for machine learning and deep learning, and NLTK for NLP
- Hands-on experience with Machine Learning, Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis, and Data Visualization Tools
- Strong programming skills in a variety of languages such as Python and SQL.
- Familiarity with Crystal Reports and SSRS for querying, reporting, analysis, and Enterprise Information Management
- Excellent knowledge of creating reports in R, Tableau, and Power BI
- Experienced with large-scale databases and data stores such as Amazon Redshift, Google BigQuery, MongoDB, Cassandra, DynamoDB, Amazon S3, and PostgreSQL
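A minimal sketch of the model-pipeline pattern mentioned above, assuming scikit-learn's Pipeline; the steps, model, and data are illustrative, not the actual production setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data; real pipelines consumed warehouse extracts.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Chaining preprocessing and the model into one object keeps
# training and production scoring consistent.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```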
TECHNICAL SKILLS:
Programming: R, Python (NumPy, Pandas, Scikit-Learn), SQL, HiveQL, Spark, C++
Analytics and Visualization Tools: Tableau, Cognos, Ggplot (R), SAS, PowerBI, Matplotlib
Statistical Methods: ARIMA, ANOVA, Regression Analysis, Hypothesis Testing, Survival Models, Markov Chains, Monte Carlo Simulations, Time Series, Splines, Confidence Intervals, Principal Component Analysis, Dimensionality Reduction
Machine Learning: TensorFlow, PCA, RNN, Regression, Clustering, Random Forest, Naive Bayes, Support Vector Machines
Other Tools: Git Version Control, Jupyter Notebook, IPython Notebook, R Markdown, Unix
Machine Learning Algorithms: Logistic Regression, Linear Regression, Decision Tree, Random Forests, Gradient Boosting, SMOTE, Tomek Links, SMOTE-ENN, Lasso and Ridge Regression, Nearest Neighbor Classifier, K-Means Clustering, DBSCAN, Affinity Propagation, Principal Component Analysis, Support Vector Machines, Naive Bayes, Autoregression & Moving Averages
Big Data: HDFS, Pig, MapReduce, Hive, Sqoop, Flume, HBase, Storm, Kafka, Elasticsearch, Redis
PROFESSIONAL EXPERIENCE:
Data Scientist
Confidential, Minneapolis, Minnesota
Responsibilities:
- Test existing algorithms to ensure and maintain a high level of performance using quantitative model validation techniques
- Develop and test hypotheses for engineering improved features to be implemented using visualizations created with matplotlib and Seaborn in Python
- Test for ideal tuning of model hyperparameters using Python (see the tuning sketch after this list)
- Perform error analyses using Python to detect any trends or patterns in fraudulent transactions not identified by the models
- Develop new machine learning approaches to continuously improve fraud detection capabilities
- Create proofs of concept for new machine learning approaches using Python and TensorFlow
- Participate in peer review process to ensure correctness, accuracy, and quality of work produced by the team
- Perform unit testing of peer code as part of the peer review process
- In addition to typical anomaly detection approaches such as supervised classification and unsupervised clustering, incorporated Recurrent Neural Networks (RNNs)
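A minimal sketch of the kind of hyperparameter search described in this list, assuming a scikit-learn classifier and grid search; the model, parameter grid, and data are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data; the real features came from transaction records.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hypothetical grid; actual ranges depend on the model being tuned.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 30],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,               # 5-fold cross-validation
    scoring="roc_auc",  # a reasonable metric for imbalanced fraud data
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```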
Game Integrity Data Scientist
Confidential, San Mateo, California
Responsibilities:
- Developed, trained, and tested discriminator networks from Generative Adversarial Networks (GANs) to flag fraudulent activity.
- Developed proof of concept versions of models in R and Python
- Performed model diagnostics to determine best course of action for improving model performance and model selection
- Productionized models using Scala and Spark
- Implemented Spark Streaming to feed data into live models in real time (see the streaming sketch after this list)
- Used both supervised and unsupervised anomaly detection methods to identify fraudulent activities, such as money laundering
- Machine learning methods included supervised classification methods such as Logistic Regression
- Unsupervised methods included various clustering methodologies, such as DBSCAN and K-Means; DBSCAN was the primary unsupervised algorithm due to its ability to identify data points that don’t belong to any cluster, making it ideal for anomaly detection (see the sketch after this list)
- Utilized ensemble learning meta-algorithms, specifically stacking, to combine multiple algorithms and approaches into one powerful model with very high accuracy for identifying anomalies
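A minimal sketch of streaming data into a live scoring step, written in PySpark for consistency with the other examples here (the production pipeline used Scala); the Kafka topic, server, and scoring rule are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fraud-scoring").getOrCreate()

# Hypothetical Kafka source; topic and servers are placeholders.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "transactions")
    .load()
)

# Toy rule standing in for a trained model's real-time prediction.
scored = events.select(
    F.col("key").cast("string").alias("key"),
    F.length(F.col("value")).alias("payload_size"),
).withColumn("suspicious", F.col("payload_size") > 1000)

# Console sink is illustrative; production wrote scores to a data store.
query = scored.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```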
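A minimal sketch of DBSCAN-based anomaly flagging as described above, assuming scikit-learn; the feature matrix and parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Stand-in features; real inputs were in-game activity measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))

# DBSCAN labels points that fit no cluster as -1 (noise), which is
# exactly what makes it useful for anomaly detection.
X_scaled = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=0.8, min_samples=10).fit_predict(X_scaled)

anomalies = np.where(labels == -1)[0]
print(f"{len(anomalies)} candidate anomalies out of {len(X)} points")
```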
Data Scientist
Confidential, San Francisco, California
Responsibilities:
- Implemented machine learning models in Python to monitor and analyze internal forex systems (both back office and front office)
- Performed manual exploration and analysis of forex systems in Python to detect anomalies and to continuously form and test hypotheses about new features that might improve the performance of the implemented machine learning models
- Utilized clustering and segmentation algorithms to find patterns in client and customer bases and to identify typical profiles of traders, customers, and clients (see the segmentation sketch after this list)
- Performed one-off analyses to assist with investigations when potentially fraudulent activities had been identified, or when customers had reported suspicious account activity
- Continuously tested model performance, and implemented changes proven to improve model performance
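A minimal sketch of the kind of customer segmentation described in this list, assuming scikit-learn's K-Means; the features and cluster count are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical trader features (e.g., volume, frequency, position size).
rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 3))

# Standardize, then cluster into a handful of candidate profiles.
scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scaled)

# Each centroid summarizes a "typical" profile in standardized units.
for i, center in enumerate(kmeans.cluster_centers_):
    print(f"profile {i}: {np.round(center, 2)}")
```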
Data Analyst / Business Representative
Confidential, Wilmington, Delaware
Responsibilities:
- Represented the company during international chemical industry exhibitions
- Communicated with international business partners to determine data and information needs to ensure effective and useful reporting
- Performed statistical analyses and created publication-quality data visualizations for reports using R and RStudio
- Used SQL to access data for analysis and reporting