Credit Risk Analyst (Data Scientist) Resume
Jersey City, NJ
SUMMARY
- Over 6 years of experience as a Data Scientist/Analyst with strong Data Mining, Machine Learning, and Data Analysis skills.
- Extensive knowledge in Financial, Banking, and Retail domains.
- Strong expertise in Machine Learning algorithms, including Linear Regression, Logistic Regression, Support Vector Machines, Regularization, Boosting, Decision Trees, Random Forests, Naïve Bayes, K-Means Clustering, and Neural Networks.
- In-depth knowledge in Hypothesis Testing, A/B Testing, ANOVA, Time Series and other Statistics Methodologies.
- Adept in implementing Python libraries such as Pandas, Numpy, Scipy, statsmodels, BeautifulSoup, word2vec, GloVe, Scikit-learn, Matplotlib, Seaborn, Bokeh, Plotly, and geoplotlib to perform data cleaning, feature engineering, exploratory analysis, machine learning, and visualization.
- Hands-on experience in using R packages such as dplyr, tidyr, e1071, text2vec, syuzhet, rpart, kernlab, ggplot2, and Shiny App 1.0.3 to analyze data, construct models, produce data products, and evaluate results.
- Working knowledge in handling big data with Hadoop 2.x, HDFS, Spark, Sqoop, and Hive 2.1 to support the processing and storage of extremely large datasets in a distributed computing environment.
- Hands-on experience in implementing the Spark MLlib library to execute dimensionality reduction and Machine Learning algorithms on large datasets.
- Extensive programming skills in analytical and statistical programming languages such as Python 3.6, R 3.3.
- Experience in database creation and maintenance of physical data models with Oracle 12.2, MySQL 5.7, and Microsoft SQL Server 13.0.
- Experienced in NoSQL databases, including MongoDB 3.4.9 (pymongo) to store and retrieve unstructured data.
- Hands-on experience in building visualizations with R, Python, Tableau 10.x, and Power BI Service.
- Excellent technical skills in SAS system with proficiency in SAS Base, SAS Macros, SAS Graph, and SAS SQL.
- Strong experience in Git 2.10.1 version control to track changes in project files among the whole team.
- Experience in Software Development Life Cycle (SDLC) including planning, design, building, testing, and deployment stages.
- Excellent communication, listening, interpersonal, teamwork, and presentation skills. Extensive experience in multitasking and interacting with business stakeholders, clients, and end users.
TECHNICAL SKILLS
Machine Learning: Linear Regression, Logistic Regression, Multivariate Linear Regression, Stochastic Gradient Descent Linear Regression, Linear Discriminant Analysis, Naïve Bayes, K-Nearest Neighbor, Learning Vector Quantization, Support Vector Machine, Bagging, Random Forest, Boosting, AdaBoost, GBM, XGBoost, LightGBM, CatBoost, Neural Networks, Natural Language Processing
Statistics: Parameter Estimation, Hypothesis Testing, Bayesian Analysis, Estimator Identification, Experiment Design, Bootstrapping, Statistics & Probability Theory, Markov Chains, Principal Component Analysis
R Packages: sqldf, forecast, plyr, dplyr, tidyr, e1071, text2vec, syuzhet, rpart, kernlab, stringr, RPostgreSQL, RMySQL, RMongo, RODBC, RSQLite, lubridate, ggplot2, qcc, reshape2, randomForest, car, RMiner, caret, bigrf, XML, Shiny
Python Packages: Pandas, Numpy, SciPy, Scikit-learn, Matplotlib, Seaborn, Bokeh, Plotly, Theano, TensorFlow, Keras, NLTK, Gensim, Scrapy, Statsmodels, BeautifulSoup, word2vec, GloVe, geoplotlib
Hadoop Ecosystem: Hadoop 2.x, HDFS, Spark 1.6+, Sqoop, Hive 2.1
Databases: Oracle 11g, MySQL 5.7, SQL Server, MongoDB 3.4
Languages: Python 2.7/3, R, PL/SQL, SAS, Hive, Pig, Scala
Package Management: Homebrew
Data Visualization: Tableau 10.x, Power BI Service, Matplotlib, Seaborn, Bokeh, Plotly, D3.js, RShiny
Operating Systems: Windows 10/8/7/XP/Vista/2003, Mac OS, Unix/Linux
Financial Skills: ABS Analysis, Risk Management, Securities Analysis, Fixed Income Analysis, Cash Flow Analysis, VaR, CCAR
Integrated Development Environments: RStudio, Spyder, Jupyter Notebook, Atom, IntelliJ IDEA, Anaconda
PROFESSIONAL EXPERIENCE
Confidential, Jersey City, NJ
Credit Risk Analyst (Data Scientist)
Responsibilities:
- Gathered requirements from the Risk Detection Department and obtained data from the Marketing Department. Extracted, manipulated, and cleaned the data, performed the corresponding feature engineering, and implemented several machine learning algorithms to establish criteria for default detection.
- Scraped small business website data using BeautifulSoup module. Conducted Natural Language Processing on existing commercial orders, comments, reviews, balance sheets and other text-format documents.
- Performed lemmatization and stemming using the NLTK package. Implemented TF-IDF for information retrieval. Used the pipeline framework in the Scikit-learn package for text classification.
- Extracted and manipulated data with MySQL 5.7; created stored procedures for the data preparation process.
- Performed Data Cleaning, Outlier Detection, Feature Scaling, and Feature Engineering using Python 3.6 and Python libraries such as Pandas, Numpy, Scipy, Seaborn, Sklearn, Matplotlib and Imblearn.
- Implemented K-Means Clustering to uncover relationships among features and to support feature engineering.
- Implemented Principal Component Analysis on dataset for feature extraction and dimension reduction.
- Developed predictive models by using Logistic Regression, Naïve Bayes Classifier, Support Vector Machines, K-Nearest Neighbor, Decision Tree and Random Forest algorithms.
- Conducted Grid Search for Hyperparameter Tuning to find proper models for delinquency and default detection.
- Used cross-validation to evaluate models and detect overfitting.
- Used confusion matrices, accuracy, and recall to evaluate the models.
- Conducted visualization using the interactive Plotly and Bokeh libraries; built reports in Power BI.
- Maintained the project under Git version control.
- Communicated with the risk management department to understand and identify relevant loan default risk exposure data, and delivered the final report in support of strategic decision making.
Environment: MySQL 5.7, Python 3.6, Pandas 0.20, Numpy, Seaborn, Matplotlib 2.1, Plotly 1.x, Bokeh, Windows
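The TF-IDF retrieval step in this role can be sketched in plain Python; the function and toy corpus below are illustrative stand-ins only (the actual work used NLTK and Scikit-learn's pipeline):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw term count normalized by document length;
    IDF is log(N / document frequency).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical mini-corpus standing in for scraped business text
docs = [
    "late payment reported".split(),
    "payment received on time".split(),
    "account in good standing".split(),
]
w = tf_idf(docs)
```

Terms shared across documents (like "payment") are down-weighted relative to rarer, more discriminative terms (like "late"), which is what makes the weights useful as classifier features.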
Confidential, New York City, NY
Data Analyst
Responsibilities:
- Used historical data to build statistical models and predict real estate prices in several economic markets, focusing on the factors that affect property values. Duties were assigned by the direct manager; datasets were obtained from the Data Engineering team and online sources.
- Assisted in exporting analyzed data to relational databases (Microsoft SQL Server 13.0) using Sqoop.
- Retrieved the data using Microsoft SQL Server 13.0; part of the online and email text data was retrieved from MongoDB 3.6 using RMongo in R.
- Conducted Data Preparation, Exploratory Data Analysis, Outliers Detection, Feature Engineering by using tidyr, dplyr, and ggplot2 in R 3.3.
- Used the tm package for text mining and SnowballC for text stemming.
- Constructed distributions of different features and created scatter plots, heatmaps, bar charts, choropleth maps, and other visualizations in ggplot2 to generate insights for model construction.
- Implemented K-Means and Hierarchical Clustering on categorical features to derive meaningful features.
- Assisted in loading the cleaned data to Hadoop Distributed File System (HDFS).
- Implemented Spark MLlib to conduct feature transformations such as standardization, normalization, and hashing, and built machine learning pipelines in Spark MLlib.
- Developed predictive algorithms using Data Mining methods such as Linear SVC, SVC (Support Vector Classification), K-Neighbors Classifier, and SGD (Stochastic Gradient Descent) to group similar properties together and divide each zip code into sub-markets.
- Trained Linear Regression, Elastic Net, Lasso, and Ridge Regression models on the training dataset for each sub-market.
- The R packages used to construct the above models included rpart, caret, glmnet, and gbm.
- Created and presented executive dashboards and scorecards showing trends in the data, using Excel and VBA macros for visualization.
- Also created interactive visualization data products using Plotly and Bokeh (rbokeh).
- Generated reports and dashboards in Tableau.
Environment: R 3.3, Microsoft SQL Server 13.0, MongoDB 3.6, tidyr 0.4.1, dplyr 0.4, Excel, VBA-Macros, Windows
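The ridge-penalized price models in this role can be illustrated with a one-feature closed-form sketch; the numbers below are hypothetical, and Python stands in for the glmnet/caret workflow actually used in R:

```python
def ridge_1d(x, y, lam):
    """Closed-form ridge regression for one centered feature:
    minimizes sum((y - b*x)^2) + lam * b^2, which gives
    b = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

# Hypothetical centered data: price change vs. square-footage change
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -2.0, 0.1, 1.9, 4.1]
b_ols = ridge_1d(x, y, 0.0)    # lam = 0 reduces to ordinary least squares
b_ridge = ridge_1d(x, y, 5.0)  # the penalty shrinks the slope toward 0
```

The larger `lam` is, the more the slope is shrunk, which is the regularization trade-off Lasso and Elastic Net share (with different penalty terms).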
Confidential
Data Analyst
Responsibilities:
- Involved in full Software Development Life Cycle (SDLC) including planning, design, building, testing, and deployment.
- Assisted database engineer team to construct the database.
- Created database objects such as Tables, Indexes, Sequences, Views and Synonyms.
- Tuned MySQL 5.7 queries to improve performance.
- Assisted developers with complex query tuning and schema refinement.
- Extracted data from the company SQL database and used the R 3.2 programming language to explore features and implement Machine Learning algorithms. Used text mining and sentiment analysis tools such as text2vec and syuzhet.
- Generated data analyses and visualizations using ggplot2 and Tableau 9.x, and delivered the results to the financial products promotion team.
Environment: MySQL 5.7, SPSS, Excel, Windows, R 3.2, Tableau 9.x
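The lexicon-based sentiment scoring used by syuzhet-style tools can be sketched in a few lines; the lexicon and sentence below are made up, and Python stands in for the R tooling actually used:

```python
def sentiment_score(text, lexicon):
    """Sum per-word polarity from a sentiment lexicon: the
    bag-of-words scoring scheme behind syuzhet-style analysis."""
    return sum(lexicon.get(word, 0) for word in text.lower().split())

# Tiny hypothetical lexicon; real tools ship curated lexicons (NRC, Bing)
lexicon = {"good": 1, "great": 2, "bad": -1, "terrible": -2}
score = sentiment_score("Great service but terrible fees", lexicon)  # → 0
```

Positive and negative terms cancel here, which is why real pipelines also report positive and negative totals separately rather than only the net score.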
Confidential
Financial Marketing Data Analyst
Responsibilities:
- Used MySQL 5.6 as the database and wrote queries as needed.
- Conducted data cleaning and reshaping, generated subsets by using Pandas and Numpy packages in Python 3.6.
- Performed SAS programming to provide complex data review reports supporting different groups; designed and developed SAS macros, applications, and other tools.
- Developed Machine Learning algorithms, such as Linear Regression, Logistic Regression and Decision Tree to build predictive models using SPSS.
- Helped tune model parameters and conducted model evaluation.
- Assisted in the daily maintenance of the database that involved monitoring the daily run of scripts.
- Developed and maintained the programs including testing and organizing the SAS datasets, SAS programs and related documentation.
- Prepared reports and presented the results to the financial sales team to support developing and adjusting financial product sales strategies.
Environment: MySQL 5.6, SPSS, Python 3.6, Pandas, Numpy, Windows, SAS
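The logistic regression models in this role can be sketched as a from-scratch gradient-descent fit; the data below is hypothetical, and plain Python stands in for the SPSS workflow actually used:

```python
import math

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit a one-feature logistic regression by batch gradient descent.
    Returns (weight, bias) for P(y=1 | x) = sigmoid(w*x + b)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x   # gradient of log-loss w.r.t. w
            gb += (p - y)       # gradient of log-loss w.r.t. b
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Hypothetical data: higher debt ratio -> higher default probability
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
```

After training, predictions near the high end of the feature range fall close to 1 and those near the low end close to 0, which is the decision boundary behavior the tuned SPSS models relied on.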
Confidential
Business Analyst
Responsibilities:
- Translated user requirements into business solutions. Collected data and interpreted requirements.
- Wrote SQL queries using joins and subqueries to retrieve data from the database.
- Modeled business problems in support of decision making; the models included Linear Regression, Logistic Regression, and Time Series models, constructed in R 2.13.
- Developed innovative and effective approaches to analytical problems, such as predicting sales performance across the company's different commodity sectors, and communicated results and methodologies using Excel pivot tables.
- Involved in peer-to-peer code reviews.
- Communicated with inventory management team to better evaluate the solutions of sales forecasting.
Environment: Excel, Excel Pivots Table, Windows, R 2.13, SQL
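The sales-forecasting models in this role can be illustrated with the simplest time-series approach mentioned above, a least-squares linear trend; the monthly figures are made up, and Python stands in for the R 2.13 work:

```python
def fit_trend(series):
    """Fit y = a + b*t by ordinary least squares over t = 0..n-1:
    the simplest linear time-series trend model."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    b = num / den            # slope: average change per period
    a = y_mean - b * t_mean  # intercept
    return a, b

def forecast(a, b, t):
    return a + b * t

# Hypothetical monthly sales with a perfectly linear upward trend
sales = [100, 103, 106, 109, 112]
a, b = fit_trend(sales)  # exact fit here: a = 100, b = 3
```

Extrapolating one period ahead (`forecast(a, b, 5)`) gives the next month's projection; real forecasts would add seasonality and residual diagnostics on top of the trend.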