Financial Data Analyst Resume
San Francisco, CA
SUMMARY:
- 4+ years of experience with a solid background in Mathematics, Statistics, and Finance, focused on manipulating large datasets, statistical modeling, financial modeling, and business intelligence
- 3+ years of experience designing and configuring relational databases using MySQL and Microsoft SQL Server
- Optimized and tuned SQL queries for better performance
- Built MySQL databases and developed stored procedures, tables, and views for data processing
- Built ETL workflows to extract, transform, and load data into MySQL using Python Pandas and NumPy (see the ETL sketch following this summary)
- Familiar with AWS data stores such as DynamoDB (NoSQL) and Redshift; created DynamoDB tables and used them to store and retrieve data with Python and Boto3
- 5+ years of experience with Python; working knowledge of data wrangling libraries such as NumPy, SciPy, Pandas, and scikit-learn, and data visualization libraries such as Matplotlib and Seaborn
- Identified patterns, performed exploratory and descriptive analysis, and explored feature distributions after validating model assumptions using Python Pandas, Seaborn, and Matplotlib
- Visualized data, explored features, and conducted t-tests to validate assumptions using Python
- Performed advanced statistical analysis in Python/R (time series models, univariate and multivariate analysis of variance, PCA, survival analysis, regression modeling), and presented qualitative data summary tables and figures
- Designed and implemented supervised and unsupervised machine learning models in Python, including linear regression with LASSO/Ridge regularization, logistic regression, decision trees, random forests, gradient boosting, KNN, and SVM
- Developed Tableau visualizations and dashboards with multiple panels and parameters using Tableau Desktop
- Solid knowledge of statistical methods, including regression, predictive modeling, cross-validation, hypothesis testing, and probability
- Processed big data with Apache Hadoop (HDFS, YARN, MapReduce), Pig, Hive, and Spark
- Defined, executed and interpreted simple to complex Hive SQL queries
- Manipulated financial data using Hive to generate stock dividend data
- Used Amazon Web Services Elastic Compute Cloud (AWS EC2) to launch cloud instance
- Built and maintained cloud infrastructure that could store, process, and manage the data
- Developed infrastructure on AWS using services such as EC2, S3, Lambda, CloudWatch, and DynamoDB
- Familiar with operating systems such as macOS, Windows, and Linux
- Working knowledge of HTML and Flask web applications
- Solid experience in business-level data analysis in Excel (Pivot Tables, VLOOKUP, Visual Basic)
- Experience with version control systems such as Git and GitHub
- Professional experience in business intelligence, including intensive research on search engine optimization, user growth strategies, and marketing strategies
- Excellent presentation skills with the ability to explain data insights to non-experts; strong collaboration skills for communicating with cross-functional teams
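The sketch below illustrates the Pandas-to-MySQL ETL workflow noted above; the file name, column names, table name, and connection string are placeholder assumptions for illustration only.

```python
# Illustrative ETL sketch: extract a CSV, clean it with Pandas, and load it
# into MySQL. File name, column names, and credentials are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost/analytics")  # placeholder credentials

# Extract
raw = pd.read_csv("transactions.csv", parse_dates=["trade_date"])

# Transform: drop duplicates, fill missing numeric values, standardize column names
clean = (
    raw.drop_duplicates()
       .fillna({"volume": 0})
       .rename(columns=str.lower)
)

# Load into MySQL, replacing the table if it already exists
clean.to_sql("transactions_clean", engine, if_exists="replace", index=False)
```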
TECHNICAL SKILLS:
Programming Languages: SQL, Python, R, SAS, C++, MATLAB
Packages: NumPy, SciPy, Pandas, scikit-learn, Matplotlib, Seaborn, qqplot, dplyr, R Markdown
Analytics Tools: Excel (Pivot Tables, VLOOKUP, VBA)
Visualization Tools: Tableau, Microsoft PowerPoint, Excel
Cloud: AWS (EC2, S3, Lambda, IAM, RDS), SaaS
Databases: MySQL, NoSQL (DynamoDB)
Big Data Tools: Apache Hadoop, HDFS, Spark, Hive, Pig, ZooKeeper
Project Management Tools: Jira
Operating Systems: Linux, macOS, Windows
Methodologies: A/B and A/A Testing, Statistical Machine Learning, Time Series Modeling, Stochastic Modeling
Certifications: CFA Level II Candidate; Actuarial Exams P/1 and FM/2
Languages: English (Proficient), Mandarin (Native), Spanish (Basic), Japanese (Basic)
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco, CA
Financial Data Analyst
- Used the AWS EC2 console to launch cloud instances: selected an Amazon Machine Image (AMI) to launch a virtual machine and configured the instance
- Set up a Python environment and transferred Python scripts from the local machine to the remote AWS instance using scp (Secure Copy Protocol)
- Used command-line tools in the terminal to connect to AWS S3 buckets
- Collected historical NYSE stock ticker data and used Hadoop, Hive, Pig, and Spark to manage data processing and storage
- Created data directories on HDFS (Hadoop Distributed File System) to store data
- Queried data using the data warehouse tool Hive to process structured data on Apache Hadoop
- Transformed data by sorting, grouping, and filtering with SQL queries in Hive, and generated stock dividend data
- Worked with NoSQL: created DynamoDB tables and used them to store and retrieve data with Python and Boto3
- Collected 10 years of stock market index data (S&P 500 Index, CSI 300 Index, IC 500 Index, etc.) from Bloomberg
- Migrated multiple historical stock datasets into a SQL database
- Pre-processed data for trading strategy evaluation in MySQL, consolidating separate tables into one table with a distinct symbol indicator for each stock for easier evaluation of trading strategies
- Analyzed periodic trends and computed 5-day and 30-day moving averages from the base stock data, smoothing out price action by filtering noise from short-term fluctuations with SQL window functions (see the query sketch at the end of this role)
- Visualized market index data using Tableau and Python Matplotlib
- Created Tableau dashboard and reports to show periodic trends, candlestick charts, index price changes, index sector weightings, market caps, etc.
- Backtested rebalancing strategies that alternate long/short positions between two stock indexes based on relative cumulative rate of return, using Python
- Analyzed performance metrics, including returns, Sharpe ratio, volatility, and maximum drawdown, in Python to test the strategy, achieving an annualized return of 17%
- Optimized the rebalancing strategy by tuning parameters of buy/sell signals, tested the sensitivity of returns to parameters, increasing returns by 0.8%
- Collected 10 years of exchange-traded fund (ETF) data from Bloomberg; analyzed and priced ETF options using the Black-Scholes model in Python
- Used option Greeks (Delta, Gamma, Theta, etc.) to analyze market volatility, risk, and arbitrage opportunities
- Conducted in-depth research on implied volatilities of various futures markets and explored the statistical properties of implied volatilities
- Researched the ETF, plotted the 10-year daily chart, analyzed 5-day/30-day percentage price variation, and summarized and visualized the results in Python
- Implemented the Black-Scholes model in Python to derive implied volatility and price call options, and compared results with market prices to evaluate the pricing model (a minimal pricing sketch follows below)
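The query sketch below shows one way to compute the 5-day and 30-day moving averages described above with SQL window functions, run from Python. The schema (a stock_prices table with symbol, trade_date, and close_price columns) and the connection string are assumptions; window functions require MySQL 8.0 or later.

```python
# Moving-average sketch: window-function query executed from Python.
# Table and column names are illustrative, not the actual schema.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost/market")  # placeholder credentials

query = """
SELECT symbol,
       trade_date,
       close_price,
       AVG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date
                              ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)  AS ma_5,
       AVG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date
                              ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS ma_30
FROM stock_prices
ORDER BY symbol, trade_date;
"""

moving_averages = pd.read_sql(query, engine)
```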
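The following is a minimal Black-Scholes sketch of the call-pricing and implied-volatility workflow above; the option inputs are illustrative, not actual trade data.

```python
# Minimal Black-Scholes sketch: price a European call and back out the implied
# volatility from an observed option price. Inputs are illustrative only.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call option."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, T, r):
    """Volatility that reproduces the observed call price (root-finding)."""
    return brentq(lambda sigma: bs_call(S, K, T, r, sigma) - price, 1e-6, 5.0)

# Example: implied volatility for a call quoted at 7.50 (placeholder quote)
iv = implied_vol(price=7.50, S=100, K=100, T=0.5, r=0.02)
print(f"Implied volatility: {iv:.2%}")
```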
Data Analyst/Business Analyst
- Supported the eBusiness team in enabling easy access to data via web scraping and data mining, and helped design a content-based recommendation system to predict product recommendations
- Collaborated with external partners to collect product data using Python
- Performed ETL data extraction to produce cosmetics sales attribute data using SQL queries in MySQL, including GROUP BY queries, joins, and subqueries
- Built a simple recommendation system using Python in Jupyter Notebook based on previous cosmetics attributes
- Applied structured thinking to the problem and generated assumptions
- Performed Exploratory Data Analysis (EDA), feature pivoting, and visualization on attributes data to identify trends and validate assumptions using Python Seaborn and Matplotlib
- Performed univariate and multivariate analyses to test the previous assumptions using Python
- Pre-processed raw data with Python Pandas and performed data cleaning, including missing data treatment and removal of redundant values, inconsistent records, and outliers
- Transformed categorical values to numerical values for easier model application using Python Pandas with aggregate functions and lambda functions
- Visualized cleaned sales data using Python Matplotlib and Tableau
- Developed Tableau visualizations and dashboards with multiple panels and parameters using Tableau Desktop to show histograms, scatterplots, boxplots, correlation tables, etc.
- Updated the MySQL database with cleaned brands data, and developed stored procedures, tables, and views for easier data processing
- Explored feature distribution, performed feature selection and feature engineering on cleaned data, created dummy variables for easier model implementation in Python
- Chose relevant classification methods and performed hyperparameter tuning to determine optimal values for models including Logistic Regression, Decision Tree, and Random Forest
- Implemented multiple classification methods to predict the recommendation level of beauty products using Python, obtaining an accuracy of 72.7%
- Calculated metric scores for the models (precision, recall, F1 score, etc.) to evaluate performance using Python scikit-learn (see the tuning sketch at the end of this role)
- Ran A/B tests to optimize the recommendation system, tracking measures such as click-through rate (CTR) and conversion rate (CR)
- Analyzed results from the A/B tests, generated assumptions, and conducted t-tests to validate them (see the t-test sketch at the end of this role)
- Explored reasons behind the results, and proposed further improvements
- Created an EC2 instance to deploy images to the AWS cloud
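The sketch below illustrates the hyperparameter tuning and metric evaluation steps described above. The feature matrix and recommendation labels are assumed to be prepared already; synthetic data stands in for the real cosmetics attributes.

```python
# Classification sketch: grid-search hyperparameter tuning with cross-validation,
# then precision/recall/F1 on a held-out set. Synthetic placeholder data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Placeholder data standing in for the cleaned cosmetics attribute features
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Precision, recall, and F1 on the held-out test set
print(classification_report(y_test, search.predict(X_test)))
```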
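A minimal sketch of the A/B conversion-rate comparison, using a two-sample t-test on per-user conversion indicators; the conversion rates and sample sizes below are assumed placeholders, not experiment results.

```python
# A/B test sketch: Welch two-sample t-test on per-user conversion indicators.
# Conversion rates and sample sizes are placeholders.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
control   = rng.binomial(1, 0.050, size=10_000)  # assumed 5.0% conversion rate
treatment = rng.binomial(1, 0.056, size=10_000)  # assumed 5.6% conversion rate

stat, p_value = ttest_ind(treatment, control, equal_var=False)
print(f"Lift: {treatment.mean() - control.mean():.4f}, p-value: {p_value:.4f}")
```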
Confidential, Santa Barbara, CA
Health Data Analyst
- Collected the Medicare 5% file (5% of national Medicare records, about 2.8 million records) from the Centers for Medicare & Medicaid Services (CMS)
- Extracted a sample dataset from the Medicare 5% file, retaining 100,000 individual records from 2012 and 2013 for benchmark analysis
- Reorganized the data, changing the unique identification key from claim ID to individual patient using SQL
- Analyzed demographic and geographic data of patients, removed duplicated/redundant records, and transformed variables according to algorithm needs using R
- Mapped ICD-9 codes (International Classification of Diseases) to CC codes and then to HCC codes (Hierarchical Condition Category) to obtain risk coefficients in R
- Applied the Medicare risk adjustment model and calculated individual risk scores as the sum of weights reflecting the health risk posed by different diagnoses
- Implemented the Medicare Shared Savings algorithm and calculated savings to the ACO in Excel and R
- Simulated performance year data (2014) by applying a trend factor to existing data and generating the necessary components in Excel and R
- Calculated number of assigned beneficiaries, per capita expenditure, and average risk score of beneficiaries in each benchmark year and performance year using R
- Calculated the risk ratio between benchmark years to normalize the most recent benchmark year's risk score to 1
- Adjusted benchmark expenditure using national expenditure trend factor and risk ratio, calculated weighted average benchmark expenditure
- Generated total benchmark expenditure in the performance year using the performance year risk ratio and growth increment
- Compared actual total expenditure with total benchmark expenditure in the performance year, concluding a 9.9% savings rate to the ACO (see the arithmetic sketch below)
- Tuned parameters to optimize the results and test the sensitivity of savings in Excel
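A worked sketch of the savings-rate arithmetic above; the per capita figures are illustrative placeholders chosen to reproduce the stated 9.9% rate, not actual CMS or ACO values.

```python
# ACO savings-rate sketch. All figures are illustrative placeholders.
benchmark_expenditure_py = 11_000.0   # risk- and trend-adjusted benchmark per capita ($)
actual_expenditure_py    = 9_911.0    # actual per capita expenditure in the performance year ($)

savings = benchmark_expenditure_py - actual_expenditure_py
savings_rate = savings / benchmark_expenditure_py
print(f"Savings rate: {savings_rate:.1%}")   # 9.9% with these placeholder inputs
```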
Confidential, Los Angeles, CA
Analyst Intern
- Estimated annualized volatility for Facebook stock, computed put prices, and compared them with market prices
- Constructed a Delta-Gamma neutral portfolio to hedge risks using Black-Scholes methodology in Excel, and estimated value of the portfolio corresponding to various stock prices
- Utilized Monte Carlo simulation to approximate option prices and Delta
- Developed a Value-at-Risk (VaR) calculation system for a portfolio of stocks and options across multiple window lengths in Python
- Calculated VaR and Expected Shortfall (ES) for a user-specified portfolio using Monte Carlo, parametric, and historical methods in Python (see the sketch below)
- Trained the model on historical stock prices in Python and compared predicted VaRs with actual outcomes to identify limitations and future improvements of the model
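A minimal sketch of the historical and parametric VaR (plus historical ES) calculations at 95% confidence; the simulated return series is a placeholder for the actual portfolio returns, and the Monte Carlo variant is omitted for brevity.

```python
# VaR sketch: historical and parametric (normal) VaR at 95% confidence,
# plus historical Expected Shortfall. Returns are placeholder data.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
returns = rng.normal(0.0005, 0.02, size=750)   # placeholder daily portfolio returns

confidence = 0.95

# Historical VaR: empirical quantile of the return distribution
var_hist = -np.quantile(returns, 1 - confidence)

# Parametric VaR: assumes normally distributed returns
mu, sigma = returns.mean(), returns.std(ddof=1)
var_param = -(mu + sigma * norm.ppf(1 - confidence))

# Expected Shortfall (historical): mean loss beyond the VaR threshold
es_hist = -returns[returns <= -var_hist].mean()

print(f"Historical VaR: {var_hist:.2%}, Parametric VaR: {var_param:.2%}, ES: {es_hist:.2%}")
```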