Financial Data Analyst Resume

San Francisco, CA

SUMMARY:

  • 4+ years' experience with a solid background in Mathematics, Statistics, and Finance, focused on manipulating large datasets, statistical modeling, financial modeling, and business intelligence
  • 3+ years' experience designing and configuring relational databases using MySQL and Microsoft SQL Server
  • Optimized and tuned SQL queries for better performance
  • Built MySQL databases and developed stored procedures, tables, and views for processing data
  • Built ETL workflows to extract, manipulate, and process data and load it into databases using Python Pandas, NumPy, and MySQL
  • Familiar with AWS data stores such as DynamoDB (NoSQL) and Redshift; created DynamoDB tables and used them to store and retrieve data with Python and Boto3
  • 5+ years' experience with Python, with working knowledge of data wrangling libraries such as NumPy, SciPy, Pandas, and scikit-learn, and data visualization libraries such as Matplotlib and Seaborn
  • Identified patterns, performed exploratory and descriptive analysis, and examined feature distributions after validating model assumptions using Python Pandas, Seaborn, and Matplotlib
  • Visualized data, explored features, and conducted t-tests to validate assumptions using Python
  • Performed advanced statistical analysis in Python/R (time series models, univariate and multivariate analysis of variance, PCA, survival analysis, regression modeling) and presented data summary tables and figures
  • Designed and implemented supervised and unsupervised machine learning models in Python, including linear regression with LASSO/Ridge regularization, logistic regression, decision trees, random forests, gradient boosting, KNN, and SVM (see the regression sketch after this list)
  • Developed Tableau visualizations and dashboards with multiple panels and parameters using Tableau desktop
  • Solid knowledge of statistical methods, including regression, predictive modeling, cross-validation, hypothesis testing, and probability
  • Processed big data with Apache Hadoop (HDFS, YARN, MapReduce), Pig, Hive, and Spark
  • Defined, executed, and interpreted simple to complex Hive SQL queries
  • Manipulated financial data using Hive to generate stock dividend data
  • Used Amazon Web Services Elastic Compute Cloud (AWS EC2) to launch cloud instances
  • Built and maintained cloud infrastructure to store, process, and manage data
  • Developed infrastructure on AWS using services such as EC2, S3, Lambda, CloudWatch, and DynamoDB
  • Familiar with operating systems such as macOS, Windows, and Linux
  • Working knowledge of HTML and Flask web applications
  • Solid experience in business-level data analysis in Excel (Pivot Tables, VLOOKUP, VBA)
  • Experience with version control systems such as Git and GitHub
  • Professional experience in business intelligence; conducted intensive research on search engine optimization, user growth strategies, marketing strategies, etc.
  • Excellent presentation skills with ability to explain data insights to non-experts, good collaborative skills to communicate with cross-functional team
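
A minimal sketch of the regularized-regression work summarized above, assuming scikit-learn with synthetic data in place of any real project dataset; the alpha values are illustrative defaults, not settings from the original work:

```python
# Hedged sketch: LASSO vs. Ridge with cross-validated scoring. The synthetic
# data below stands in for real features; alphas are illustrative defaults.
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

for name, model in [("LASSO", Lasso(alpha=0.1)), ("Ridge", Ridge(alpha=1.0))]:
    pipe = make_pipeline(StandardScaler(), model)   # scale before penalizing
    scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```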

TECHNICAL SKILLS:

Programming Languages: SQL, Python, R, SAS, C++, MATLAB

Packages: NumPy, SciPy, Pandas, scikit-learn, Matplotlib, Seaborn, qqplot, dplyr, R Markdown

Analytics Tools: Excel (Pivot Tables, VLOOKUP, VBA)

Visualization Tools: Tableau, Microsoft PowerPoint, Excel

Cloud: AWS (EC2, S3, Lambda, IAM, RDS), SaaS

Database: MySQL, NoSQL (DynamoDB)

Big Data Tools: Apache Hadoop, HDFS, Spark, Hive, Pig, ZooKeeper

Project Management Tools: Jira

Operating Systems: Linux, macOS, Windows

Methodology: A/B and A/A Testing, Statistical Machine Learning, Time Series Modeling, Stochastic Modeling

Certifications: CFA Level II Candidate; Actuarial Exam P/1 and FM/2

Language: English (Proficient), Mandarin (Native), Spanish (Basic), Japanese (Basic)

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco, CA

Financial Data Analyst

  • Used the AWS EC2 console to launch cloud instances, chose an Amazon Machine Image (AMI) to launch the virtual machine, and configured the instance
  • Set up a Python environment and transferred Python scripts from the local machine to the AWS remote instance using scp (Secure Copy Protocol)
  • Used command-line tools in the terminal to connect to AWS S3 buckets
  • Collected historical NYSE stock ticker data and used Hadoop, Hive, Pig, and Spark to manage data processing and storage
  • Created data directories on HDFS (Hadoop Distributed File System) to store data
  • Queried data using data warehouse infrastructure tools such as Hive to process structured data in Apache Hadoop
  • Transformed data by sorting, grouping, and filtering with SQL queries in Hive, and generated stock dividend data
  • Created a DynamoDB table (NoSQL) and used it to store and retrieve data using Python and Boto3
  • Collected multiple 10-year stock market index data (S&P 500 Index, CSI 300 Index, IC 500 Index, etc.) with Bloomberg
  • Migrated multiple historical stock datasets into a SQL database
  • Pre-processed data in MySQL, consolidating data from separate tables into one table with a distinct symbol indicator for each stock for easier evaluation of trading strategies
  • Analyzed periodic trends and computed 5-day and 30-day moving averages from the base stock data, smoothing out price action by filtering noise from short-term fluctuations using SQL window functions (see the moving-average sketch after this list)
  • Visualized market index data using Tableau and Python Matplotlib
  • Created Tableau dashboard and reports to show periodic trends, candlestick charts, index price changes, index sector weightings, market caps, etc.
  • Backtested rebalancing strategies that alternate long-short positions of two stock indexes based on relative cumulative rate of return, using Python
  • Analyzed metrics including returns, Sharpe ratio, volatility, and maximum drawdown to test the strategy's performance in Python (see the metrics sketch after this list), achieving an annualized return of 17%
  • Optimized the rebalancing strategy by tuning parameters of buy/sell signals, tested the sensitivity of returns to parameters, increasing returns by 0.8%
  • Collected 10-year Exchange-traded fund (ETF) data with Bloomberg, analyzed and priced ETF options using Black-Scholes Models in Python
  • Used the Greeks (Delta, Gamma, Theta, Vega, etc.) to analyze market volatility, risks, and arbitrage
  • Conducted in-depth research on implied volatilities of various futures markets and explored the statistical properties of implied volatilities
  • Researched the ETF, plotted the 10-year daily chart, analyzed 5-day/30-day percentage price variation, summarized variations and visualized the results in Python
  • Implemented Black-Scholes models in Python to derive implied volatility and price call options, then compared model prices with real prices to evaluate the pricing model (a minimal pricing sketch follows this list)
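
A minimal sketch of the moving-average step referenced above. The table and column names (prices, symbol, trade_date, close) are hypothetical; the SQL string shows the window-function form, with a Pandas equivalent alongside:

```python
# Hedged sketch: 5-day / 30-day moving averages per symbol. Table and column
# names are hypothetical; the function below is the Pandas analogue.
import pandas as pd

MOVING_AVG_SQL = """
SELECT symbol, trade_date, close,
       AVG(close) OVER (PARTITION BY symbol ORDER BY trade_date
                        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)  AS ma_5,
       AVG(close) OVER (PARTITION BY symbol ORDER BY trade_date
                        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS ma_30
FROM prices;
"""

def add_moving_averages(prices: pd.DataFrame) -> pd.DataFrame:
    """Per-symbol rolling means over 5 and 30 trading days."""
    prices = prices.sort_values(["symbol", "trade_date"]).copy()
    grouped = prices.groupby("symbol")["close"]
    prices["ma_5"] = grouped.transform(lambda s: s.rolling(5).mean())
    prices["ma_30"] = grouped.transform(lambda s: s.rolling(30).mean())
    return prices
```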
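
A minimal sketch of the backtest metrics named above (annualized return, volatility, Sharpe ratio, maximum drawdown), assuming a Pandas Series of daily strategy returns; the 252-day year and zero risk-free rate are conventional defaults, not figures from the original backtest:

```python
# Hedged sketch: standard performance metrics from daily strategy returns.
import numpy as np
import pandas as pd

TRADING_DAYS = 252

def performance_metrics(daily_returns: pd.Series, risk_free: float = 0.0) -> dict:
    equity = (1 + daily_returns).cumprod()       # growth of $1 invested
    years = len(daily_returns) / TRADING_DAYS
    ann_return = equity.iloc[-1] ** (1 / years) - 1
    ann_vol = daily_returns.std() * np.sqrt(TRADING_DAYS)
    drawdown = equity / equity.cummax() - 1      # distance below running peak
    return {
        "annualized_return": ann_return,
        "annualized_volatility": ann_vol,
        "sharpe_ratio": (ann_return - risk_free) / ann_vol,
        "max_drawdown": drawdown.min(),
    }
```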
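
A minimal Black-Scholes sketch for the option-pricing work above, assuming a European call on a non-dividend-paying underlying; implied volatility is recovered by bisection, one of several standard inversion methods:

```python
# Hedged sketch: Black-Scholes call price and implied volatility by bisection.
from math import exp, log, sqrt
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """European call price under Black-Scholes."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, T, r, lo=1e-4, hi=5.0, tol=1e-8):
    """Invert bs_call for sigma; the call price is increasing in sigma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round-trip check: recover the sigma used to generate the price (~0.25).
print(implied_vol(bs_call(100, 100, 1.0, 0.02, 0.25), 100, 100, 1.0, 0.02))
```
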
Confidential, San Francisco, CA

Data Analyst/Business Analyst

  • Supported the eBusiness team by enabling easy access to data via web scraping and data mining, and helped design a content-based recommendation system to predict product recommendations
  • Collaborated with external partners to collect product data using Python
  • Built ETL extracts of cosmetics sales attribute data using SQL queries, including GROUP BY queries, joins, and subqueries in MySQL
  • Built a simple recommendation system using Python in Jupyter Notebook based on previous cosmetics attributes
  • Applied structured thinking to the problem and generated assumptions
  • Performed Exploratory Data Analysis (EDA), feature pivoting, and visualization on attributes data to identify trends and validate assumptions using Python Seaborn and Matplotlib
  • Performed univariate and multivariate analysis to test the previous assumptions using Python
  • Pre-processed raw data using Python Pandas as an ETL tool, performing data cleaning including missing-data treatment and removal of redundant values, inconsistent entries, and outliers
  • Transformed categorical values to numerical values for easier model application using Python Pandas with aggregate functions and lambda functions
  • Visualized cleaned sales data using Python Matplotlib and Tableau
  • Developed Tableau visualizations and dashboards with multiple panels and parameters using Tableau desktop to show histograms, scatterplots, boxplots, correlation tables, etc.
  • Updated the MySQL database with cleaned brand data, and developed stored procedures, tables, and views for easier data processing
  • Explored feature distribution, performed feature selection and feature engineering on cleaned data, created dummy variables for easier model implementation in Python
  • Chose relevant classification methods and performed hyperparameter tuning to determine optimal values for models including Logistic Regression, Decision Tree, and Random Forest (see the tuning sketch after this list)
  • Implemented multiple classification methods to predict the recommendation level of beauty products using Python, obtaining an accuracy of 72.7%
  • Calculated metric scores for the models (precision, recall, F1 score, etc.) to evaluate performance using Python scikit-learn
  • Ran A/B tests to optimize the recommendation system, tracking measures such as click-through rate (CTR) and conversion rate (CR)
  • Analyzed results from the A/B tests, generated assumptions, and conducted t-tests to validate them (see the t-test sketch after this list)
  • Explored reasons behind the results, and proposed further improvements
  • Created an EC2 instance to deploy the images to the AWS cloud
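A minimal sketch of the hyperparameter-tuning step referenced above, using scikit-learn's GridSearchCV on a random forest; the grid values and synthetic data are illustrative, not the original settings:

```python
# Hedged sketch: cross-validated grid search over a small random-forest grid.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",  # accuracy matches the 72.7% figure reported above
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```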
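
A minimal sketch of the A/B validation step above: a two-sample t-test on per-user conversion indicators, with synthetic arrays standing in for the real experiment logs:

```python
# Hedged sketch: Welch's t-test on control vs. treatment conversion flags.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.binomial(1, 0.10, size=5000)    # placeholder ~10% conversion
treatment = rng.binomial(1, 0.12, size=5000)  # placeholder ~12% conversion

t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 rejects "no effect"
```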

Confidential, Santa Barbara, CA

Health Data Analyst

  • Collected the Medicare 5% file (5% of national Medicare records, about 2.8 million records) from the Centers for Medicare and Medicaid Services (CMS)
  • Extracted a sample dataset from the Medicare 5% file, retaining 100,000 individual records from 2012 and 2013 for benchmark analysis
  • Reorganized the data, changing the unique identification key from claim ID to individual patient using SQL
  • Analyzed demographic and geographic data of patients, removed duplicated/redundant records, and transformed variables according to algorithm needs using R
  • Mapped ICD-9 codes (International Classification of Diseases) to CC codes, and then to HCC codes (Hierarchical Condition Category), to obtain risk coefficients in R
  • Applied the Medicare risk adjustment model, calculating individual risk scores as the sum of weights reflecting the health risk posed by different diagnoses (see the risk-score sketch after this list)
  • Implemented the Medicare Shared Savings algorithm and calculated savings to the ACO (Accountable Care Organization) in Excel and R
  • Simulated performance year data (2014) by multiplying a trend factor to existing data and generating necessary components in Excel and R
  • Calculated number of assigned beneficiaries, per capita expenditure, and average risk score of beneficiaries in each benchmark year and performance year using R
  • Calculated risk ratios between years to normalize the most recent benchmark year's risk score to 1
  • Adjusted benchmark expenditure using national expenditure trend factor and risk ratio, calculated weighted average benchmark expenditure
  • Generated total benchmark expenditure in performance year with performance year risk ratio and growth increment
  • Compared actual total expenditure and total benchmark expenditure in the performance year, concluding a savings rate of 9.9% to the ACO
  • Tuned parameters to optimize the results and tested the sensitivity of savings in Excel
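A minimal sketch of the risk-score calculation referenced above, written in Python for consistency although the original work was done in R; the coefficient values are made up for illustration, since real CMS-HCC weights come from the published model tables:

```python
# Hedged sketch: risk score = demographic weight + sum of HCC weights.
HCC_WEIGHTS = {"HCC18": 0.318, "HCC85": 0.323, "HCC111": 0.335}  # hypothetical
DEMOG_WEIGHTS = {("F", "70-74"): 0.399, ("M", "70-74"): 0.379}   # hypothetical

def risk_score(sex: str, age_band: str, hccs: set) -> float:
    """Demographic coefficient plus the sum of the patient's HCC coefficients."""
    score = DEMOG_WEIGHTS.get((sex, age_band), 0.0)
    score += sum(HCC_WEIGHTS.get(h, 0.0) for h in hccs)
    return score

print(risk_score("F", "70-74", {"HCC18", "HCC85"}))  # 0.399 + 0.318 + 0.323
```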

Confidential, Los Angeles, CA

Analyst Intern

  • Estimated annualized volatility for Facebook stock, computed put prices, and compared them with real prices
  • Constructed a Delta-Gamma-neutral portfolio to hedge risks using Black-Scholes methodology in Excel, and estimated the portfolio's value across various stock prices
  • Utilized Monte Carlo simulation to approximate prices and Delta
  • Developed a Value-at-Risk (VaR) calculation system for a portfolio of stocks and options under multiple window lengths in Python
  • Calculated VaR and Expected Shortfall (ES) for the user-specified portfolio with Monte Carlo, parametric, and historical methods in Python (see the VaR sketch after this list)
  • Trained the model on historical stock prices in Python and tested the results against realized losses to determine the limitations and future improvements of the model
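A minimal sketch of the VaR calculations referenced above, covering the historical and Monte Carlo methods on a synthetic daily-return series; the 95% confidence level and the normal resampling model are illustrative assumptions:

```python
# Hedged sketch: 1-day 95% VaR, reported as a positive loss number.
import numpy as np

rng = np.random.default_rng(7)
returns = rng.normal(0.0005, 0.02, size=1000)  # placeholder daily returns

def historical_var(returns, alpha=0.95):
    """Loss at the (1 - alpha) quantile of the empirical distribution."""
    return -np.quantile(returns, 1 - alpha)

def monte_carlo_var(returns, alpha=0.95, n_sims=100_000):
    """Fit a normal model to the sample, simulate, and take the quantile."""
    mu, sigma = returns.mean(), returns.std(ddof=1)
    sims = rng.normal(mu, sigma, size=n_sims)
    return -np.quantile(sims, 1 - alpha)

print(f"historical 95% VaR:  {historical_var(returns):.4f}")
print(f"monte carlo 95% VaR: {monte_carlo_var(returns):.4f}")
```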
