Data Analyst Resume
Redwood City, CA
SUMMARY:
- In-depth experience as a data analyst across sectors including e-commerce, banking, and fund management, with professional knowledge of databases and data analysis tools
- Solid experience using SQL, Python, R, Tableau, MATLAB, Excel, AWS, and Google Cloud
- Worked with relational databases (MySQL, PostgreSQL) and non-relational databases (MongoDB, DynamoDB)
- Experience creating tables and views and writing complex SQL queries, stored procedures, and functions in MySQL
- Used Python for ETL and data manipulation against MySQL databases
- Hands-on experience building complex NoSQL databases in MongoDB and running ETL processes from Python through PyMongo
- Performed data cleaning, data mining, and data wrangling on unorganized, inconsistent, and incorrect data with NumPy and Pandas in Python
- Hands-on experience applying machine learning models, tuning hyperparameters, and evaluating model performance
- Experience building statistical and machine learning models in Python with Statsmodels and Scikit-Learn
- Implemented logistic regression, random forest, and k-nearest neighbor models in Python and validated them with cross-validation
- Experience creating data visualizations with Matplotlib, Seaborn, and Plotly in Python
- Hands-on experience using PySpark in Python for big data analysis
- Performed statistical analysis in R, including hypothesis testing, linear regression, logistic regression, and time series analysis
- Created data visualizations in R with ggplot2 and generated reports with R Markdown
- Experience connecting Tableau to various local and live data sources such as Excel, JSON, and MySQL databases
- Used Tableau to draw various charts and maps and generate reports with sheets, dashboards, and stories
- Experience applying methods such as Lasso regression, SVM, and SVD in MATLAB
- Used pivot tables, VLOOKUP, and functions for data analysis in Excel
- Hands-on experience using workflow tools like Jira to view, manage, and report on work
- Excellent written and verbal communication skills
- Experience collaborating across multiple teams to communicate and negotiate on projects
- Combine patience, determination, and persistence with strong problem-solving and analytical skills to troubleshoot client issues
TECHNICAL SKILLS:
Programming Languages and Tools: Tableau, SQL, Python, R, MATLAB, Excel, Hive, Spark
Database: MySQL, PostgreSQL, MongoDB, DynamoDB
Big Data: Hadoop, Hive, Pig, Spark
Reporting Tools: Tableau, PowerPoint, Google Slides
Packages: NumPy, Pandas, Seaborn, Matplotlib, Sklearn
Analysis Methods: A/B testing, multiple linear regression, logistic regression, time series analysis, k-nearest neighbors, cross-validation, and hypothesis testing
Language: Mandarin (Native), English
EXPERIENCE:
Data Analyst
Confidential, Redwood City, CA
Responsibilities:
- Transferred warehouse product and transportation data from databases such as MySQL and MongoDB (NoSQL) into Hadoop with Sqoop to improve big data storage availability
- Helped reduce ETL processing time by deploying Hive and Spark on a Hadoop framework running on AWS EC2, and used PySpark to load and process large-scale data in Python
- Wrote complex SQL queries with nested subqueries, aggregate functions, and window functions to improve query performance on large datasets and to reduce duplicate and unknown records in warehouse product and transportation data
- Used Python to extract recent product data from a MySQL database and performed data munging and cleaning (merging tables, renaming variables, changing data types, and creating new features) with NumPy, Pandas, and datetime
- Performed exploratory data analysis (EDA) with visualization packages such as Seaborn, Matplotlib, and Plotly in Python to explore datasets, draw plots, and identify possible trends and critical relationships between demand and supply
- Created several metrics to measure the effectiveness of responses to supply chain issues and connected a live MySQL database to Tableau for further analysis
- Generated Tableau reports with interactive dashboards and charts illustrating the effectiveness of responses to supply chain issues, demand fulfillment, and inventory, and discussed them with the team manager to optimize the supply chain and inventory
- Predicted the future demand curve from historical demand data using machine learning models such as linear regression, random forest, and classification tree with Statsmodels and Scikit-Learn in Python, in order to improve the supply chain and optimize inventory
- Used Python to generate a correlation matrix and tested each feature in the linear regression model against the significance level to find the top 5 features influencing the demand curve
- Ran A/B tests in R on different transportation methods to evaluate demand fulfillment and supply chain efficiency
- Used R Markdown to generate statistical reports on the distribution of demand and supply data, covering distributions such as normal, gamma, and exponential
- Designed Tableau dashboards for the 5 features to present findings and results to the team manager, and revised the prediction model in Python based on their suggestions
- Provided troubleshooting, analysis, and solutions for warehouse and supply chain data issues
- Helped identify supply chain and business problems causing concern, and suggested improvements to cut costs and improve the overall process
- Met with company executives to make recommendations based on careful research and predictions during the monthly Sales and Operations Planning (S&OP) process
- Collaborated with colleagues from various departments, including operations, sales, design, production, marketing, customer service, and project management, to address consumer inquiries, analyze customer needs, and boost sales
- Communicated with vendors to address problems, negotiate better deals, and build relationships
- Trained new employees on effective data entry techniques
Data Analyst
Confidential, Santa Clara, CA
Responsibilities:
- Handled structured and unstructured data from sources such as MySQL, NoSQL databases, and Excel files, and loaded it into MS SQL Server with the ETL tool SSIS
- Improved data processing efficiency by simplifying complex SQL queries with advanced features such as window functions, aggregate functions, and nested subqueries
- Used R to check customer rating data against statistical distributions such as beta, log-normal, normal, gamma, and exponential, looking for possible patterns
- Applied ggplot2 and esquisse in R to create data visualizations such as Q-Q plots, density plots (geom_density), and distribution plots to find possible trends, relationships, and behavior in customer rating data
- Used the abtest library to run A/B tests in R to determine the top 3 features influencing customer rating, randomly splitting test groups with similar customer characteristics such as sex, age, and cost; then created new metrics to define customer rating and tested them with A/B tests
- Implemented machine learning models such as random forest, regression tree, and linear regression to predict customer rating, and ran recursive feature elimination in R to automate the search for the best-performing features and improve accuracy
- Used the SmartEDA library in R to perform exploratory data analysis (EDA) and integrated the outcomes into R Markdown reports
- Generated statistical reports in R Markdown with the rmarkdown and knitr packages and presented them to the team manager
- Combined MS SQL Server and Tableau to create and modify Tableau worksheets and dashboards, performing table-level calculations such as window functions with analytics including lines, average lines, forecasting, trend analysis, and distribution bands
- Created clear PowerPoint presentations with data visualization results from R and Tableau and presented them to the customer service team
- Worked with Agile teams and Waterfall procedures to deliver each part of the project within short timeframes
- Designed and developed data analytics dashboards for customer retention, acquisition and satisfaction, sales revenue, and volume growth to help inform financial targets, business development, and sales incentive programs
- Worked with the programming team to improve the internal system for optimal efficiency
- Communicated with vendors for accurate product information, upcoming products, and pricing policies
- Partnered with user experience teams to develop enhanced customer experiences
- Conceptualized best-in-class digital experiences to document the relationship between touch points and the full end-user experience
- Developed and managed end-user testing strategies to identify failures throughout the user experience, and used the collected data to drive assumptions and recommendations for reporting results
Business Analyst
Confidential
Responsibilities:
- Used Beautiful Soup in Python to scrape Chinese car industry data from various car sales websites
- Extracted Chinese car industry stock data from the Wind system and loaded it into Python for further data manipulation and cleaning
- Loaded the sales and stock data into a MySQL relational database with Python for more efficient storage and processing
- Helped improve stored procedures in MySQL and automate their updates by rewriting the SQL queries with advanced features such as window functions, aggregate functions, and nested subqueries
- Performed data cleaning and wrangling with NumPy, Pandas, and datetime in Python to merge data, change data types, remove duplicates and null values, and check for outliers
- Performed exploratory data analysis (EDA) in Python with Seaborn, Matplotlib, and Plotly for interactive data visualization to find possible trends, relationships, and behavior in the car sales and stock data
- Simulated one week of Chinese car stock data with the current Barra model in Python and adjusted the alpha and beta factors to better fit the actual data
- Loaded clean data into Excel and used SmartArt, pivot tables, and functions to create various plots, and built statistical models to generate reports for the product manager
- Created interactive PowerPoint presentations from the results, tables, and plots produced in Excel and Python
- Met with vendors to discuss important factors in the Chinese car industry
- Prepared a presentation on the new factors and presented it to the team manager
Data Analyst
Confidential, Sacramento, CA
Responsibilities:
- Managed customers’ credit card transaction data from sources such as Excel and JSON files and loaded it into a MySQL database for better storage and processing performance
- Wrote complex SQL queries using window functions, aggregate functions, and nested subqueries to extract the data
- Loaded current-month credit card transaction data into Python for data cleaning and wrangling (merging tables, filling null values, and checking outliers) with NumPy and Pandas
- Applied exploratory data analysis and created data visualizations with Seaborn and Matplotlib in Python to find important features associated with possible fraudulent payments
- Performed statistical analysis on the credit card transaction data, including t-tests, ANOVA, and F-tests, with Statsmodels in Python
- Built an interactive dashboard in Excel with pivot tables, VLOOKUP, and SmartArt and reported insights to the manager
- Generated reports in MS Word with the graphs, tables, and findings from Python and Excel
- Created interactive PowerPoint slides from the fraud detection reports for presentation to other product teams
- Drafted and prepared bilingual (English and Chinese) marketing materials including pitch decks, teasers, and investor presentations
- Provided execution support; researched public filings, market and industry data, and databases to compile company profiles; prepared pitch books, memorandums, market updates, Public Information Books, and PowerPoint presentations for senior bankers and clients
