
Data Analyst Resume

San Francisco, CA

SUMMARY:

  • 5 years of industry and academic research experience focused on big data analysis, data mining, statistical inference and modeling, machine learning, data visualization, and data pipelines
  • Experienced with data science toolkits, machine learning techniques, and modeling tools, including Python libraries such as NumPy, Pandas, SciPy, Scikit-Learn, and Matplotlib (a minimal sketch of this toolkit follows this list)
  • Strong in constructing SQL queries, stored procedures, and views; experienced with OLAP and ETL processes
  • Experience in developing Python scripts for automation and data validation
  • Experienced in data warehousing and ETL processes; manipulated data from both transactional databases (Oracle, SQL Server) and cloud or Hadoop-based storage systems (AWS S3, HDFS, Amazon Redshift)
  • Hands-on experience with Python, SQL, and UNIX/Linux to perform statistical analysis and implement machine learning algorithms using various packages
  • Working experience in RDBMSs such as MySQL, PostgreSQL, and MS SQL Server; strong knowledge of database concepts and of both SQL and NoSQL databases such as MongoDB and Redis
  • Experience in data analytics, business research, and project management, including the creation of and guidance through the data analytics process
  • Analyzed complex business processes and translated them into technical requirements used in the development of reports
  • Applied statistical techniques such as A/B testing, hypothesis testing, ANOVA, and Design of Experiments to explore data insights
  • Created dashboards, pie charts, histograms, and interactive plots using Tableau
  • Hands-on experience implementing Big Data applications, using the MapReduce framework to handle big data processing and Hive to read/write data with HiveQL
  • Experienced with MS Excel, Python (Seaborn), and R (ggplot2) for data visualization
  • Experienced in R packages for machine learning, including rpart, caret, ggplot2, and Shiny
  • Experienced in machine learning techniques: Linear Regression, Lasso Regression, Ridge Regression, Logistic Regression, Clustering Analysis, Decision Trees, Random Forests, Naïve Bayes, Support Vector Machines, and Principal Component Analysis
  • Experienced in using Google Cloud Platform (Kubernetes, Kubeflow, BigQuery) to execute queries
  • Experienced with Flask for web applications, and with Docker and Apache Airflow
  • Experienced in Agile development techniques and Git version control
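
As a minimal illustration of the Python toolkit named above (Pandas plus Scikit-Learn), the sketch below loads a dataset, scales features, and fits a logistic regression classifier. The file name and column names are hypothetical placeholders, not data from any of the roles described here.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical input file and target column
df = pd.read_csv("data.csv")
X, y = df.drop(columns=["label"]), df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scale features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```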

SKILLS:

Programming Languages: SQL, Python, R, SAS

Packages: NumPy, Pandas, Scikit-Learn, Statsmodels, Seaborn, Matplotlib

Databases: MySQL, PostgreSQL, MS SQL Server, MongoDB, AWS Redshift

Cloud: AWS, Google Cloud Platform

Big Data Technologies: Apache Hadoop, Apache Spark, PyHive, PySpark, Apache Kafka

WORK EXPERIENCE:

Confidential, San Francisco, CA

Data Analyst

Responsibilities:

  • Performed exploratory analysis using Python, including data transformations (scaling and missing-value imputation), feature selection, and data visualization
  • Connected to the MySQL database, retrieved datasets from MySQL into CSV files in an AWS S3 bucket, and granted bucket and raw-dataset permissions
  • Designed the risk/fraud pipeline, including the machine learning model and risk analysis, for the financial institution to detect credit card default from transactional databases
  • Handled large volumes of structured and unstructured customer data using NoSQL databases such as MongoDB
  • Used MySQL databases to store and manipulate structured data; constructed MySQL queries with multiple joins, window functions, and analytic functions (RANK and DENSE_RANK) to extract, transform, and load (ETL) data (see the query sketch after this list)
  • Created Tableau dashboards and developed Tableau data visualizations using scatter plots, geographic maps, bar charts, and density charts to better understand the data
  • Used Matplotlib in Python to generate plots, histograms, bar charts, time-series charts, and heat maps
  • Built connections to the data sources on AWS S3 and converted them into RDDs using PySpark
  • Constructed Python scripts to automate data validation and exploratory data analysis, such as data anomaly detection and trend analysis
  • Used Hadoop, HDFS, MapReduce, Hive, Pig, and Spark to manage data processing and storage for big data applications running on the cluster
  • Implemented an unsupervised machine learning model that returns all transactions worth investigating further, capturing new risk patterns (sudden clusters of unusual activity)
  • Built a function to automate the list of customers who exceeded their monthly credit card limit on a specific day using NumPy and Pandas (DataFrames and lambda functions); a sketch follows this list
  • Developed machine learning models, including Logistic Regression and Random Forest, to predict credit default, achieving an accuracy of 83.7%
  • Evaluated model performance via cross-validation and used grid search to tune hyperparameters, improving prediction accuracy by 3% (sketched below)
  • Participated in requirement analysis with the engineering team, provided estimates, defined scope based on capacity, and communicated with stakeholders
  • Reviewed daily reports for necessary corrections to maintain data integrity, made necessary data updates, and/or sent notifications to the appropriate departments
  • Used Git to coordinate work across versions with other team members and Jira to track and resolve issues
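
A hedged sketch of the kind of window-function query mentioned above, run from Python with mysql-connector-python. The table and column names are illustrative assumptions, not the actual schema.

```python
import mysql.connector

# Connection parameters are placeholders
conn = mysql.connector.connect(host="localhost", user="analyst",
                               password="...", database="transactions_db")

# Rank each customer's transactions by amount using RANK and DENSE_RANK
query = """
SELECT customer_id,
       txn_amount,
       RANK()       OVER (PARTITION BY customer_id ORDER BY txn_amount DESC) AS amount_rank,
       DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY txn_amount DESC) AS dense_amount_rank
FROM credit_card_transactions;
"""

cursor = conn.cursor()
cursor.execute(query)
for row in cursor.fetchall():
    print(row)
```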
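The over-limit report could look roughly like the Pandas sketch below; the DataFrame columns (customer_id, txn_date, amount, monthly_limit) are assumed for illustration.

```python
import pandas as pd

def customers_over_limit(transactions: pd.DataFrame,
                         limits: pd.DataFrame,
                         day: str) -> pd.DataFrame:
    """transactions: [customer_id, txn_date, amount]; limits: [customer_id, monthly_limit]."""
    day = pd.Timestamp(day)
    month_start = day.replace(day=1)
    # Month-to-date transactions up to and including the given day
    mtd = transactions[(transactions["txn_date"] >= month_start) &
                       (transactions["txn_date"] <= day)]
    spend = (mtd.groupby("customer_id")["amount"].sum()
                .rename("mtd_spend").reset_index())
    merged = limits.merge(spend, on="customer_id", how="left").fillna({"mtd_spend": 0})
    # Customers whose month-to-date spend exceeds their monthly limit
    return merged[merged["mtd_spend"] > merged["monthly_limit"]]
```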
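And a minimal example of the cross-validation and grid-search tuning workflow, shown here on synthetic data with an illustrative parameter grid rather than the one actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the credit-default features
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Illustrative parameter grid; 5-fold cross-validation scores each combination
param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```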

Confidential, Houston, Texas

Data Analyst

Responsibilities:

  • Designed and maintained the MySQL database, and created views, user-defined functions, and stored procedures for daily tasks
  • Used Python (Pandas, NumPy, scikit-learn, SciPy) for data importing, cleaning and preprocessing, and model tuning and optimization
  • Performed exploratory data analysis on various datasets using Tableau and Python packages such as Matplotlib and Seaborn to get the distributions of age, loan purpose, account balance, and yearly salary from loan information
  • Implemented machine learning models, including Random Forest and Logistic Regression, to predict loan grades
  • Performed data visualization such as line charts and pie charts, and generated advanced Tableau dashboards with quick filters in Tableau Desktop
  • Performed feature engineering, converting categorical data to numerical form with ordinal encoding techniques in Python
  • Built data pipelines to aggregate and clean data with PySpark (filter, groupBy, split), and wrote Python scripts (user-defined functions) to automate the data processing steps (see the sketch after this list)
  • Wrote MySQL queries such as multiple joins, window functions, and nested subqueries, and compared alternatives for optimal running time
  • Used MySQL to create views, procedures, and functions to manipulate structured data
  • Analyzed loan information data using the Pandas and NumPy packages to manipulate DataFrames and data structures
  • Handled imbalanced datasets with undersampling techniques for more accurate model predictions (sketched below)
  • Formatted data from the MySQL and MongoDB databases using Python, then concatenated the results to facilitate machine learning analysis with PySpark and Python
  • Validated models using cross-validation and evaluated them on a held-out test set to prevent overfitting, using the scikit-learn Python package
  • Developed intuitive KPI dashboards in Tableau for senior management, providing insight into the performance of department strategies
  • Provided data-driven insights to enable decision-making for the product team and market development
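
A minimal PySpark sketch of the aggregation and cleaning pipeline described above (filter plus groupBy); the S3 path is a placeholder, and the column names are borrowed from the loan fields mentioned in these bullets rather than a real schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("loan-pipeline").getOrCreate()

# Hypothetical input location
loans = spark.read.csv("s3://bucket/loans.csv", header=True, inferSchema=True)

summary = (loans
           .filter(F.col("account_balance") > 0)             # drop invalid rows
           .withColumn("purpose", F.lower(F.col("loan_purpose")))
           .groupBy("purpose")
           .agg(F.avg("yearly_salary").alias("avg_salary"),
                F.count("*").alias("n_loans")))
summary.show()
```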
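The undersampling step could look roughly like this Pandas sketch, which randomly downsamples every class to the minority-class size; the label column name is an assumption.

```python
import pandas as pd

def undersample(df: pd.DataFrame, label_col: str = "default",
                random_state: int = 42) -> pd.DataFrame:
    # Size of the smallest class determines the per-class sample size
    minority_n = df[label_col].value_counts().min()
    # Sample each class down to the minority size, then shuffle the result
    balanced = (df.groupby(label_col, group_keys=False)
                  .apply(lambda g: g.sample(n=minority_n, random_state=random_state)))
    return balanced.sample(frac=1, random_state=random_state).reset_index(drop=True)
```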

Confidential

Data Engineer

Responsibilities:

  • Built on Entity-Relationship modeling concepts to create and modify database schemas with foreign-key migration rules
  • Applied normalization (1NF, 2NF, 3NF) and denormalization techniques to improve database performance in OLTP, OLAP, and data warehouse settings
  • Used a MongoDB database to store and manage unstructured data; wrote MongoDB queries to delete, find, and update records using PyMongo (see the sketch after this list)
  • Wrote complex MySQL queries, such as multiple joins, nested subqueries, window functions, and common table expressions, to track user activity metrics such as daily active users and monthly retention rate (sketched below)
  • Developed dashboards in Tableau to visualize user activity metrics via histograms, pie charts, and line charts
  • Optimized existing MySQL queries, reducing running time by 30% on large datasets
  • Worked with Pig and Apache Hadoop to extract and load unstructured data from the MongoDB database, and grouped, sorted, and counted records to explore temporal and categorical data
  • Configured the environment and installed the AWS CLI to control various AWS services through shell/Bash
  • Used PySpark with Python and Apache Spark to apply Python code to user data stored in HDFS
  • Provided cross-functional analytic insights for Marketing Strategy, Media & Public Relations, Operations, Creative, and Marketing teams
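
A hedged PyMongo sketch of the find / update / delete operations described above; the database, collection, and field names are illustrative assumptions.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
events = client["analytics_db"]["user_events"]   # hypothetical collection

# Find: the ten most recent events for one user
recent = events.find({"user_id": 123}).sort("timestamp", -1).limit(10)

# Update: mark a batch of events as processed
events.update_many({"processed": False}, {"$set": {"processed": True}})

# Delete: remove malformed records missing a user_id
events.delete_many({"user_id": {"$exists": False}})
```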
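And a sketch of a user-activity query of the kind mentioned above: a daily-active-users count built with a common table expression (available in MySQL 8.0+). Table and column names are hypothetical.

```python
import mysql.connector

# Deduplicate to one row per user per day, then count users per day
dau_query = """
WITH daily AS (
    SELECT DATE(event_time) AS activity_date, user_id
    FROM user_activity
    GROUP BY DATE(event_time), user_id
)
SELECT activity_date, COUNT(*) AS daily_active_users
FROM daily
GROUP BY activity_date
ORDER BY activity_date;
"""

conn = mysql.connector.connect(host="localhost", user="analyst",
                               password="...", database="product_db")
cur = conn.cursor()
cur.execute(dau_query)
for activity_date, dau in cur.fetchall():
    print(activity_date, dau)
```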

Confidential

Data Engineer

Responsibilities:

  • Designed A/B tests, calculating sample sizes and checking statistical assumptions for the tests using R (a Python sketch of the sample-size calculation follows this list)
  • Performed statistical analysis such as hypothesis testing, regression analysis, and confidence interval calculation using R to find insights for increasing the click-through rate
  • Conducted exploratory analysis with the NumPy and Pandas libraries in Python and built Tableau charts connected to the MySQL and MongoDB databases to gain insight into the datasets
  • Used Tableau to create line charts, pie charts, and interactive charts to visualize data
  • Developed dashboards, reports, and visual presentations using Tableau and MS PowerPoint for internal and external audiences
  • Extracted additional features from the MongoDB database with PyMongo in Python, transformed and stored the data in CSV files, and moved the files into an AWS S3 bucket
  • Built an interactive web application with the Flask library in Python that allowed users to upload data (sketched below)
  • Interpreted statistical test and model results and communicated key findings to the marketing team to facilitate data-driven decision making
  • Delivered higher-quality analyses and optimizations for marketing campaigns, resulting in greater value to clients
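
The A/B-test design above was done in R; below is an equivalent hedged Python sketch of the sample-size calculation using statsmodels. The baseline and target click-through rates are made-up illustrative values.

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline_ctr = 0.10   # assumed current click-through rate
target_ctr = 0.12     # minimum detectable improvement (assumption)

# Cohen's h effect size for two proportions, then solve for n per group
effect_size = proportion_effectsize(baseline_ctr, target_ctr)
n_per_group = NormalIndPower().solve_power(effect_size=effect_size,
                                           alpha=0.05, power=0.8,
                                           alternative="two-sided")
print(f"Required sample size per group: {n_per_group:.0f}")
```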
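A minimal Flask sketch of the data-upload application described above; the route name and upload directory are hypothetical.

```python
import os
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_DIR = "uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.route("/upload", methods=["POST"])
def upload():
    # Expects a multipart form with a "file" field
    f = request.files["file"]
    path = os.path.join(UPLOAD_DIR, secure_filename(f.filename))
    f.save(path)
    return {"status": "ok", "saved_to": path}

if __name__ == "__main__":
    app.run(debug=True)
```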
