Data Analyst Resume
San Jose, CA
SUMMARY
- 5 years of experience in data analysis and modeling, with strong technical and communication skills
- Developed and implemented databases, data collection systems, data analytics and other strategies that optimize statistical efficiency and quality
- Interpreted data, analyzed results using statistical techniques, and provided ongoing reports
- Strong analytical and problem-solving skills, capable of addressing relevant facts, recommending solutions, and working with teams and cross-functional departments
- Experience in design, development, testing, implementation, and support of Enterprise Resource Planning (ERP) systems, and a strong understanding of business process flows
- Gathered and confirmed project requirements by studying user requirements and conferring with others on the project team
- Worked with both relational and non-relational databases, using SQL and NoSQL syntax to query data and integrate with big data frameworks
- Proficient in querying data with MySQL, MSSQL, PostgreSQL, etc., including building relational databases, selecting and joining tables with aggregate functions, and creating views, functions, and procedures
- Hands-on experience with MongoDB and DynamoDB to build, retrieve, store, and modify NoSQL databases
- Worked on data preprocessing, exploratory data analysis, and building Machine Learning models such as SVM, KNN, Random Forest, etc. in Python
- Analyzed data by building Machine Learning models for regression, classification, and clustering, with feature selection, feature engineering, hyperparameter tuning, evaluation, and validation in Python
- Experience using Tableau for database joins, nested sorting, integration, and visualization, creating diverse charts, maps, trend lines, and predictive analyses
- Hands-on experience with the Hadoop big data framework, including HBase for big data storage, data arrangement in HDFS, processing with MapReduce or Spark, and data analysis with Hive
- Hands-on experience in data mining, cleaning, warehousing, and ETL processes using Talend, Informatica, AWS Glue, and Hive with big data frameworks
- Worked with the cloud computing platforms Amazon Web Services (AWS) and Google Cloud Platform (GCP), using various services including EC2, S3 storage, EMR, DynamoDB, BigQuery, Kubernetes, etc.
- Performed A/B testing and hypothesis testing to deliver business insights
- Built statistical models and performed exploratory analysis in RStudio
- Experience with Deep Learning models for NLP and image recognition using Python
- Reported and presented data patterns and analytical results to other teams by building charts, graphs, and dashboards in MS PowerPoint and Excel
- Experience with Agile project management using Jira
- Used Git repositories for version control
- Highly self-motivated, quick learner who adapts readily to new tools and team environments
TECHNICAL SKILLS
Programming Language: Python, R, SQL, Bash, Julia; JavaScript, STATA, MATLAB (Exposure)
Database: MongoDB, DynamoDB, Redshift, PostgreSQL, MySQL, MSSQL, Oracle, MariaDB
Packages: NumPy, pandas, Scikit-learn, TensorFlow, PyTorch, Keras, Caffe, seaborn, Matplotlib, SciPy, NLTK
Cloud Platform: Amazon Web Services (AWS), Google Cloud Platform (GCP)
Machine Learning: Linear Regression, Logistic Regression, Decision Tree, Random Forest, KNN, SVM, Gradient Boosting, Multi-Layer Perceptron, Neural Networks, Natural Language Processing
IDE: Jupyter Notebook, PyCharm, Visual Studio Code, Spyder, Atom, RStudio
Big Data Platform: Hadoop, Spark, MapReduce, Hive, HBase
Analytic and Reporting Tools: Tableau, Power BI, Microsoft Office, SSRS, SSAS
Management Tools: Jira, Slack, Trello, Airflow
Operating System: Linux, Windows, macOS
PROFESSIONAL EXPERIENCE
Confidential, San Jose, CA
Data Analyst
Responsibilities:
- Analyzed complex, high-volume, high-dimensionality data from multiple sources using different data analysis techniques and tools to formulate recommendations, learning and test plans
- Queried and analyzed relational databases containing millions of customer records using SQL, integrated them with big data frameworks to handle and analyze data, and built Machine Learning models to provide business insight
- Maintained a NoSQL database on MongoDB to handle unstructured data; cleaned the data by removing invalid records, unifying formats, and rearranging the structure, then loaded it for downstream steps
- Ran the Hadoop big data framework on an AWS EMR cluster, interacting with S3 for file storage
- Used HiveQL to merge and aggregate different datasets and load the results into a database as a full ETL process with AWS Glue
- Built relational databases on Amazon Aurora MySQL: created/altered tables, wrote stored procedures, triggered defined functions, implemented batch
- Used MySQL features such as joins, window functions, subqueries, and aggregate functions to clean data and optimize database performance
- Designed storytelling visualizations and dashboards in Tableau from different databases to enable ongoing monitoring and reporting across departments
- Ran Python scripts on AWS EC2 to extract data from different departments, cleaned data with functions based on business logic, and streamlined data processing with pandas and NumPy
- Performed EDA, PCA, and feature engineering to extract features; applied SMOTE and other resampling techniques to address imbalanced data and improve the F1 score of machine learning models in Python
- Developed Random Forest and Gradient Boosting models and performed hyperparameter tuning using GridSearchCV from the scikit-learn package in Python
- Ran A/B tests for new features designed via different market channels
- Identified data patterns and visualized model results by connecting Tableau with Python and MySQL
- Reported results in Tableau dashboards with different charts and presented them with MS PowerPoint
- Used Git for version control and Jira for team-wide project management
- Summarized information and reports to deliver insights to the team and client
- Implemented the design, analysis, and interpretation of a variety of reports and analytical solutions
- Engaged constructively with project teams to support project objectives
Confidential
Data Analyst
Responsibilities:
- Monitored and analyzed customer information and sales data to understand product and market trends
- Provided data-driven insights to enable decision-making for product and market development by interpreting data, analyzing results and using statistical techniques to provide ongoing reports
- Created effective marketing strategies and promotion plans to retain existing high-value customers
- Developed intuitive KPI dashboards in Tableau for senior management that provided insight into the performance of department strategies
- Worked with SQL to manipulate relational databases containing millions of customer insurance records, and built Machine Learning models with big data frameworks to predict premiums, risk level, etc. and deliver business insight
- Participated in maintaining NoSQL databases with MongoDB and Cloud Bigtable for unstructured data manipulation, and extracted data from different sources
- Extracted and manipulated large-scale data in Hadoop and Spark environments
- Worked with HiveQL scripts to process data stored in HBase and ran MapReduce jobs
- Used Google Cloud and Snowflake for ETL processes, data warehousing, and large-scale computing
- Designed relational databases and storage in MariaDB (MySQL); maintained and optimized databases by creating/altering tables, writing stored procedures, triggering defined functions, implementing batch
- Fetched relational data using SQL to query and merge millions of records from different tables
- Connected Tableau to different databases to create interactive charts and trend lines for presenting analytical results and providing customer segmentation suggestions
- Identified trends and patterns in the data through EDA in Python, helping the engineering and sales teams with further analysis
- Performed feature engineering to extract features and visualized them with matplotlib and seaborn in Python
- Determined key factors affecting potential risk level through feature selection processes such as PCA
- Applied Machine Learning models including Gradient Boosting and SVM and evaluated the models
- Built unsupervised machine learning models such as k-means for customer segmentation and personalization in Python
- Ran A/B tests for new features designed via different market channels for product research
- Worked with R for statistical modeling, such as Bayesian analysis and hypothesis testing with the dplyr and BAS packages, and visualized test results in R to deliver business insight
- Validated models using confusion matrices, ROC, and AUC, and developed diagnostic tables and graphs that demonstrated how the model can be used to improve the efficiency of the selection process
- Presented and reported business insights using SSRS and Tableau dashboards combined with different diagrams
- Used Jira for project management and Git for version control to build the program
- Reported and displayed analysis results in the web browser with HTML and JavaScript
- Engaged constructively with project teams, supported project goals, and delivered insights to the team and client
Confidential
Data Analyst
Responsibilities:
- Collaborated with the product team to develop a database with MySQL, perform data analysis using Microsoft Excel and Python, and deliver analysis for business insight in different visual formats
- Used Microsoft Excel to clean the data and explore data features using sorting, filtering, conditional formatting, charts, and pivot tables
- Analyzed data sources and formats with different tools, built relational databases, and modified schemas in MySQL
- Queried and merged data in MySQL, removing invalid records, unifying formats, and rearranging the structure
- Performed exploratory data analysis, results interpretation, and report preparation in support of business process management and supply chain cycle
- Identified data patterns and feature importance in Python with pandas, NumPy, and scikit-learn
- Built models such as linear regression and logistic regression with different features and made predictions to advise supply chain management
- Tuned model performance with GridSearchCV to improve the MSRE
- Visualized results in Python with matplotlib and seaborn, and in Data Studio, to deliver insights
- Interpreted data and drew conclusions, presented in a visual format to the management team using Microsoft PowerPoint
- Reported and displayed analysis results in the web browser with HTML and JavaScript
