Senior Data Analyst Resume
Philadelphia, PA
SUMMARY
- Data Scientist with over 6 years of experience in Machine Learning, Statistical Modeling, Data Mining and Data Visualization.
- Rich domain knowledge and experience in the E-commerce, IT and Real Estate industries.
- Expertise in transforming business resources and requirements into manageable data formats and analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
- Proficient in Machine Learning algorithms and Predictive Modeling including Regression Models, Decision Tree, Random Forest, Sentiment Analysis, Naïve Bayes Classifier, SVM, Ensemble Models, Neural Networks.
- Proficient in Statistical Methodologies including Hypothesis Testing, ANOVA, Time Series, Principal Component Analysis, Factor Analysis, Cluster Analysis, Discriminant Analysis.
- Experienced in designing and implementing marketing experiments such as A/B and A/B/C tests.
- Proficient in ETL (Extract, Transform, Load) processes using Web Crawling, Data Mining and Text Mining.
- Proficient in Python 2.x/3.x with SciPy Stack packages including NumPy, Pandas, SciPy, SymPy, Matplotlib, IPython.
- Expertise in statistical programming language R 3.x with packages such as R Shiny, ggplot2, gmodels.
- Expertise in statistical programming language SAS 9.x including SAS SQL, Macros, and Optimizations.
- Extensive experience in RDBMS such as MySQL 5.x and SQL Server 2010+, and NoSQL databases such as MongoDB 3.x, Cassandra 3.x, and HBase 0.98.
- Working experience in the Hadoop 2.x ecosystem and Apache Spark 2.x framework, including HDFS, MapReduce, HiveSQL, SparkSQL, PySpark.
- Proficient in data visualization tools such as Tableau 9.x, Python Matplotlib, R Shiny and D3.js 4.x to create visually powerful and actionable interactive reports and dashboards.
- Experienced in version control tools such as Git 2.x, GitHub.
- Experienced in Agile methodology and SCRUM process.
- Passionate about data manipulation and learning cutting-edge algorithms and models for Machine Learning and Artificial Intelligence.
- Strong business sense and ability to communicate data insights to both technical and nontechnical clients.
- Self-motivated, enthusiastic learner who works effectively in fast-paced, multitasking environments, both independently and as part of a collaborative team.
TECHNICAL SKILLS
Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes’ Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Auto-correlation, Simpson’s Paradox.
Languages: Python 2.x/3.x, R 3.x, SAS 9.x, SQL
Databases: MySQL 5.x, SQL Server 2010+, MongoDB 3.x, Cassandra 3.x, HBase 0.98
Machine Learning: Regression analysis, Bayesian Method, Decision Tree, Random Forest, Support Vector Machine, Neural Network, Sentiment Analysis, K-Means Clustering, KNN, Ensemble Method.
Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS
Python Packages: Pandas, NumPy, SciPy, SymPy, Scikit-Learn, Matplotlib.
R Packages: ggplot2, gmodels, R Shiny.
Hive/Spark: HiveSQL, Spark MLlib, SparkSQL
Data Visualization: Tableau 9.x, D3.js 4.x, Python Matplotlib, R Shiny
PROFESSIONAL EXPERIENCE
Confidential, New York, NY
Data Scientist
Responsibilities:
- Participated in Data Acquisition with the Data Engineering team to extract historical and real-time data on search parameters and customer information using MapReduce and HDFS.
- Performed Data Preparation and cleaned the unstructured data using Data Mining to handle missing values, remove spaces and hashtags, and analyze correlations.
- Explored and analyzed customer-specific features using SparkSQL.
- Built Factor Analysis and Cluster Analysis models using Python SciPy to classify hotels into different target groups and to identify newly effective features.
- Built predictive models including Support Vector Machine, Decision Tree and Naïve Bayes Classifier using Python Scikit-Learn to predict the personalized hotel group for each user.
- Designed and implemented cross-validation and statistical tests including Hypothesis Testing, ANOVA, Auto-correlation, Simpson’s Paradox to verify the models’ significance.
- Designed an A/B experiment for testing the business performance of the new recommendation system.
- Created reports and dashboards using Python Matplotlib and Tableau 9.x to explain and communicate data insights, significant features, model scores and the performance of the new recommendation system to both technical and business teams.
- Used Git 2.x for version control with the Data Engineering team and fellow Data Scientists.
- Used Agile methodology and SCRUM process for project development.
Environment: Hadoop 2.x, MapReduce, HDFS, SparkSQL, Python 3.x, Matplotlib, Scikit-Learn, Tableau 9.x, SVM, Decision Tree, Naïve Bayes Classifier, A/B experiment, Git 2.x, Agile/SCRUM.
Confidential, Philadelphia, PA
Data Scientist
Responsibilities:
- Collaborated with Data Engineers to collect data from external sources using Web Crawling and Data Mining.
- Implemented ETL processes and Data Cleaning for both internal and external data sources using Python Pandas and NumPy.
- Identified and selected effective features with Principal Component Analysis and KNN using Python SciPy.
- Built predictive models including Regularized Linear Models, Lasso Model and Random Forest to predict apartment prices using Python Scikit-Learn.
- Developed an Ensemble Model using R gmodels to combine multiple predictive models and their predictions, improving prediction accuracy.
- Designed and implemented cross-validation and statistical tests including Hypothesis Testing, ANOVA, Auto-correlation, Simpson’s Paradox to verify each predictive model.
- Conducted Root Cause Analysis and Factor Analysis on digital marketing performance to identify KPIs for email advertising by using Python SciPy.
- Researched and developed Attribution Models using Python Scikit-Learn to find the correlation between the conversion rate and email advertising campaigns.
- Evaluated and recommended the optimized time frequency and time duration for email advertising campaigns.
- Used Tableau 9.x and R Shiny to create detail-level summary reports and dashboards for technical and business stakeholders, featuring KPIs and visualized trend analysis.
- Used Agile methodology and SCRUM process for project development.
Environment: R 3.x, Shiny, gmodels, Python 3.x, Scikit-Learn, Web Crawling, ETL, Root Cause Analysis, Factor Analysis, PCA, KNN, Statistical Tests, Ensemble Model, Regularized Linear Models, Lasso Model, Random Forest, Attribution Models, Tableau 9.x, Agile/Scrum.
Confidential
Senior Data Analyst
Responsibilities:
- Managed data engineering and consulted for 10+ Confidential news apps to drive data solutions across every aspect of the apps.
- Initiated multiple data projects, including building models, algorithms, and tools to improve the user experience.
- Provided data tracking solutions for advertising implementation and for measuring effective clicks and exposure.
- Transformed unstructured and structured data on Hadoop, Hive, SQL Server 2010+, MySQL 5.x, MongoDB 3.x.
- Designed and implemented ETL and statistical analysis with R 3.x and Python 3.x.
- Created visualization reports and dashboards with Tableau 9.x and ggplot2.
- Applied data mining with Regression, Decision Tree, K-means and other Machine Learning algorithms to billions of daily user online behavior logs.
- Provided report and dashboard solutions for product operation, A/B testing, KPI measurement, etc.
- Designed the data infrastructure for the marketing campaign data tracking system and the open course program.
- Conducted user experience analysis, including focus groups, surveys, card sorting and interviews.
Environment: Hadoop 2.x, Hive 0.13, SQL Server 2010+, MySQL 5.x, MongoDB 3.x, Python 3.x, R 3.x, Tableau 9.x, ggplot2, Regression Models, Decision Tree, K-means, A/B Test.
Confidential
Data Analyst
Responsibilities:
- Participated in the team’s core project ‘Decision Center’, responsible for data logic analysis and validation.
- Manipulated large volumes of daily trade and punishment records with SAS 9.x.
- Validated data formats, analyzed feature correlations, and performed ad-hoc analysis.
- Built predictive models with data mining algorithms in Python 2.x.
- Designed and executed tests to verify the models’ significance.
- Created data analysis reports to present data insights and interpretations to business operators.
- Independently developed weekly data reports for the team; designed and monitored significant indicators for the entire website.
Environment: Python 2.x, SAS 9.x, Predictive Models, Data Mining, Statistical Analysis.
Confidential
Database Administrator
Responsibilities:
- Worked with Business Support Analysts, Operations Managers, Directors and Executives to address data needs.
- Participated in project management teams as needed to provide technical support to the assigned project.
- Provided detailed monitoring of jobs and logs for MySQL 5.x and SQL Server 2010+.
- Handled database and operating system troubleshooting and problem resolution for SQL environments.
- Conducted data import/export and loading using direct SQL.
- Maintained and improved the databases, including rollouts and upgrades.
Environment: MySQL 5.x, SQL Server 2010+, ETL, ER diagrams, SQL queries.