Data Analyst Resume

SUMMARY:

  • Passionate about machine learning, data analysis, and computer programming
  • Fast learner with excellent problem-solving and communication skills
  • Strong multitasking skills under challenging deadlines and requirements
  • Excellent team worker and collaborator

EXPERTISE & SKILLS:

  • Deep understanding of neural networks, machine-learning algorithms, and spatiotemporal data analysis methods
  • Good understanding of statistical modeling, statistical inference, and data analysis
  • Proficient with data analysis/statistical tools: Python, R, SQL
  • Proficient with scientific programming in C/C++, MATLAB (10+ years)
  • Working knowledge: SQLite, MySQL, HTML, CSS, XML, JSON, Photoshop
  • Basic knowledge (currently learning): MapReduce, Hadoop, MongoDB, Pig, PHP, Java, Django
  • Good understanding of algorithms and data structures

PROFESSIONAL EXPERIENCE:

Confidential

Data Analyst

Responsibilities:

  • Applied logistic regression, decision tree, random forest, k-nearest neighbors, linear discriminant analysis, and support vector machine classifiers, and combined them by majority vote to predict who would survive in the disaster; considering adding more independent classifiers to improve accuracy. Packages used: (Python) numpy, pandas, scipy, sklearn, matplotlib, etc.; (R) caret, randomForest, rpart, rattle, ggplot2, etc.
  • Extracted and analyzed information from tweets: downloaded a large volume (>100 GB) of tweets from twitter.com (with Python), extracted tweets containing disease-related keywords (with Python and MongoDB), analyzed how the geographical distributions of tweets with different disease-related keywords evolve over time (with both Python and R), and currently building predictive models for the trends of these distributions.
  • The goal of this project is to develop a web-based application that provides disease-related trend information services.

Environment: (Python) urllib, oauth2, json, basemap, numpy, pandas, scipy, sklearn, matplotlib, etc.; (R) googleVis, ggplot2, tm, etc.
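
The majority-vote step described under Responsibilities can be illustrated with a minimal sketch. It assumes each classifier has already produced one label per sample; the function name `majority_vote` and the sample predictions are hypothetical illustrations, not the original project code.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` is a list of equal-length lists, one per classifier.
    When labels tie, the one that appears first among the votes wins.
    """
    combined = []
    for votes in zip(*predictions):  # votes = labels for one sample
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three classifiers on four samples (1 = survived).
clf_a = [1, 0, 1, 0]
clf_b = [1, 1, 0, 0]
clf_c = [0, 1, 1, 0]
ensemble = majority_vote([clf_a, clf_b, clf_c])
```

Using an odd number of classifiers avoids ties on a two-class problem, which is one reason adding further independent classifiers can help.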

Confidential

Data Analyst

Responsibilities:

  • Currently building and comparing predictive text models including n-gram and Markov chain models.
  • To improve prediction accuracy, a spelling checker is being written and will be tested soon.
  • The ultimate goal of this project is to develop a predictive text product that can be used on devices with smart keyboards.
  • Used both decision tree and random forest models to predict the manner in which people perform physical exercises; the random forest model achieved very high prediction accuracy.
  • Developed an expository Shiny app, IrisCDTree, that demonstrates classification of iris flowers with a decision tree.
  • Wrote Python code to access the Twitter API, estimate the public's perception (the sentiment) of a particular term or phrase, and analyze the relationship between location and mood based on a sample of Twitter data.
  • Wrote SQLite queries (on a local machine) to implement in-database text analysis.
  • Wrote Python code to design and implement MapReduce algorithms for common data processing tasks.
  • Wrote R code to read and clean data sets collected from the accelerometers of the Samsung Galaxy S smartphone. The output is a tidy data set together with a CodeBook describing the variables, the data, and the cleaning steps.
  • All files are available on my GitHub repo. Packages used: (R) XML, rhdf5, xlsx, jsonlite, httr, RSQLite, RMySQL, etc.
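
The n-gram approach to predictive text listed above can be sketched with a bigram model, the simplest case (a first-order Markov chain over words). This is an illustrative sketch only: the function names `train_bigrams` and `predict_next` and the toy corpus are assumptions, and the original project code may differ.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def predict_next(model, word, k=3):
    """Return up to k most frequent followers of `word`, best first."""
    followers = model.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(k)]


# Toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
suggestions = predict_next(model, "the")
```

Here `suggestions` ranks "cat" ahead of "mat" because "cat" follows "the" twice in the toy corpus; an unseen word yields an empty list, which is where a fallback such as a lower-order model would come in.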
