Sr. Data Scientist / Assistant Research Professor Resume
PROFESSIONAL SUMMARY:
- Expert in quantitative data analysis, predictive modeling, data management and interpretation.
- Proficient in machine learning algorithms, statistics, physics, and mathematics.
- Strong experience in data analytics, data engineering, data validation, and data mining.
- Deep knowledge of big data technology, neural networks, TensorFlow, and NLP.
- Critical thinker, fast learner, self-motivated, and able to work to deadlines.
COMPUTER SKILLS:
- Proficient in Python (NumPy, pandas, scikit-learn), SQL, R, C/C++, Perl, and Excel.
- Strong experience with Unix/Linux shell scripting and with working in cloud environments.
- Experience with Hadoop, Spark, MapReduce, Jupyter, GitHub, HTTP/Apache, SAS, and Tableau.
PROFESSIONAL EXPERIENCE:
Sr. Data Scientist / Assistant Research Professor
Confidential
Responsibilities:
- Applied machine learning algorithms (decision tree, random forest, GBoost, k-NN, Naive Bayes, SVM, logistic regression, and neural networks) for predictive modeling.
- Applied regression algorithms to accurately predict protein/DNA data quality at the Confidential.
- Applied many classification algorithms to predict HR disease and banking behavior.
- Applied Principal Component Analysis (PCA) to simplify visualization of biological data quality.
- Applied non-linear regression to accurately predict data growth at the Confidential.
- Applied multivariate linear regression for Marketing Mix Modeling (MMM).
- Developed a Python module that automatically selects the best machine learning algorithm and the best hyperparameters from the scikit-learn library for predictive modeling.
- Designed and developed software tools for data cleaning, validation, and mining.
- Developed a new algorithm to detect anomalous data with high accuracy and performance.
- Created a relational database (MySQL) and developed Python scripts to query data for statistical analysis.
- Designed and developed the PDB Distro, a web-based statistical tool that calculates univariate data distribution probabilities, multivariate data correlations, and outliers, providing insight into big data quality and usability.
- Led a team to develop a Drug Design Data Resource (HR data) pipeline in support of computer-aided drug design and discovery (in collaboration with Novartis, Roche, Johnson & Johnson, and Genentech).
- Trained biocurators to detect and correct various data errors.
Senior Software Developer / Research Associate
Confidential
Responsibilities:
- Designed and developed the PDB extract, a user-friendly bioinformatics software tool for unstructured data extraction, collection, integration, and format standardization. This tool is currently used by over 40,000 researchers worldwide.
- Developed Confidential software and applied a statistical approach to validate the model against the experimental data, thus solving the problem of inaccurate data being used for drug design.
- Developed the RNAview software to classify nucleic acid base pairs and RNA motifs and to display secondary structure with full hydrogen-bond interactions.