Senior Data Scientist Resume

Cincinnati, OH


  • 7 years of progressive experience in software development, Data Engineering, Natural Language Processing and Machine Learning.
  • Experience in developing predictive models using supervised and unsupervised Machine Learning algorithms by collecting and cleaning structured and unstructured datasets.
  • Strong understanding and experience in implementing supervised/unsupervised Machine Learning algorithms such as Naive Bayes, Linear Regression, Random Forest, Generalized Linear Models, Support Vector Machines, Neural Networks, Ensemble Learning, Clustering and Topic Modeling.
  • Expertise in feature extraction, ranking, and dimensionality reduction tasks. Experience in utilizing techniques such as Principal Component Analysis, Information Content, and SVD for feature engineering tasks.
  • Expertise in handling the bias-variance trade-off and evaluating the fit of the model with the respective measures of the chosen algorithm.
  • Experience in performing Natural Language Processing tasks such as tokenization, stemming, POS tagging, negation detection, triple and entity extraction.
  • Experience in developing and utilizing domain-specific knowledge graphs to create semantics-enabled features and train predictive models.
  • Experience in implementing Deep Learning models for classification and regression problems.
  • Experience in implementing simulators with prescriptive analytic features to suggest best actions to the end user.
  • Good understanding of and experience in implementing personalization and recommendation system features. Researched and developed a knowledge-enabled personalization algorithm for social media platforms and entity recommendation applications.
  • Experience in developing distributed applications using Hadoop, Spark (MLlib) and Storm.
  • Experience implementing Object Oriented Programming concepts and design patterns such as Factory, Singleton, Builder etc. in Java/J2EE.
  • Developed web applications using Google Web Toolkit, Servlets, JavaScript, HTML, CSS, AngularJS framework and deployed them on Google Cloud Platform.
  • Experience in creating interactive visualizations/dashboards using Shiny.


Programming Languages: Java, J2EE, C, R, Python

Database Skillset: Data Modelling (ERD), Database design and Implementation, SQL, MySQL, Oracle 11g, MS-SQL Server 2012, Virtuoso (Graph DB), MongoDB

Machine Learning & NLP: H2O.ai, CARET, scikit-learn, MLlib, Weka, Stanford NLP, NLTK

Big Data Tools: Hadoop, Apache Spark, Storm, AWS, MS Azure

Visualization & Web: Shiny, Tableau, QlikView, HTML5, CSS, AngularJS, GW


Confidential, Cincinnati, OH

Senior Data Scientist


  • Responsible for developing the predictive capabilities of a healthcare analytics platform, such as identifying individuals at risk of disease onset or progression, using Knowledge Graphs, Machine Learning, and Natural Language Processing techniques.
  • Implemented a distributed data processing pipeline to clean large volumes of collected structured and unstructured clinical data using the Hadoop stack.
  • Developed predictive models using algorithms (H2O.ai, CARET libraries) such as Logistic Regression, Random Forests, Neural Networks and Ensembles to identify individuals at risk of Diabetes, CHF, COPD etc., enabling proactive care intervention and thereby preventing potential future costs.
  • Created a framework, based on Natural Language Processing and knowledge graphs, to process unstructured clinical notes and transform them into structured features for analysis and predictive modeling.
  • Researched and developed predictive models to track disease prognosis over time and stratify the population based on risk of progressing to severe stages of a disease.
  • Collaborated with external subject matter experts and technical advisors to research and improve the performance of the predictive models.

Confidential, Dayton, OH



  • Involved in research and development of predictive models that embed semantics of data and address challenges in classification of short and long unstructured data.
  • Implemented a knowledge discovery algorithm for the materials domain to aid in advancing the research of new metals by extracting triples from materials domain literature.
  • Developed a personalization algorithm for social media that utilizes openly available knowledge bases to augment machine learning algorithms with additional features to estimate a user’s interest in entities and content.
  • Implemented a recommendation algorithm for various domains utilizing hierarchical knowledge (taxonomies) to create user profiles and recommend content or entities of interest to the user. By augmenting machine learning algorithms with knowledge-based features, we achieved competitive results.
  • Thesis: Research work focused on features of hierarchical knowledge bases that influence the performance of personalization and recommendation systems.
  • Implemented classification algorithms for healthcare data to automatically fill gaps in electronic medical records documentation accuracy. The resulting algorithm utilizes relationships between entities in the healthcare domain and augments features extracted from unstructured text to classify documents accurately.


Research Intern


  • Responsible for research and development of solutions to automatically identify clinical documents that are inaccurate or incomplete in documenting the diagnosis.
  • Implemented a document classification algorithm to identify documents with implicit connections to diseases that otherwise go unnoticed in the healthcare workflow.
  • Involved in processing unstructured clinical notes and extracting features of different classes such as diseases, laboratory measures, medications etc. with attributes such as negation, temporality, and degree of certainty.
  • Developed classification models using decision trees, random forests, support vector machines etc. and evaluated and compared their performance in the scenario of incremental features as Confidential’s Confidential days increase. This enabled identifying an appropriate model that performs well right from the admission of a Confidential to the hospital.
  • Utilized hierarchical relationships from healthcare knowledge bases to identify relevant features to classify documents specific to diseases.
  • Applied feature ranking techniques based on information gain to reduce feature sets.


Test Engineer


  • Involved in planning and review of the quality assurance phase of online gaming applications.
  • Involved in preparing test scenarios and test cases to validate the functional, integration and system level quality of the gaming applications.
  • Responsible for validating functionality across multiple channels such as web, mobile etc., and across various browsers and languages.
  • Automated the quality assurance tasks for several modules of the application by utilizing tools such as QTP.
  • Initiated and developed tools written in Java to automate some of the manual tasks in the testing phase of gaming applications. This saved approximately 15% of the effort in the test execution phase.


Programmer Analyst


  • Involved in test planning, creating test scenarios, and writing test cases in the form of SQL queries to test ETL systems.
  • Responsible for logging defects in ALM, and for tracking and prioritizing unresolved defects for subsequent releases.
  • Involved in peer reviews and second level reviews of the product’s quality assurance artifacts.
  • Involved in prioritization of deliverables, estimation of resources, and allocation as per agreed delivery times.

