Building an Ontology of Phenotypes by Mining the Biomedical Literature


Principal Investigator

Professor Russ B. Altman, MD, PhD

Stanford University, School of Bio-engineering

Oracle Fellowship Recipient

Bethany Percha

Oracle Principal Investigators

Bill Triebel
Chuck Weiss


The project entails researching and developing automated methods for creating an ontology of phenotypes (observable characteristics such as diseases and drug-response side effects) by mining the biomedical literature. After the first year of the project, we refined the definition of the target ontology to be a knowledge base of gene-phenotype and phenotype-phenotype relations. Oracle’s interest has also expanded to include other types of relations, especially drug-gene relations. Knowledge bases like this one will increasingly be vital tools for researchers and clinicians. Academic entities, such as the Stanford PharmGKB project, and commercial entities, such as Ingenuity, are already curating such knowledge. The current approach relies on teams of highly trained scientists to sift through, extract, and classify information contained in the literature. The inherent limitations of this manual curation process have restricted more widespread use of such knowledge bases. The goal of our research is to enable further automation of the relation extraction process using natural language processing techniques.