Anatomy Lens
A search engine that helps scientists home in on the PubMed articles most relevant to their research.
Date Posted: April 10, 2008
When searching biomedical data, using the concepts in an ontology can help improve recall. For instance, users who search for medical articles that mention the heart implicitly mean articles that mention either the heart or any of its subparts. An ontology like the Foundational Model of Anatomy (FMA) formally defines part hierarchies that can be used to automatically improve recall of medical data. Similarly, users who want to find genes involved in a certain biological process, such as neuron development, implicitly include any subprocesses of neuron development (such as axon or dendrite development). Ontologies such as the Gene Ontology (GO), which formally describe these subprocesses, can be used to improve recall of relevant genes or articles. Anatomy Lens provides a concept-based search of the medical literature using ontologies such as GO, FMA, and MeSH (which is not formally an ontology but which organizes the literature by subject headings).
The data set used in this service is based in spirit on the Banff HCLS Demo. The focus of our service is to show the value of ontology-based querying of medical data. Our demo currently contains the following integrated datasets:
- PubMed articles from the 2008 distribution, with only the titles and their links to MeSH.
- Gene annotations (GOA), linking genes and gene products to articles, specific evidence codes, and Gene Ontology processes (such as dendrite development).
- The Gene Ontology, which defines the biological processes, cellular components, and molecular functions of genes and gene products.
- The Foundational Model of Anatomy, which defines anatomical parts and their subparts. We used the OWL version of FMA created by Golbreich, Zhang, and Bodenreider (2006).
The GOA annotations link PubMed articles to GO concepts. Similarly, the PubMed data stores mappings to MeSH subject headings. To connect articles to FMA concepts, we mapped MeSH anatomical terms to FMA based on UMLS. This approach missed some key FMA-to-MeSH mappings (such as hippocampus), so we used MMTx from NLM to augment the mappings, keeping only those with a perfect score.
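The two-step mapping described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in Anatomy Lens: the function name, the dictionary shapes, the identifiers in the usage note, and the value used for a "perfect" score are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumption: the text matcher reports 1000 for a perfect match.
PERFECT_SCORE = 1000


def map_mesh_to_fma(
    mesh_to_cui: Dict[str, Set[str]],
    cui_to_fma: Dict[str, Set[str]],
    scored_candidates: Iterable[Tuple[str, str, int]],
    perfect_score: int = PERFECT_SCORE,
) -> Dict[str, Set[str]]:
    """Return a map from each MeSH term to the FMA concepts linked to it."""
    mapping: Dict[str, Set[str]] = {}

    # Step 1: join MeSH and FMA through shared metathesaurus concept ids.
    for mesh_term, cuis in mesh_to_cui.items():
        for cui in cuis:
            for fma_concept in cui_to_fma.get(cui, ()):
                mapping.setdefault(mesh_term, set()).add(fma_concept)

    # Step 2: augment with text-matcher candidates, keeping only
    # candidates that received the perfect score.
    for mesh_term, fma_concept, score in scored_candidates:
        if score == perfect_score:
            mapping.setdefault(mesh_term, set()).add(fma_concept)

    return mapping
```

In this sketch, a term such as Hippocampus that has no shared concept identifier in step 1 can still be mapped in step 2, provided the matcher proposes a candidate with the perfect score; lower-scoring candidates are dropped.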
Clearly, the user now gets more relevant matches to queries (for example, searching for heart will also return articles about the ventricle). The user also gets more focused results because we search only for articles that are annotated with these specific concepts, either by MeSH headings or by users (as in the case of GOA). We do not provide keyword-based matches, which automatically excludes articles that might be irrelevant to the query. For example, a user who wants articles about the GO process respiratory tube development but enters it as a keyword search would get articles about children with respiratory infections needing tubes to prevent middle ear infections in development. That user would also miss articles about lung development, which is part of the respiratory tube development process.
Expanding user searches to subparts is sometimes problematic if the ontology was not designed with a specific use case (such as searching for medical articles) in mind. For instance, in FMA, anatomical part hierarchies are modeled from the level of an organ down to the cellular level. It is not very useful if the user searches for lung and finds articles about the cell. Of course, lungs are composed of cells, but so are all other organs. To solve this problem, we return results ranked by the length of the chain of reasoning needed to derive that a part (such as a cell) is a subpart of the queried part (such as a lung). As the semantic distance between concepts in an ontology grows, the likelihood that they are relevant to each other decreases. We use this distance to rank the returned results. We also group results by matching search term, so users can easily discard results that they deem irrelevant.
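One simple way to picture this ranking is a breadth-first walk over the part hierarchy that records how many steps it took to reach each subpart. The sketch below is illustrative only and is not the Anatomy Lens reasoner: the function names, the toy hierarchy in the usage note, and the use of a has_part lookup table (assumed here to be an index derived from the part_of relations) are all assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple


def expand_with_distance(has_part: Dict[str, List[str]], query: str) -> Dict[str, int]:
    """Breadth-first expansion: map each reachable subpart to its chain length."""
    distance = {query: 0}
    queue = deque([query])
    while queue:
        concept = queue.popleft()
        for part in has_part.get(concept, []):
            if part not in distance:  # first visit is the shortest chain
                distance[part] = distance[concept] + 1
                queue.append(part)
    return distance


def rank_articles(
    annotations: Dict[str, List[str]], distance: Dict[str, int]
) -> List[Tuple[str, str, int]]:
    """Return (article, matched concept, distance), closest concepts first."""
    hits = []
    for article, concepts in annotations.items():
        matched = [c for c in concepts if c in distance]
        if matched:
            best = min(matched, key=lambda c: distance[c])
            hits.append((article, best, distance[best]))
    # Sorting by distance, then by concept, keeps hits for the same
    # matching term next to each other.
    return sorted(hits, key=lambda h: (h[2], h[1], h[0]))
```

With a toy hierarchy lung > lobe of lung > alveolus > cell, an article annotated with lung ranks first, one annotated with alveolus comes next, and one annotated only with cell comes last, so the least specific matches sink to the bottom of the result list.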
OWL reasoning is known to be intractable in the worst case. Certain combinations of constructors are particularly problematic for OWL reasoning. An example of this may be seen in reasoning on FMA. FMA is a deep partonomy, with both has_part and part_of relations and an inverse relation between part_of and has_part. It is well known that modeling both has_part and part_of relations in a partonomy causes reasoners to fail (see "Best Practices on modeling part-whole relations"). We therefore included only part_of relations in FMA, as recommended in the best practices.
The versions of GO and FMA that we used fall into an OWL dialect called EL+, which is known to have polynomial complexity for classification. Our data set, however, does contain negation (for instance, some gene records are known to be not associated with a GO process), and this negation causes reasoning to fall outside the scope of EL+. We have built a reasoner called SHER (Scalable Highly Expressive Reasoner) to reason over large OWL ontologies and large instance data sets; SHER can handle all the constructs in OWL-DL ontologies except for nominals. Although SHER has been used to reason over 60 million RDF triples in 10-20 minutes, this is still too long for a service that needs to operate in Web time. Anatomy Lens has 305 million triples from PubMed data; FMA consists of 75,000 concepts, GO of 30,000 concepts, and MeSH of 15,000 terms. Anatomy Lens therefore uses a special-purpose reasoner for scalability. The techniques used in the Anatomy Lens reasoner have been generalized and incorporated into SHER.
Another challenge in using ontology-based querying is that it is often hard for a user to validate the returned results. Anatomy Lens provides the user with an explanation for each result, that is, the specific facts in the ontology and data that caused the reasoner to infer that particular result.
Anatomy Lens converts multiple search keywords entered by the user (MeSH terms, FMA concepts, GO concepts) into a single AND query. For instance, if the user wants to find all articles that are linked to Alzheimer's disease (a MeSH term), hippocampus (an FMA concept), and dendrite development (a GO concept), the system issues a conjunctive query looking for articles that are linked to all three terms (each term is expanded into queries for its subparts and subprocesses as appropriate).
Note that a user can also specify multiple terms within a particular terminology (that is, MeSH, FMA, or GO). By default, the system generates an OR query between terms from the same terminology. The user can switch to an AND query between these terms by selecting the corresponding option from the drop-down menu (▽, located at the top-left corner of the query widget).
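The combination rule described above (AND across terminologies, and OR or AND within one terminology) could be expressed along the following lines. This is a minimal sketch under assumed data shapes, not the query engine used by the service: the function names, the `within` parameter, and the representation of each search term as an already-expanded set of concept identifiers are hypothetical.

```python
from typing import Dict, List, Set


def matches(annotations: Set[str], groups: Dict[str, List[Set[str]]], within: str = "OR") -> bool:
    """Check one article against the query.

    groups maps a terminology name to its search terms; each term is given
    as the set of concepts it expands to (itself plus subparts/subprocesses).
    """
    if within not in ("OR", "AND"):
        raise ValueError("within must be 'OR' or 'AND'")
    combine = any if within == "OR" else all
    # Across terminologies the query is always conjunctive.
    return all(
        combine(bool(annotations & expanded) for expanded in terms)
        for terms in groups.values()
    )


def search(articles: Dict[str, Set[str]], groups: Dict[str, List[Set[str]]], within: str = "OR") -> List[str]:
    """Return the ids of all matching articles, in sorted order."""
    return sorted(a for a, ann in articles.items() if matches(ann, groups, within))
```

For example, with one MeSH term, two FMA terms, and one GO term, the default setting returns articles linked to the MeSH term, to the GO term (or one of its subprocesses), and to at least one of the two FMA terms; switching `within` to "AND" keeps only articles linked to both FMA terms.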
There are two reasons why PubMed may return more results than Anatomy Lens. First, by default, PubMed includes a search by concept-based headings as well as a keyword-based search. As pointed out earlier, a keyword-based search will produce more results because an article might match a subset of the words that define a concept (such as tube in respiratory tube development). Second, even if one restricts a PubMed search to MeSH terms, PubMed may produce more results because its databases are more up to date (Anatomy Lens uses the 2008 Medline distribution).
MeSH is not formally a taxonomy, but it is organized as a set of separate trees. For example, if you look up the term Program Evaluation in the MeSH browser, you will notice that it appears in three trees, but has the subclass Benchmarking in only two of them. To be consistent in our expansion of MeSH concepts, we treat these separate trees as a single directed acyclic graph and assume that Program Evaluation always needs to be expanded to Benchmarking.
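The effect of merging the trees can be illustrated with a small sketch. It assumes that each tree position is identified by a dot-separated tree number whose parent is obtained by dropping the last segment; the function names and the tree numbers in the usage note are made up for illustration and are not taken from MeSH or from the Anatomy Lens code.

```python
from typing import Dict, Iterable, Set, Tuple


def build_dag(tree_nodes: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Merge all tree positions into one heading-level child map."""
    by_number = dict(tree_nodes)  # tree number -> heading
    children: Dict[str, Set[str]] = {}
    for number, heading in by_number.items():
        parent_number, dot, _ = number.rpartition(".")
        # A child edge found in any one tree applies to the heading everywhere.
        if dot and parent_number in by_number:
            children.setdefault(by_number[parent_number], set()).add(heading)
    return children


def expand(children: Dict[str, Set[str]], heading: str) -> Set[str]:
    """Return the heading together with all of its descendants in the DAG."""
    seen = {heading}
    stack = [heading]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

If Program Evaluation sits at hypothetical positions X01.1, Y02.5, and Z03.2, and Benchmarking appears only at X01.1.1 and Y02.5.1, the merged graph still records Benchmarking as a child of Program Evaluation, so the expansion is the same no matter which tree the user had in mind.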