
Sun Labs: Conceptual Indexing/Retrieval Technology

Key Ideas behind the Technology


Making a difference

We have found that techniques from knowledge representation and natural language processing can make a useful contribution to solving the paraphrase problem. By searching a structured conceptual taxonomy of the words and phrases extracted from a collection of documents, our algorithms can effectively connect terms in a query with appropriate related terms in document passages.

The problem with synonyms

A common approach to the paraphrase problem is to use tables of synonyms to automatically expand queries by adding terms that are recorded as "synonymous." However, there are few real synonyms in English, so the common practice is to include related words as if they were synonyms. Treating terms as equivalent when they are not really synonyms fixes a single level of granularity for every query, trading precision for recall. There is no a priori correct level for this tradeoff - different information needs require different levels of generality - so this technique often degrades retrieval rather than improving it.

As an alternative to synonym classes, we use taxonomic subsumption algorithms that exploit generality (subsumption) rather than synonymy to connect terms in queries with passages that contain more specific terms as well as the requested terms. These algorithms do not automatically explore more general terms, so the level of generality is controlled by your choice of query terms. For example, if you ask for "motor vehicles" you would get trucks, buses, cars, etc., but if you ask for "automobiles" you would get cars and taxicabs, but not trucks and buses.
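
The downward-only expansion described above can be illustrated with a minimal sketch. It is not the group's implementation: the `KIND_OF` facts and the function names `build_children` and `expand_query_term` are hypothetical, chosen only to mirror the "motor vehicles" and "automobiles" example.

```python
from collections import defaultdict

# Hypothetical "kind of" facts as (specific, general) pairs.
# They are illustrative only and mirror the example in the text.
KIND_OF = [
    ("automobile", "motor vehicle"),
    ("truck", "motor vehicle"),
    ("bus", "motor vehicle"),
    ("car", "automobile"),
    ("taxicab", "automobile"),
]


def build_children(kind_of):
    """Map each general term to the set of terms directly beneath it."""
    children = defaultdict(set)
    for specific, general in kind_of:
        children[general].add(specific)
    return children


def expand_query_term(term, children):
    """Return the term plus every more specific term beneath it.

    The walk only goes downward, so more general terms are never added.
    """
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


CHILDREN = build_children(KIND_OF)
motor_vehicle_terms = expand_query_term("motor vehicle", CHILDREN)
automobile_terms = expand_query_term("automobile", CHILDREN)
```

Under these assumed facts, `motor_vehicle_terms` includes trucks, buses, cars and taxicabs, while `automobile_terms` contains only "automobile", "car" and "taxicab". The query term itself sets the level of generality.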

Taxonomies

Using knowledge bases of general semantic facts, structured conceptual taxonomies (a type of semantic network) can be constructed from words and phrases. These words and phrases can be extracted automatically from text and parsed into conceptual structures. The taxonomy can be organized by the most-specific-subsumer (MSS) relationship, where each concept is linked to the most specific concepts that subsume it - i.e., that are more general than it is. Terms in a query are individually matched with corresponding concepts in the taxonomy together with their subconcepts.
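
As a rough illustration of the most-specific-subsumer relationship, the sketch below keeps only those subsumers of a concept that have no other subsumer of the same concept beneath them. The `SUBSUMERS` table (including the "hired vehicle" concept) and the function name `most_specific_subsumers` are assumptions made for this example, and the table is assumed to be already transitively closed.

```python
# Hypothetical subsumption facts, assumed to be transitively closed:
# each concept maps to ALL strictly more general concepts.
SUBSUMERS = {
    "taxicab": {"automobile", "motor vehicle", "vehicle", "hired vehicle"},
    "car": {"automobile", "motor vehicle", "vehicle"},
    "automobile": {"motor vehicle", "vehicle"},
    "motor vehicle": {"vehicle"},
    "hired vehicle": {"vehicle"},
    "vehicle": set(),
}


def most_specific_subsumers(concept, subsumers):
    """Return the subsumers of `concept` that are not above another subsumer.

    A candidate is dropped when some other candidate lists it as one of
    its own subsumers, because that other candidate is more specific.
    """
    candidates = subsumers[concept]
    return {
        c
        for c in candidates
        if not any(c in subsumers[other] for other in candidates if other != c)
    }


# MSS links for the whole (toy) taxonomy.
MSS_LINKS = {c: most_specific_subsumers(c, SUBSUMERS) for c in SUBSUMERS}
```

With these assumed facts, "car" links only to "automobile", whereas "taxicab" links to both "automobile" and "hired vehicle", since neither of those subsumes the other.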

For example, given the general semantic facts that "washing" is a kind of "cleaning" and "car" is a kind of "automobile", an algorithmic classification system can automatically classify "car washing" as a kind of "automobile cleaning". A query for "automobile cleaning" or "automobile washing" will immediately retrieve hits for "car washing".
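
The "car washing" example can be sketched in a few lines. This is a deliberate simplification, not the classification algorithm itself: it compares two-word phrases word by word instead of parsing them into conceptual structures, it assumes each word has at most one more general word, and the names `KIND_OF`, `subsumes`, `phrase_subsumes`, `retrieve` and the `INDEXED` phrases are hypothetical.

```python
# The two general semantic facts from the text, as specific -> general.
# Assumption: each word has at most one parent and there are no cycles.
KIND_OF = {"washing": "cleaning", "car": "automobile"}


def subsumes(general, specific, kind_of):
    """True if `general` equals `specific` or lies above it in the chain."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = kind_of.get(current)
    return False


def phrase_subsumes(general_phrase, specific_phrase, kind_of):
    """Word-by-word subsumption for phrases of equal length."""
    g_words = general_phrase.split()
    s_words = specific_phrase.split()
    return len(g_words) == len(s_words) and all(
        subsumes(g, s, kind_of) for g, s in zip(g_words, s_words)
    )


def retrieve(query, indexed_phrases, kind_of):
    """Return the indexed phrases that the query phrase subsumes."""
    return [p for p in indexed_phrases if phrase_subsumes(query, p, kind_of)]


# Hypothetical phrases extracted from a document collection.
INDEXED = ["car washing", "car repair", "window washing"]
hits_cleaning = retrieve("automobile cleaning", INDEXED, KIND_OF)
hits_washing = retrieve("automobile washing", INDEXED, KIND_OF)
```

Both queries match "car washing" and nothing else in this toy index, because "car repair" is not a kind of cleaning or washing and "window" is not a kind of automobile.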


Knowledge Technology Group, Sun Microsystems Laboratories