To improve people's ability to find information in online text material. Our approach is to exploit semantic relationships among concepts, using natural language processing and knowledge representation techniques to bridge the differences between the terminology in which queries are expressed and the wording of the desired information.
Objective for FY95
To develop a prototype system that can be deployed for pilot testing with people outside the Conceptual Indexing group, and begin evaluating this technology with real users.
Description
The Conceptual Indexing project has been developing techniques for indexing and organizing information in structured conceptual taxonomies that will facilitate browsing and retrieval of specific information in response to specific information needs. These taxonomies are structured networks of concepts and relationships among concepts that can be used to relate the terminology of a request to terminology used in information that may satisfy the request. The goal is to understand the use of such structures in applications such as online documentation, online information access, hypertext publishing, catalog indexes, and information access over networks.
Conceptual Indexing refers to a technology for organizing facts, ideas, words, phrases, and descriptions into a structured taxonomy that can be used as an organizing structure for information retrieval, and as a structure to support human browsing. We conjecture that these structures can also be used to help a person understand and organize complex bodies of information. We have implemented a conceptual indexer that will extract words and phrases from text files and organize them into a taxonomy that can be browsed and used to access information. Such a taxonomy can index text material at the level of individual sentences and phrases to support locating specific answers to specific questions.
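As a rough illustration of indexing text at the level of individual sentences and phrases, the following minimal sketch maps each word and each adjacent two-word phrase to the sentences in which it occurs. It is not the project's indexer: the function names, the sample file name, and the use of adjacent word pairs in place of parsed phrases are all assumptions made for the example.

```python
import re
from collections import defaultdict

def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_terms(sentence):
    """Return single words plus adjacent two-word phrases, lowercased.

    Adjacent word pairs are a stand-in (an assumption) for phrases that a
    real parser would extract.
    """
    words = re.findall(r"[a-z]+", sentence.lower())
    phrases = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + phrases

def build_index(docs):
    """Map each term to a sorted list of (document name, sentence number)."""
    index = defaultdict(set)
    for name, text in docs.items():
        for number, sentence in enumerate(split_sentences(text)):
            for term in extract_terms(sentence):
                index[term].add((name, number))
    return {term: sorted(locations) for term, locations in index.items()}

# Hypothetical sample document.
docs = {"manual.txt": "Wash the car with soap. Dry the car with a towel."}
index = build_index(docs)
print(index["car"])        # [('manual.txt', 0), ('manual.txt', 1)]
print(index["with soap"])  # [('manual.txt', 0)]
```

Because each entry points to a sentence rather than a whole document, a lookup can return the specific passage that mentions a term. This sketch stops at a flat index; organizing the extracted terms into a taxonomy is a separate step.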
The conceptual indexer serves as a component of a prototype indexing and retrieval system. The components of the conceptual indexer and retriever include:
A parser that analyzes phrases extracted from text to determine their conceptual structure for incorporation into the index;
A core dictionary of words that is used by the parser to determine the structure of phrases;
A morphological analyzer that handles words that are inflected or derived forms of words in the dictionary, and that guesses grammatical roles and semantic relationships for unknown words;
A knowledge base of semantic relationships among words and concepts that is used to judge the relationships between complex concepts and between query terms and terms in a text;
A conceptual classifier that takes conceptual descriptions and assimilates them into a taxonomy in such a way that they are directly linked to the most specific concepts that subsume them;
A browser for viewing and navigating within a conceptual taxonomy;
A retrieval engine that uses paths in the conceptual taxonomy to make connections between terms in a query and terms in the text to dynamically identify good passages to retrieve in response to a query; and
A viewer used to view the results of a search and to view retrieved passages in the context of their source documents.
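To make the classifier and retrieval components more concrete, here is a minimal sketch of subsumption-based classification and matching. It assumes a deliberately simple representation that is not taken from the project: a concept is a pair of (modifiers, head word), the word-level knowledge base is a small hypothetical table of "kind of" links, and retrieval is reduced to a subsumption test rather than a search over taxonomy paths. All names, words, and passage identifiers are invented for the example.

```python
# Hypothetical word-level knowledge base: word -> its direct generalizations.
KIND_OF = {
    "car": {"vehicle"},
    "truck": {"vehicle"},
    "washing": {"cleaning"},
    "polishing": {"cleaning"},
}

def word_subsumes(general, specific):
    """True if `general` equals `specific` or is reachable via KIND_OF links."""
    if general == specific:
        return True
    return any(word_subsumes(general, parent)
               for parent in KIND_OF.get(specific, ()))

def subsumes(general, specific):
    """Concept subsumption under the assumed (modifiers, head) representation.

    The head of `general` must subsume the head of `specific`, and every
    modifier of `general` must subsume some modifier of `specific`.
    """
    (g_mods, g_head), (s_mods, s_head) = general, specific
    return word_subsumes(g_head, s_head) and all(
        any(word_subsumes(g, s) for s in s_mods) for g in g_mods
    )

def most_specific_subsumers(concept, taxonomy):
    """Return the subsumers of `concept` that subsume no other subsumer."""
    candidates = [c for c in taxonomy if c != concept and subsumes(c, concept)]
    return [c for c in candidates
            if not any(o != c and subsumes(c, o) for o in candidates)]

def retrieve(query, passages):
    """Return ids of passages whose indexed concept the query subsumes."""
    return [pid for pid, concept in passages if subsumes(query, concept)]

# A tiny taxonomy: "cleaning" and "vehicle cleaning".
taxonomy = [((), "cleaning"), (("vehicle",), "cleaning")]

# Classify "car washing": it belongs directly under "vehicle cleaning".
new_concept = (("car",), "washing")
print(most_specific_subsumers(new_concept, taxonomy))
# [(('vehicle',), 'cleaning')]

# Hypothetical indexed passages, each tagged with one concept.
passages = [
    ("doc1:s3", (("car",), "washing")),
    ("doc2:s1", (("truck",), "polishing")),
    ("doc3:s7", (("car",), "painting")),
]
print(retrieve((("vehicle",), "cleaning"), passages))
# ['doc1:s3', 'doc2:s1']
```

The query "vehicle cleaning" shares no words with "car washing" or "truck polishing", yet both passages are found because the query concept subsumes them. That is the kind of terminology gap the taxonomy is meant to bridge. A fuller treatment would also need to handle equivalent concepts and rank passages by how closely they match.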
Accomplishments
We now have an operational prototype system that responds to queries over the network using a Mosaic interface, and we have begun to experimentally evaluate this system. This system has been demonstrated to a number of groups within Sun as well as to the Labs' Advisory Committee, and it has been very well received.
This year has seen a surge of activity both in creating a usable prototype and in discussions with potential clients within Sun. Our goal for next year is to develop a self-contained version of this technology, so that potential clients can deploy it in their own environments and incorporate parts of it into applications. The current prototype consists of a server running on the East Coast that performs indexing and retrieval, while the client uses the Mosaic interface to invoke retrievals and browse the conceptual taxonomy. This arrangement lets people explore the system before a more portable version is available, and it lets us gather information about how the system is used. Work on a more portable prototype began this year, and results should be available during the next fiscal year.
References
Presentations
Woods, W. A. "Beyond Ignorance-Based Systems." Slides for Talk at University of Toronto (March 1995). SML-95-0093.
Publications
Resnik, P. "Using Information Content to Evaluate Semantic Similarity in a Taxonomy." IJCAI-95 (March 1995). SML-95-0104.