To build a robust, effective environment for speech applications and to help
define platform-level requirements needed to support sophisticated speech
interaction.
Objective for FY94
To explore both the underlying architectural issues and the user-level
interaction techniques of speech applications, and to develop a set of tools that help others create speech applications.
Description
Speech technology can benefit users whose hands and/or eyes are busy, users
who suffer from certain physical disabilities, or users who are away from their
computer. Our initial focus has been on this last group of users. The prototype
speech applications we have developed are all aimed at allowing users who are
away from their desks, at home, or on the road to call up their Sun workstation
and verbally interact with several popular DeskSet applications.
To construct these speech applications, we have created a prototype speech
application framework called SpeechActs, in which multiple speech-driven applications can be integrated. This framework supports multiple
speech recognizers and synthesizers and includes a natural language
component and a unified grammar language.
Accomplishments
The most significant accomplishment this year was the construction of the prototype SpeechActs Framework. The Framework currently supports both the Hark(TM) recognizer from BBN and the Dagger(TM) recognizer from Texas Instruments. It also supports the TruVoice(TM) synthesizer from Centigram. To effectively integrate this speech technology, we added a blackboard to store shared data; an audio server to handle headphone, speaker box, and telephone access; a text-to-speech server to provide generic interfaces for both C and Lisp processes; and a pipe protocol to rationalize Lisp-to-C process communication.
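As an illustration of the blackboard idea only, the following minimal Python sketch shows how several components might share data through a common store. It is not the SpeechActs implementation, which combined Lisp and C processes; the class name `Blackboard`, its methods, and the key `current_user` are all hypothetical.

```python
import threading
from typing import Any, Callable, Dict, List

Callback = Callable[[str, Any], None]


class Blackboard:
    """Hypothetical shared store: components post values and subscribe to keys."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._subscribers: Dict[str, List[Callback]] = {}
        self._lock = threading.Lock()

    def subscribe(self, key: str, callback: Callback) -> None:
        """Register a callback that runs whenever `key` is posted."""
        with self._lock:
            self._subscribers.setdefault(key, []).append(callback)

    def post(self, key: str, value: Any) -> None:
        """Store a value, then notify subscribers outside the lock."""
        with self._lock:
            self._data[key] = value
            callbacks = list(self._subscribers.get(key, []))
        for callback in callbacks:
            callback(key, value)

    def read(self, key: str, default: Any = None) -> Any:
        """Return the latest value posted under `key`, or `default`."""
        with self._lock:
            return self._data.get(key, default)


# Example: a synthesizer-side component reacts to data posted at login.
board = Blackboard()
prompts: List[str] = []
board.subscribe("current_user", lambda key, value: prompts.append(f"Welcome, {value}."))
board.post("current_user", "alice")
```

In this sketch a component that posts data does not need to know which other components consume it, which is the usual motivation for a blackboard.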
We also created a Unified Grammar compiler and designed a corresponding
specification language as part of the SpeechActs Framework. The language is
used for specifying both speech recognizer and natural language grammars in a
recognizer-independent way. The language formalism is an augmented pattern-
matcher, using context-free, extended Backus-Naur form (BNF). The
augmentations include access to features specified in a lexicon and provide for
Pascal-like constraint specifications and result structure composition. The
Unified Grammar compiler guarantees that the speech recognition and natural
language grammars are synchronized. It compiles all constraints for the natural
language processor and most of the constraints for the speech recognizer.
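The benefit of a single grammar source can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not the Unified Grammar language or its compiler: the `Rule` class, the `<name>` and `$name` notations, and both functions are hypothetical, and lexicon features and constraints are left out. It shows one rule set producing both a BNF-style recognizer grammar and a natural language parse with a result structure, so the two cannot drift apart.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Rule:
    name: str               # nonterminal this rule defines
    pattern: List[str]      # words, or "<nonterminal>" references
    result: Dict[str, str]  # slot -> literal, or "$nonterminal" for a sub-result


def compile_recognizer(rules: List[Rule]) -> str:
    """Emit BNF-style text, one line per nonterminal, for a speech recognizer."""
    alternatives: Dict[str, List[str]] = {}
    for rule in rules:
        alternatives.setdefault(rule.name, []).append(" ".join(rule.pattern))
    return "\n".join(
        f"<{name}> ::= {' | '.join(alts)}" for name, alts in alternatives.items()
    )


def parse(rules: List[Rule], symbol: str, tokens: List[str],
          pos: int = 0) -> Optional[Tuple[dict, int]]:
    """Match `symbol` at `pos`; return (result structure, next position) or None."""
    for rule in rules:
        if rule.name != symbol:
            continue
        bindings: Dict[str, dict] = {}
        cur = pos
        matched = True
        for item in rule.pattern:
            if item.startswith("<") and item.endswith(">"):
                sub = parse(rules, item[1:-1], tokens, cur)
                if sub is None:
                    matched = False
                    break
                bindings[item[1:-1]], cur = sub
            elif cur < len(tokens) and tokens[cur] == item:
                cur += 1
            else:
                matched = False
                break
        if matched:
            # Compose the result structure from literals and sub-results.
            result = {
                slot: bindings[value[1:]] if value.startswith("$") else value
                for slot, value in rule.result.items()
            }
            return result, cur
    return None


RULES = [
    Rule("query", ["weather", "in", "<city>"], {"action": "forecast", "city": "$city"}),
    Rule("city", ["boston"], {"name": "Boston"}),
    Rule("city", ["seattle"], {"name": "Seattle"}),
]

recognizer_grammar = compile_recognizer(RULES)
parsed = parse(RULES, "query", ["weather", "in", "seattle"])
```

Because both outputs are derived from `RULES`, any utterance the generated recognizer grammar accepts is also one the parser can interpret.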
To test out the Framework and Unified Grammar compiler, we created five
speech applications. A small login application, written in Lisp, allows users to telephone a Sun workstation, identify themselves, enter a password, and choose
an application. To simplify the process of logging in, we take advantage of the
telephone's caller-ID feature to try to pre-identify the user. Once logged in, the user may opt for the mail, calendar, weather, or stock quotes application. Mail and calendar are implemented as C wrappers to the Mail Tool and Calendar
Manager APIs so that users are able to access their current mail spool and
calendar data files. Once they are interacting with mail or calendar, users can
mark any information that has been read aloud and ask to have that marked
information faxed to them at a pre-determined location (home, work, etc.) or at
a number entered using the telephone keypad. Weather and stock quotes both
provide speech access to on-line data feeds. Users can check the National
Weather Service forecasts around the United States or can check the current
price of technology-related stocks.
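The login flow described above can be sketched as follows. This is a hedged illustration in Python rather than the Lisp login application itself; the user table, phone number, password, and function names are invented for the example.

```python
from typing import Optional, Tuple

# Hypothetical account data; real accounts would come from the workstation.
USERS = {"alice": {"password": "1234", "phone": "5550100"}}
APPLICATIONS = ("mail", "calendar", "weather", "stock quotes")


def pre_identify(caller_id: Optional[str]) -> Optional[str]:
    """Try to guess the user from caller ID; return None if unknown."""
    if caller_id is None:
        return None
    for name, record in USERS.items():
        if record["phone"] == caller_id:
            return name
    return None


def login(caller_id: Optional[str], spoken_name: Optional[str],
          password: str, choice: str) -> Optional[Tuple[str, str]]:
    """Return (user, application) on success, or None if any step fails."""
    # Caller ID only pre-identifies; the password is still required.
    user = pre_identify(caller_id) or spoken_name
    if user not in USERS or USERS[user]["password"] != password:
        return None
    if choice not in APPLICATIONS:
        return None
    return user, choice
```

In this sketch a recognized caller ID lets the user skip saying a name, and an unrecognized number falls back to the spoken name.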
Designing and experimenting with these applications has led us to identify a
number of important Speech User Interface (SUI) principles. The most
important of these involves basing dialog design on people's natural
conversational patterns rather than attempting to translate graphical user
interfaces directly into speech user interfaces. The challenge is to design a dialog that lets a user carry on a natural conversation without exceeding the restrictions of the system's lexicon or grammar.
References
Publications
"SpeechActs: A Framework for Building Speech Applications," N. Yankelovich,
E. Baatz, AVIOS '94 Conference Proceedings, San Jose, CA, September 20-23,
1994, SMLI 94-0243.
"SpeechActs: A Testbed for Continuous Speech Applications," P. Martin, A.
Kehler, submitted to AAAI '94 Workshop on the Integration of Natural
Language and Speech Processing, Seattle, WA, August 1-2, 1994, SMLI 94-0032.
"Talking vs. Taking: Speech Access to Remote Computers," N. Yankelovich,
CHI '94 Adjunct Proceedings, 1994 ACM Conference on Human Factors in
Computing Systems, Boston, MA, April 24-28, 1994, SMLI 94-0013.
"SpeechActs and the Design of Speech Interfaces," N. Yankelovich, CHI '94
Workshop on The Future of Speech and Audio in the Interface, 1994 ACM
Conference on Human Factors in Computing Systems, Boston, MA, April 24-28,
1994, SMLI 94-0046.