Abstract
Offer Organization: Japan Society for the Promotion of Science, System Name: Grants-in-Aid for Scientific Research, Category: Grant-in-Aid for International Scientific Research, Fund Type: -, Overall Grant Amount: - (direct: 4200000, indirect: -)
We are interested in the use of spoken language in human-computer interaction. The inspiration is the fact that, for human-human interaction, meaningful exchanges can take place even without accurate recognition of the words the other is saying --- this being possible due to shared knowledge and complementary communication channels, especially gesture and prosody. We want to exploit this fact for man-machine interfaces.
Therefore we are doing three things:
1. Using simple speech recognition to augment graphical user interfaces, well integrated with other input modalities: keyboard, mouse, and touch screen.
2. Building systems able to engage in simple conversations, using mostly prosodic cues. To sketch out our latest success:
We conjectured that, for Japanese, it would be possible to decide when to produce many of the back-channel utterances based on prosodic cues alone, without reference to meaning.
We found that none of vowel lengthening, volume changes, or energy level (to detect when the other finished speaking) was by itself a good predictor of when to produce an aizuchi. The best predictor was a low pitch level.
Specifically, upon detection of the end of a region of pitch less than 0.9 times the local median pitch and continuing for 150ms, coming after at least 600ms of speech, the system predicted an aizuchi 200ms to 300ms later, provided it had not done so within the preceding 1 second.
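The decision rule above can be sketched in code. The sketch below assumes a frame-based pitch track (one pitch value per 10ms frame, 0 during silence) and a 2-second window for the local median; the report specifies only the thresholds themselves (0.9 × median, 150ms low region, 600ms of prior speech, 200-300ms delay, 1-second refractory period), so the frame step, median window, and the choice of the 250ms midpoint for the delay are illustrative assumptions.

```python
from statistics import median

FRAME_MS = 10  # assumed analysis frame step; not specified in the report

def predict_aizuchi(pitch, median_window_ms=2000):
    """Return times (ms) at which the rule predicts an aizuchi.

    `pitch` is a per-frame pitch track in Hz, with 0 marking silence.
    The 2 s local-median window is an assumption for illustration.
    """
    low_needed = 150 // FRAME_MS       # 150 ms of low pitch
    speech_needed = 600 // FRAME_MS    # at least 600 ms of speech before it
    refractory_ms = 1000               # no aizuchi within the preceding 1 s
    win = median_window_ms // FRAME_MS

    predictions = []
    last_fire = -refractory_ms
    low_run = 0      # consecutive low-pitch frames
    speech_run = 0   # consecutive speech frames (includes the low region)

    for i, p in enumerate(pitch):
        t = i * FRAME_MS
        if p > 0:
            speech_run += 1
            voiced = [q for q in pitch[max(0, i - win):i + 1] if q > 0]
            if p < 0.9 * median(voiced):
                low_run += 1
            else:
                # End of a low-pitch region: fire if all conditions hold.
                if (low_run >= low_needed
                        and speech_run - low_run >= speech_needed
                        and t - last_fire >= refractory_ms):
                    predictions.append(t + 250)  # 200-300 ms later; midpoint
                    last_fire = t
                low_run = 0
        else:
            speech_run = 0
            low_run = 0
    return predictions
```

For example, 700ms of speech at 100Hz followed by a 200ms dip to 80Hz (below 0.9 times the local median) and a return to 100Hz yields a single prediction shortly after the dip ends.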
We also built a real-time system based on the above decision rule. A human stooge steered the conversation to a suitable topic and then switched on the system. After switch-on, the stooge's utterances and the system's outputs, mixed together, produced one side of the conversation. We found that none of the 5 subjects had realized that their conversation partner had become partially automated.
3. Building tools and collecting data to help do 1 and 2.