
EUROSPEECH 2003 - GENEVA
Using Word Confidence Measure for OOV Words Detection in a Spontaneous Spoken Dialog System
Hui SUN, Guoliang ZHANG, Fang ZHENG*, and Mingxing XU
Center of Speech Technology, State Key Laboratory of Intelligent Technology and System, Tsinghua University, Beijing, 100084, China
[sunh, liang, fzheng, xumx]@sp.cs.tsinghua.edu.cn
* Beijing d-Ear Technologies Co., Ltd., fzheng@d-Ear.com, http://www.d-Ear.com
Abstract
Developing a real-life spoken dialogue system means dealing with many practical issues, and the out-of-vocabulary (OOV) word problem is one of the key difficulties. This paper presents an OOV detection mechanism based on word confidence scoring, developed for the d-Ear Attendant system, a spontaneous spoken dialogue system. The d-Ear Attendant system originally used an explicit filler model to detect the presence of OOV words [1]. Although this approach achieves a satisfactory OOV detection rate, it badly degrades the in-vocabulary (IV) detection accuracy, by 4.4% absolute (from 97% to 92.6%). Such a degradation is not acceptable in a practical system. Using a few commonly used acoustic confidence features together with some new context confidence features, our confidence measure method not only detects word-level speech recognition errors but also detects OOV words well, with an acceptable false alarm rate. For example, at a false rejection rate of 2.5%, a false acceptance rate of 26% is achieved.
A popular method for OOV word detection is to incorporate some form of filler or garbage model to absorb OOV words [2, 3]. In our previous work, we used a type of online filler model to detect OOV words, achieving an OOV word detection rate of 76.5%. However, this filler model greatly degrades the accuracy on in-vocabulary (IV) data. As mentioned above, accuracy is so important in this task that such a degradation caused by the OOV filler model is not acceptable. Confidence measures are useful in most speech recognition applications, mainly for rejecting recognition errors. Since most OOV words in an utterance show up as recognition errors, a confidence measure can be used to detect them. In the d-Ear Attendant system, two levels of confidence features, acoustic features and context features, are computed and combined to decide whether a word should be rejected.
In particular, the context features proposed in this paper are shown to be important for confidence scoring. On the in-grammar test set, we achieve an OOV word rejection rate of 76.5% at a false rejection rate of 2.8%, while the in-vocabulary detection accuracy drops only from 97% to 94.3%. In the remainder of this paper, we first briefly introduce the d-Ear Attendant system and its filler model for OOV word detection. Then, in section 3, we present the features used for the confidence measure, including the acoustic and context features, as well as the method for combining all the confidence features. A series of experiments and results is given in section 4. Finally, we conclude the paper and outline several directions for future work.
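The accept/reject decision described above can be illustrated with a minimal Python sketch. It is not the paper's method: the actual acoustic and context features and the combination scheme are presented in section 3 and are not reproduced here. As assumptions, each word hypothesis carries made-up acoustic and context feature values, the features are combined by a logistic function with hand-picked weights, a word is rejected when its combined score falls below a threshold, and two rates are measured on toy labeled data: the false rejection rate (FRR, correctly recognized IV words that are rejected) and the false acceptance rate (FAR, misrecognized or OOV words that are accepted). The names `WordHypothesis`, `combined_confidence`, `error_rates`, and `threshold_at_frr` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WordHypothesis:
    """One recognized word with hypothetical confidence features."""
    acoustic: List[float]  # assumed acoustic confidence features
    context: List[float]   # assumed context confidence features
    is_correct: bool       # ground truth: correctly recognized IV word


def combined_confidence(word: WordHypothesis, weights: Sequence[float], bias: float) -> float:
    """Assumed combiner: logistic function of a weighted feature sum."""
    features = word.acoustic + word.context
    if len(features) != len(weights):
        raise ValueError("expected one weight per feature")
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def error_rates(scores: Sequence[float], labels: Sequence[bool],
                threshold: float) -> Tuple[float, float]:
    """Reject words scoring below `threshold`; return (FRR, FAR).

    FRR = rejected correct words / all correct words
    FAR = accepted incorrect (misrecognized or OOV) words / all incorrect words
    """
    good = [s for s, ok in zip(scores, labels) if ok]
    bad = [s for s, ok in zip(scores, labels) if not ok]
    frr = sum(s < threshold for s in good) / len(good)
    far = sum(s >= threshold for s in bad) / len(bad)
    return frr, far


def threshold_at_frr(scores: Sequence[float], labels: Sequence[bool],
                     max_frr: float) -> float:
    """Highest observed score usable as a threshold with FRR <= max_frr.

    FAR can only fall as the threshold rises, so this choice minimizes FAR
    among the candidate thresholds that satisfy the FRR constraint.
    """
    for t in sorted(set(scores), reverse=True):
        if error_rates(scores, labels, t)[0] <= max_frr:
            return t
    raise ValueError("no threshold satisfies max_frr")


# Toy data: feature values, weights and bias are made up for illustration.
words = [
    WordHypothesis([0.9], [0.8], True),
    WordHypothesis([0.7], [0.9], True),
    WordHypothesis([0.8], [0.6], True),
    WordHypothesis([0.2], [0.3], False),  # e.g. an OOV word forced onto an IV word
    WordHypothesis([0.9], [0.6], False),  # a confident-looking recognition error
]
weights, bias = [4.0, 4.0], -4.0
scores = [combined_confidence(w, weights, bias) for w in words]
labels = [w.is_correct for w in words]

t_zero = threshold_at_frr(scores, labels, max_frr=0.0)
print(error_rates(scores, labels, t_zero))   # (0.0, 0.5)
t_third = threshold_at_frr(scores, labels, max_frr=0.34)
print(error_rates(scores, labels, t_third))  # (0.3333333333333333, 0.0)
```

Raising the threshold trades more false rejections for fewer false acceptances. Under these assumptions, an operating point such as "a false rejection rate of 2.5%" would be chosen by fixing the tolerated FRR and taking the highest threshold that respects it, as `threshold_at_frr` does.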
