Abstract
In current speech processing techniques, MFCC obtained from the amplitude spectrum and delta-MFCC calculated as the time derivative of MFCC are widely used as acoustic features. However, these features capture neither the frequency derivative of the amplitude spectrum nor the phase information of the speech waveform. The local feature and the group delay spectrum are among the features claimed in previous work to carry such information useful for speech processing. We therefore examine their effect on speech recognition performance. We conducted phoneme recognition experiments using speaker-dependent phoneme HMMs trained with the local feature, the group delay spectrum, and MFCC under same-speaker, same-gender, and different-gender conditions. The local feature gave the highest recognition rate, while the other features performed better for some phonemes. Combining the likelihoods of the local feature, group delay spectrum, and MFCC HMMs yielded a higher phoneme recognition rate than any of the HMMs used alone. The results suggest that combining the local feature, the group delay spectrum, and MFCC is a promising way to alleviate degradation in recognition performance.