An RBFN-based system for speaker-independent speech recognition
Several feature sets are extracted: mel-scale filter bank (MSFB) features, smoothed FFT, reflection coefficients (also called PARCORs), and cepstral features. The MSFB features outperform the other features considered in our study.
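As an illustration of the MSFB idea, the following is a minimal sketch of a triangular mel-scale filter bank applied to one frame's power spectrum. The number of filters, the FFT size, the triangular filter shape, and the function names are assumptions chosen for the example; the abstract does not state the configuration actually used. The 16 kHz sampling rate matches TIMIT.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Return n_filters triangular filters over n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    # Fractional FFT-bin position of each edge.
    edges_bin = [hz * n_fft / sample_rate for hz in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_bin[j - 1], edges_bin[j], edges_bin[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_filter_bank_energies(power_spectrum, bank, floor=1e-10):
    """Log energy in each mel band for one frame's power spectrum."""
    return [math.log(max(floor, sum(w * p for w, p in zip(row, power_spectrum))))
            for row in bank]

# Hypothetical configuration: 20 filters, 512-point FFT, 16 kHz audio.
bank = mel_filter_bank(20, 512, 16000)
features = log_filter_bank_energies([1.0] * 257, bank)
```

Each frame then yields one feature vector with one log energy per mel band, which is the kind of input a phoneme classifier would consume.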
Multilayer perceptrons (MLPs) and radial basis function networks (RBFNs) are considered for phoneme recognition. RBFNs are easier to train than MLPs, so RBFNs were selected to perform phoneme classification.
Four RBFNs are compared: RBFN type-I is a single-layer RBFN, RBFN type-II is a two-layer net where the second layer consists of a vector of weights, RBFN type-III is a two-layer net where the second layer is a linear layer, and RBFN type-IV is a two-layer net where the second layer is an RBFN. RBFN type-II outperforms the others at the phone level, where the phone recognition rate is about 44%.
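To make the two-layer structure concrete, here is a minimal sketch of a forward pass through an RBF layer followed by a linear layer, roughly in the spirit of the type-III description. The Gaussian basis function, the per-unit widths, the argmax decision rule, and all names are assumptions for illustration; the abstract does not specify these details, and this is not presented as the dissertation's exact network.

```python
import math

def rbf_layer(x, centers, widths):
    """Gaussian RBF activations: one value per center (assumed basis shape)."""
    out = []
    for c, s in zip(centers, widths):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))  # squared distance
        out.append(math.exp(-d2 / (2.0 * s * s)))
    return out

def linear_layer(h, weights, biases):
    """Second layer: one weighted sum of RBF activations per output class."""
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(weights, biases)]

def classify(x, centers, widths, weights, biases):
    """Return the index of the highest-scoring class."""
    scores = linear_layer(rbf_layer(x, centers, widths), weights, biases)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example with two hypothetical classes in a 2-D feature space.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
label = classify([0.1, 0.0], centers, widths, weights, biases)
```

In a phoneme classifier of this shape, the input would be a feature vector per frame or segment and the outputs would correspond to phone classes.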
Using clustering techniques, a suboptimal, iterative, and interactive algorithm is developed to train the radial basis functions (RBFs). An algorithm is also developed to reduce segmentation errors in TIMIT. The TIMIT 60-phone set is reduced to a 33-phone set by merging similar phones.
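The abstract does not say which clustering technique is used. As one common way to place RBF centers, the sketch below uses plain k-means; it is a stand-in for illustration, not the iterative and interactive algorithm developed in the dissertation. The deterministic initialization from the first k points and the function names are assumptions.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, n_iter=20):
    """Plain k-means; returns (centers, assignments from the last pass)."""
    # Assumption: initialize centers from the first k points for repeatability.
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        assign = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # leave an empty cluster's center where it is
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign

# Toy data: two well-separated groups of hypothetical 2-D feature vectors.
points = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
centers, assign = kmeans(points, 2)
```

The resulting centers could serve as RBF centers, with each unit's width then set from the spread of its cluster.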
For 168 test speakers, an 84% recognition rate is achieved on a vocabulary of 11 words from the TIMIT sentence SA1 ("she had your dark suit in greasy wash water all year"). For applications such as voice-driven menu systems, the vocabulary words can be selected to be separable and distinct. A 95% recognition rate is achieved when the confusable words in the 11-word vocabulary are excluded, leaving an 8-word vocabulary.
Real-time implementation of the proposed system can be achieved using a digital signal processor that can perform a multiplication within 100 ns.
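A rough sense of what a 100 ns multiplication allows can be had from a back-of-the-envelope count. The sketch below assumes a 10 ms frame shift and hypothetical network sizes (20 inputs, 100 RBF units); only the 100 ns figure and the 33-phone output set come from the abstract. It counts multiplications only and ignores the exponentials, additions, and feature extraction.

```python
def multiply_budget_per_frame(frame_shift_ms, multiply_ns=100):
    """Multiplications that fit in one frame period at multiply_ns each."""
    return (frame_shift_ms * 1_000_000) // multiply_ns

def rbf_net_multiplies(input_dim, n_centers, n_outputs):
    """Multiplications for one forward pass of an RBF layer plus a linear layer.

    Counts one multiply per squared difference in each distance computation
    and one per second-layer weight.
    """
    return n_centers * input_dim + n_centers * n_outputs

budget = multiply_budget_per_frame(10)   # assumed 10 ms frame shift
cost = rbf_net_multiplies(20, 100, 33)   # hypothetical layer sizes
print(budget, cost)
```

Under these assumptions the budget is 100,000 multiplications per frame against roughly 5,300 for the network, so the classifier itself would use only a small share of the available time.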
- Doctoral Dissertations