An RBFN-based system for speaker-independent speech recognition

DC Field | Value | Language
dc.contributor.author | Huliehel, Fakhralden A. | en
dc.contributor.committeechair | VanLandingham, Hugh F. | en
dc.contributor.committeemember | Abbott, A. Lynn | en
dc.contributor.committeemember | Bay, John S. | en
dc.contributor.committeemember | Beex, A. A. Louis | en
dc.contributor.committeemember | Palettas, Panickos N. | en
dc.contributor.department | Electrical Engineering | en
dc.date.accessioned | 2014-03-14T21:13:15Z | en
dc.date.adate | 2008-06-06 | en
dc.date.available | 2014-03-14T21:13:15Z | en
dc.date.issued | 1995-07-17 | en
dc.date.rdate | 2008-06-06 | en
dc.date.sdate | 2008-06-06 | en
dc.description.abstract | A speaker-independent, isolated-word, small-vocabulary speech recognition system is developed for applications such as voice-driven menu systems. The design of a cascade of recognition layers is presented. Several feature sets are compared. Phone recognition is performed using a radial basis function network (RBFN). Dynamic time warping (DTW) is used for word recognition. The TIMIT database is used to design and test the automatic speech recognition (ASR) system. Feature sets based on a mel-scale filter bank (MSFB), a smoothed FFT, reflection coefficients (also called PARCORs), and cepstral coefficients are extracted. The MSFB features outperform the other features considered in our study. Multilayer perceptrons (MLPs) and RBFNs are considered for phoneme recognition. Because RBFNs are easier to train than MLPs, RBFNs were selected to perform phoneme classification. Four RBFNs are compared: RBFN type-I is a single-layer RBFN; RBFN type-II is a two-layer net whose second layer consists of a vector of weights; RBFN type-III is a two-layer net whose second layer is a linear layer; and RBFN type-IV is a two-layer net whose second layer is an RBFN. RBFN type-II outperforms the others at the phone level, with a phone recognition rate of about 44%. Using clustering techniques, a suboptimal, iterative, and interactive algorithm is developed to train the radial basis functions (RBFs). An algorithm is developed to reduce segmentation errors in TIMIT. The 60-phone TIMIT set is reduced to a 33-phone set by merging similar phones. For 168 test speakers, an 84% recognition rate is achieved on a vocabulary of 11 words taken from the TIMIT sentence SA1 ("she had your dark suit in greasy wash water all year"). For applications such as voice-driven menu systems, the vocabulary words can be chosen to be distinct and easily separable. A 95% recognition rate is achieved when the confusable words are excluded from the 11-word vocabulary, leaving an 8-word vocabulary. Real-time implementation of the proposed system can be achieved using a digital signal processor that can perform a multiplication within 100 ns. | en
dc.description.degree | Ph. D. | en
dc.format.extent | x, 169 leaves | en
dc.format.medium | BTD | en
dc.format.mimetype | application/pdf | en
dc.identifier.other | etd-06062008-162619 | en
dc.identifier.sourceurl | http://scholar.lib.vt.edu/theses/available/etd-06062008-162619/ | en
dc.identifier.uri | http://hdl.handle.net/10919/38196 | en
dc.language.iso | en | en
dc.publisher | Virginia Tech | en
dc.relation.haspart | LD5655.V856_1995.H855.pdf | en
dc.relation.isformatof | OCLC# 33878112 | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | voice-driven menu systems | en
dc.subject.lcc | LD5655.V856 1995.H855 | en
dc.title | An RBFN-based system for speaker-independent speech recognition | en
dc.type | Dissertation | en
dc.type.dcmitype | Text | en
thesis.degree.discipline | Electrical Engineering | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | doctoral | en
thesis.degree.name | Ph. D. | en
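The abstract states that dynamic time warping (DTW) is used for word recognition. The sketch below shows the standard DTW recurrence used as a nearest-template word classifier. It assumes each utterance is a sequence of per-frame vectors (for example, per-frame scores from the phone recognizer), a Euclidean local cost, and the symmetric (i-1, j), (i, j-1), (i-1, j-1) step pattern; `dtw_distance`, `recognize_word`, and the template dictionary are hypothetical names, not code from the dissertation.

```python
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between two sequences of shape (frames, dims).

    Assumed choices: Euclidean local cost and the symmetric step pattern
    (i-1, j), (i, j-1), (i-1, j-1) with no slope constraint.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            # Best cumulative cost of any warping path ending at (i, j).
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def recognize_word(utterance: np.ndarray, templates: dict) -> str:
    """Nearest-template classifier; hypothetical templates map word -> reference sequence."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Usage with toy one-dimensional "feature" sequences (illustrative data only).
templates = {
    "up": np.array([[0.0], [1.0], [2.0]]),
    "down": np.array([[2.0], [1.0], [0.0]]),
}
spoken = np.array([[0.0], [0.0], [1.0], [2.0], [2.0]])  # a time-stretched "up"
print(recognize_word(spoken, templates))  # prints "up"
```

Because DTW lets one frame align with several frames of the template, a word spoken more slowly than its template still reaches zero distance here, which is the property that makes DTW suitable for isolated-word matching.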
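The abstract reports that mel-scale filter bank (MSFB) features outperformed the other feature sets. A minimal sketch of a log MSFB front end follows. It assumes the common 2595·log10(1 + f/700) mel mapping, triangular filters, and illustrative parameters (20 filters, a 512-point FFT, a 400-sample Hamming-windowed frame, and 16 kHz sampling as in TIMIT); the dissertation's actual filter count, frame length, and smoothing are not given in this record, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel formula (assumed; the dissertation's exact mapping is not stated here).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_filters + 2 edge points: each filter spans three consecutive points.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for k in range(n_filters):
        left, center, right = edges[k], edges[k + 1], edges[k + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[k] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def msfb_features(frame: np.ndarray, fb: np.ndarray, n_fft: int) -> np.ndarray:
    """Log filter bank energies of one windowed frame (small floor avoids log(0))."""
    power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
    return np.log(fb @ power + 1e-10)


# Illustrative parameters: 16 kHz sampling (as in TIMIT), 25 ms frame, 20 filters.
sample_rate, n_fft = 16000, 512
fb = mel_filterbank(20, n_fft, sample_rate)
t = np.arange(400)
frame = np.hamming(400) * np.sin(2 * np.pi * 1000.0 * t / sample_rate)
print(msfb_features(frame, fb, n_fft).shape)  # (20,)
```

The filters are narrow at low frequencies and wide at high frequencies, so the resulting vector keeps more spectral detail where speech phones differ most.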
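The abstract describes RBFN type-III as a two-layer net whose second layer is linear. The sketch below illustrates that structure under stated assumptions: Gaussian basis functions with given centers and widths (the dissertation trains the RBFs with its own clustering-based algorithm, which is not reproduced here) and a linear output layer fitted by least squares, a common choice assumed here rather than taken from the dissertation. `rbf_layer`, `fit_linear_layer`, and `predict` are hypothetical names; note that the abstract reports type-II, not type-III, as the best performer.

```python
import numpy as np


def rbf_layer(x: np.ndarray, centers: np.ndarray, widths: np.ndarray) -> np.ndarray:
    """Gaussian activations phi_j(x) = exp(-||x - c_j||^2 / (2 * sigma_j^2)).

    x: (samples, dims); centers: (units, dims); widths: (units,). Returns (samples, units).
    """
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * widths[None, :] ** 2))


def _with_bias(phi: np.ndarray) -> np.ndarray:
    # Append a constant column so the linear layer has a bias term.
    return np.hstack([phi, np.ones((phi.shape[0], 1))])


def fit_linear_layer(phi: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Least-squares weights (last row is the bias) for the linear second layer."""
    weights, *_ = np.linalg.lstsq(_with_bias(phi), targets, rcond=None)
    return weights


def predict(x, centers, widths, weights) -> np.ndarray:
    """Class index with the largest linear-layer output for each sample."""
    scores = _with_bias(rbf_layer(x, centers, widths)) @ weights
    return np.argmax(scores, axis=1)


# Toy two-class example (illustrative data; real inputs would be acoustic feature
# vectors and the outputs one score per phone class).
X = np.array([[0.0, 0.0], [0.2, -0.1], [5.0, 5.0], [4.8, 5.1]])
Y = np.eye(2)[[0, 0, 1, 1]]  # one-hot targets
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
widths = np.array([1.0, 1.0])
W = fit_linear_layer(rbf_layer(X, centers, widths), Y)
print(predict(X, centers, widths, W))  # [0 0 1 1]
```

Once the centers and widths are fixed, the second layer is linear in its weights, so it can be solved in closed form; this is one reason RBFNs are generally easier to train than MLPs, which need iterative gradient descent through every layer.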

Files

Original bundle
Name: LD5655.V856_1995.H855.pdf
Size: 7.13 MB
Format: Adobe Portable Document Format