An RBFN-based system for speaker-independent speech recognition

Huliehel, Fakhralden A.

dc.contributor.author	Huliehel, Fakhralden A.	en
dc.contributor.committeechair	VanLandingham, Hugh F.	en
dc.contributor.committeemember	Abbott, A. Lynn	en
dc.contributor.committeemember	Bay, John S.	en
dc.contributor.committeemember	Beex, A. A. Louis	en
dc.contributor.committeemember	Palettas, Panickos N.	en
dc.contributor.department	Electrical Engineering	en
dc.date.accessioned	2014-03-14T21:13:15Z	en
dc.date.adate	2008-06-06	en
dc.date.available	2014-03-14T21:13:15Z	en
dc.date.issued	1995-07-17	en
dc.date.rdate	2008-06-06	en
dc.date.sdate	2008-06-06	en
dc.description.abstract	A speaker-independent, isolated-word, small-vocabulary system is developed for applications such as voice-driven menu systems. The design of a cascade of recognition layers is presented. Several feature sets are compared. Phone recognition is performed using a radial basis function network (RBFN). Dynamic time warping (DTW) is used for word recognition. The TIMIT database is used to design and test the automatic speech recognition (ASR) system. Several feature sets using mel-scale filter banks (MSFBs), smoothed FFT, reflection coefficients (also called PARCORs), and cepstral features are extracted. The MSFBs outperform the other features considered in our study. Multilayer perceptrons (MLPs) and radial basis function networks (RBFNs) are considered for phoneme recognition. RBFNs are easier to train than MLPs, so RBFNs were selected to perform phoneme classification. Four RBFNs are compared: RBFN type-I is a single-layer RBFN, RBFN type-II is a two-layer net where the second layer consists of a vector of weights, RBFN type-III is a two-layer net where the second layer is a linear layer, and RBFN type-IV is a two-layer net where the second layer is an RBFN. RBFN type-II outperforms the others at the phone level, where the phone recognition rate is about 44%. Using clustering techniques, a suboptimal, iterative, and interactive algorithm is developed to train the radial basis functions (RBFs). An algorithm is developed to reduce segmentation errors in TIMIT. The TIMIT 60-phone set is reduced to a 33-phone set by merging similar phones. For 168 test speakers, an 84% recognition rate is achieved on a vocabulary of 11 words from the sentence SA1 ("she had your dark suit in greasy wash water all year") in TIMIT. For applications such as voice-driven menu systems, the vocabulary words can be selected to be separable and distinct.
A 95% recognition rate is achieved when the confusing words in the 11-word vocabulary are excluded to get an 8-word vocabulary. Real-time implementation of the proposed system can be achieved using a digital signal processor that can perform a multiplication within 100 ns.	en
dc.description.degree	Ph. D.	en
dc.format.extent	x, 169 leaves	en
dc.format.medium	BTD	en
dc.format.mimetype	application/pdf	en
dc.identifier.other	etd-06062008-162619	en
dc.identifier.sourceurl	http://scholar.lib.vt.edu/theses/available/etd-06062008-162619/	en
dc.identifier.uri	http://hdl.handle.net/10919/38196	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.relation.haspart	LD5655.V856_1995.H855.pdf	en
dc.relation.isformatof	OCLC# 33878112	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	voice-driven menu systems	en
dc.subject.lcc	LD5655.V856 1995.H855	en
dc.title	An RBFN-based system for speaker-independent speech recognition	en
dc.type	Dissertation	en
dc.type.dcmitype	Text	en
thesis.degree.discipline	Electrical Engineering	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Ph. D.	en

Files

Original bundle

Name: LD5655.V856_1995.H855.pdf
Size: 7.13 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations