The use of the auditory lexical decision task as a method for assessing the relative quality of synthetic speech

Date
1992-05-05
Publisher
Virginia Tech
Abstract

This study evaluates a method for determining the quality of synthetic speech systems. The method uses an auditory lexical decision task to assess the quality of synthetic speech generators relative to each other and to natural speech, based on differences in reaction time and error rate. Seven voices were evaluated: four synthesizers provided six voices (DECtalk 1.8 Perfect Paul, DECtalk 1.8 Beautiful Betty, DECtalk 2.0 Perfect Paul, DECtalk 2.0 Beautiful Betty, Votrax Personal Speech, Votrax Type'n'Talk), and natural speech provided the seventh. Both reaction times and error rates were higher for the low-quality synthetic speech systems. The results indicate that the DECtalk can currently be considered a high-quality synthesizer and that the Personal Speech and the Type'n'Talk can be considered low-quality synthesizers. The results obtained with this method can be explained by the Activation-Verification model (Paap, McDonald, Schvaneveldt, and Noel, 1986). Within the framework of this model, the results suggest that the verification phase is the bottleneck in processing words produced by synthetic speech generators. This interpretation suggests that emphasizing the differences between phonemes to make them more uniquely identifiable, rather than concentrating on making them more "natural", might lead to improved results with synthesized speech.
