The Natural Statistics of Audiovisual Speech

Chandrasekaran, Chandramouli; Trubanova, Andrea; Stillitano, Sébastien; Caplier, Alice; Ghazanfar, Asif A.

dc.contributor.author	Chandrasekaran, Chandramouli	en
dc.contributor.author	Trubanova, Andrea	en
dc.contributor.author	Stillitano, Sébastien	en
dc.contributor.author	Caplier, Alice	en
dc.contributor.author	Ghazanfar, Asif A.	en
dc.contributor.department	Psychology	en
dc.date.accessioned	2019-05-15T18:17:20Z	en
dc.date.available	2019-05-15T18:17:20Z	en
dc.date.issued	2009-07-17	en
dc.description.abstract	Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both the area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.	en
dc.description.sponsorship	This work was supported by the National Institutes of Health (NINDS) R01NS054898 (AAG), the National Science Foundation BCS-0547760 CAREER Award (AAG), and Princeton Neuroscience Institute Quantitative and Computational Neuroscience training grant NIH R90 DA023419-02 (CC). The Wisconsin x-ray facility is supported in part by NIH NIDCD R01 DC00820 (John Westbury and Carl Johnson).	en
dc.format.mimetype	application/pdf	en
dc.identifier.doi	https://doi.org/10.1371/journal.pcbi.1000436	en
dc.identifier.issue	7	en
dc.identifier.uri	http://hdl.handle.net/10919/89536	en
dc.identifier.volume	5	en
dc.language.iso	en_US	en
dc.publisher	PLOS	en
dc.rights	Creative Commons Attribution 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	en
dc.title	The Natural Statistics of Audiovisual Speech	en
dc.title.serial	PLoS Computational Biology	en
dc.type	Article - Refereed	en
dc.type.dcmitype	Text	en

Files

Original bundle

Name: journal.pcbi.1000436.PDF
Size: 1.91 MB
Format: Adobe Portable Document Format

Collections

Scholarly Works, Psychology