AI-Driven Affective Captioning for Equitable STEM Access Among Deaf and Hard-of-Hearing Students
Abstract
This dissertation investigates how Artificial Intelligence (AI)- and Augmented Reality (AR)-supported captioning can improve communication access for Deaf and Hard-of-Hearing (DHH) learners in STEM contexts. Traditional real-time captions provide essential access to spoken language, but they often omit nonverbal and contextual information such as speaker identity, tone, emphasis, affect, and conversational intent. Across a preliminary design study and three empirical studies, this work examines how caption augmentations can preserve these missing layers of meaning while maintaining readability, timeliness, trust, and user control. The preliminary study compared traditional captions with emotion-augmented caption designs and showed that affective and visual cues can support comprehension when they are lightweight and text-centered, but may increase workload when they compete with the main transcript or visual scene. Study 1, a qualitative study with DHH participants, found that users valued emotion-aware captions when they clarified tone, emphasis, or speaker intent, but only when cues were timely, legible, optional, and subordinate to the transcript. Study 2 evaluated culturally adaptive emotive captioning in AR by comparing two cue formats: compact symbolic cues, implemented as emoji/icon indicators, and explicit textual affect labels, implemented as inline text-tags, across high- and low-context cultural cohorts. Compact symbolic cues produced a robust cross-cultural preference, while qualitative findings showed that participants valued the cues differently: some emphasized speed and reduced distraction, while others emphasized easier access to speaker emotion. Study 3 evaluated Speaker-Aware Affective Captioning, a multi-speaker captioning interface that combined speaker-attributed captions, confidence-gated affect tags, and an on-demand AI Describe feature.
The study showed that speaker attribution was the most consistently valued support, while AI Describe helped users recover from missed or unclear information. Affect tags showed promise, but their usefulness depended on timing, persistence, interpretability, and trust. Across these studies, findings show that accessible captioning should not simply add more expressive information. Instead, next-generation captioning systems should reduce users' inferential burden through layered support: preserving the transcript first, identifying speakers, supporting recovery from missed information, and adding affective interpretation only when it is accurate, low-burden, and user-controllable. This dissertation contributes empirical evidence and design guidelines for trustworthy, culturally sensitive, and readable affective captioning systems for inclusive STEM learning.