Title: Interpretive Caption: Real-Time Vocal Emotion Cues for DHH Users
Authors: Ubur, Sunday; Adewale, Sikiru; Chandrashekar, Nikitha; Akli, Enoch; Gracanin, Denis
Type: Article - Refereed
Date issued: 2025-10-26
Date deposited: 2025-11-04
Handle: https://hdl.handle.net/10919/138850
DOI: https://doi.org/10.1145/3663547.3759697
Format: application/pdf
Language: en
Rights: In Copyright (InC); The author(s), 2025-11-01

Abstract: Deaf and Hard-of-Hearing (DHH) individuals increasingly rely on real-time captioning to access spoken content in educational and professional settings. However, traditional captions omit vocal emotional cues, such as intonation and affect, which can hinder comprehension and engagement. This work introduces Interpretive Caption, a machine-learning prototype that augments captions with emotion-aware annotations derived from vocal tone. Using letter-coded tags with hover-based tooltips, the system conveys emotional context on demand, balancing clarity with cognitive accessibility. We conducted a qualitative study with eight DHH participants who interacted with the prototype and shared feedback on usability, emotional clarity, and layout design. Findings highlight the value of hover-based emotional cues, customization features, and segmentation aligned with cognitive load principles. Participants appreciated the non-intrusive emotional insights, while also identifying areas for improvement, including accent-inclusive emotion recognition and better mobile accessibility. Our contributions include a real-time captioning prototype integrating speech emotion recognition, a user-controllable emotion display interface, and design insights for affective accessibility in educational contexts. This work offers a foundation for inclusive, expressive captioning and informs future multimodal caption systems that prioritize interpretability, cultural sensitivity, and user agency.
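To make the tag-plus-tooltip mechanism concrete, the following is a minimal sketch of how a letter-coded emotion tag with an on-demand hover label might be rendered in a web-based caption view. It assumes a browser DOM environment; the letter codes, emotion labels, class names, and the renderCaptionSegment function are hypothetical illustrations, not the authors' actual implementation.

```typescript
// Hypothetical mapping from letter codes to emotion labels
// (illustrative only; the paper's actual code set is not specified here).
const EMOTION_LABELS: Record<string, string> = {
  H: "Happy",
  S: "Sad",
  A: "Angry",
  N: "Neutral",
};

// Render one caption segment with an inline letter-coded emotion tag.
// The full emotion label appears only on hover via the native `title`
// tooltip, keeping the caption line uncluttered by default.
function renderCaptionSegment(text: string, emotionCode: string): HTMLSpanElement {
  const segment = document.createElement("span");
  segment.className = "caption-segment";
  segment.textContent = text + " ";

  const tag = document.createElement("sup");
  tag.className = "emotion-tag";
  tag.textContent = `[${emotionCode}]`;
  // Tooltip shown on demand, so the emotional cue stays non-intrusive.
  tag.title = EMOTION_LABELS[emotionCode] ?? "Unknown emotion";

  segment.appendChild(tag);
  return segment;
}

// Usage example: append a caption segment tagged "H" (Happy) to the page.
document.body.appendChild(renderCaptionSegment("Great job, everyone!", "H"));
```

Rendering the tag as a compact superscript with a hover tooltip mirrors the design goal described in the abstract: the emotional annotation is available on demand rather than competing with the caption text for the reader's attention.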