Exploring the Design Space of Automatically Generated Emotive Captions for Deaf or Hard of Hearing Users

Abstract

Caption text conveys salient auditory information to deaf or hard-of-hearing (DHH) viewers. However, captions typically do not capture the emotional information conveyed within speech. We developed three emotive captioning schemas that map the output of audio-based emotion-detection models to expressive caption text that conveys underlying emotions. The three schemas used typographic changes to the text, color changes, or both. Next, we designed a Unity framework to implement these schemas and used it to generate stimuli videos. In an experimental evaluation with 28 DHH viewers, we compared their ability to understand emotions and their subjective judgments across the three captioning schemas. We found no significant difference in participants' ability to identify emotions from the captions or in their subjective preference ratings. Open-ended feedback revealed factors contributing to individual differences in preferences among participants, as well as challenges with automatically generated emotive captions that motivate future work.

Publication
In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems
Saad Hassan
Assistant Professor

My research interests include human-computer interaction (HCI), accessibility, and computational social science.

Yao Ding
Accessibility Researcher at Meta
Agneya Kerure
Audio Experience Prototyper at Meta Reality Labs
Christi W. Miller
Research Scientist at Meta Reality Labs
John Burnett
PhD Student at University of California, San Diego
Emily Biondo
Product Designer at Meta