The Effects of a Humanoid Robot's Non-lexical Vocalization on Emotion Recognition and Robot Perception
Abstract
As robots become more pervasive in everyday life, their social aspects have attracted researchers' attention. Because emotions play a key role in social interactions, considerable research has examined conveying emotions via speech, but little has focused on how non-speech sounds affect users' perception of robots. We conducted a within-subjects exploratory study with 40 young adults to investigate the effects of non-speech sounds (regular voice, characterized voice, musical sound, and no sound) and basic emotions (anger, fear, happiness, sadness, and surprise) on user perception. While listening to a fairy tale with each participant, a humanoid robot (Pepper) responded to the story with a recorded emotional sound accompanied by a gesture. Participants recognized emotions significantly more accurately from the regular voice than from the other sounds. The confusion matrix showed that happiness and sadness had the highest emotion recognition accuracy, which aligns with previous research. The regular voice also induced higher trust, naturalness, and preference than the other sounds. Interestingly, the musical sound mostly yielded lower perception ratings than no sound at all.

A further exploratory study was conducted with an additional 49 young adults to investigate the effects of regular non-verbal voices (female and male) and basic emotions (happiness, sadness, anger, and relief) on user perception. We also explored how participants' gender affected emotion recognition and social perception of Pepper. While listening to a fairy tale with each participant, the robot responded to the story with gestures and emotional voices. Participants showed significantly higher emotion recognition accuracy and social perception in the voice-plus-gesture condition than in the gesture-only condition. As in the first study, the confusion matrix showed that happiness and sadness had the highest emotion recognition accuracy, aligning with previous research. Interestingly, participants perceived more discomfort and anthropomorphism with male voices than with female voices. Male participants were more likely to feel uncomfortable when interacting with Pepper, whereas female participants were more likely to perceive warmth. However, neither the gender of the robot's voice nor the gender of the participant affected emotion recognition accuracy. Results are discussed along with social robot design guidelines for emotional cues and directions for future research.