Sonification of the Scene in the Image Environment and Metaverse Using Natural Language

Date

2023-01-17

Publisher

Virginia Tech

Abstract

This metaverse- and computer vision-powered application is designed to serve people with low vision or a visual impairment, ranging from younger to older adults. Specifically, we aim to improve users' situational awareness in a scene by narrating its visual content from their point of view. Users receive this information through the auditory channel, as the system narrates the scene's description using speech technology. This could increase the accessibility of visual-spatial information for users in a metaverse and, later, in the physical world. The solution is designed and developed around the hypothesis that enabling narration of a scene's visual content increases understanding of, and access to, that scene. This study paves the way for VR technology to be used as a training and exploration tool, not limited to blind people in generic environments but applicable to specific domains such as military, healthcare, or architecture and planning. We conducted a user study to evaluate our hypothesis about which set of algorithms performs better for a specific category of task, such as search or survey, and assessed the narration algorithms through users' ratings of naturalness, correctness, and satisfaction. The tasks and algorithms are discussed in detail in the chapters of this thesis.

Keywords

Metaverse, Machine Learning, Computer Vision, X3DOM
