TAGGAR: General-Purpose Task Guidance from Natural Language in Augmented Reality using Vision-Language Models

dc.contributor.authorStover, Danielen
dc.contributor.authorBowman, Douglas A.en
dc.date.accessioned2024-11-04T14:13:37Zen
dc.date.available2024-11-04T14:13:37Zen
dc.date.issued2024-10-07en
dc.date.updated2024-11-01T07:56:44Zen
dc.description.abstractAugmented reality (AR) task guidance systems provide assistance for procedural tasks by rendering virtual guidance visuals within the real-world environment. Current AR task guidance systems are limited in that they require AR system experts to manually place visuals, require models of real-world objects, or only function for limited tasks or environments. We propose a general-purpose AR task guidance approach for tasks defined by natural language. Our approach allows an operator to take pictures of relevant objects and write task instructions for an end user, which are used by the system to determine where to place guidance visuals. Then, an end user can receive and follow guidance even if objects change locations or environments. Our approach utilizes current visionlanguage machine learning models for text and image semantic understanding and object localization. We built a proof-of-concept system called TAGGAR using our approach and tested its accuracy and usability in a user study. We found that all operators were able to generate clear guidance for tasks and end users were able to follow the guidance visuals to complete the expected action 85.7% of the time without any knowledge of the tasks.en
dc.description.versionPublished versionen
dc.format.mimetypeapplication/pdfen
dc.identifier.doihttps://doi.org/10.1145/3677386.3682095en
dc.identifier.urihttps://hdl.handle.net/10919/121532en
dc.language.isoenen
dc.publisherACMen
dc.rightsCreative Commons Attribution 4.0 Internationalen
dc.rights.holderThe author(s)en
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/en
dc.titleTAGGAR: General-Purpose Task Guidance from Natural Language in Augmented Reality using Vision-Language Modelsen
dc.typeArticle - Refereeden
dc.type.dcmitypeTexten

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
3677386.3682095.pdf
Size:
15.91 MB
Format:
Adobe Portable Document Format
Description:
Published version
License bundle
Now showing 1 - 1 of 1
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission
Description: