Feed Me: an in-situ Augmented Reality Annotation Tool for Computer Vision
Ilo, Cedrick K.
Today's technology makes it possible to combine Computer Vision (CV) and Augmented Reality (AR), allowing users to interact with digital artifacts during both indoor and outdoor activities. For example, AR systems can feed images of the local environment to a trained neural network for object detection. However, these algorithms sometimes misclassify an object. In such cases, users want to correct the model's misclassification by adding labels to unrecognized objects or re-classifying recognized objects. Depending on the number of corrections, in-situ annotation may be a tedious activity for the user. This research focuses on how in-situ AR annotation can aid CV classification and which combinations of voice and gesture techniques are efficient and usable for this task.
General Audience Abstract
Today’s technology has allowed new inventions such as Computer Vision and Augmented Reality to work together seamlessly. Computer scientists are enthusiastic about Computer Vision because it can enable a computer to see the world as humans do. With the rising popularity of Niantic’s Pokemon Go, Augmented Reality has become a new research area in which researchers around the globe work to make it more stable and as useful as its close relative, virtual reality. For example, Augmented Reality can help users gain a better understanding of their environment by overlaying digital content onto their field of view. Combining Computer Vision with Augmented Reality could aid the user further by detecting, registering, and tracking objects in the environment. However, a Computer Vision algorithm can sometimes falsely detect an object in a scene. In such cases, we wish to use Augmented Reality as a medium to update the Computer Vision object detection algorithm in-situ, meaning in place. With this idea, a user will be able to annotate all the objects within the camera’s view that were not detected by the object detection model and correct any inaccurate classifications. This research primarily focuses on visual feedback for in-situ annotation and the user experience of the Feed Me voice and gesture interface.
- Masters Theses