Here’s What I’ve Learned: Asking Questions that Reveal Reward Learning

Habibian, Soheil; Jonnavittula, Ananth; Losey, Dylan P.

dc.contributor.author	Habibian, Soheil	en
dc.contributor.author	Jonnavittula, Ananth	en
dc.contributor.author	Losey, Dylan P.	en
dc.date.accessioned	2021-09-20T11:44:35Z	en
dc.date.available	2021-09-20T11:44:35Z	en
dc.date.issued	2021-07-02	en
dc.description.abstract	Robots can learn from humans by asking questions. In these questions the robot demonstrates a few different behaviors and asks the human for their favorite. But how should robots choose which questions to ask? Today’s robots optimize for informative questions that actively probe the human’s preferences as efficiently as possible. But while informative questions make sense from the robot’s perspective, human onlookers often find them arbitrary and misleading. For example, consider an assistive robot learning to put away the dishes. Based on your answers to previous questions this robot knows where it should stack each dish; however, the robot is unsure about the right height to carry these dishes. A robot optimizing only for informative questions focuses purely on this height: it shows trajectories that carry the plates near or far from the table, regardless of whether or not they stack the dishes correctly. As a result, when we see this question, we mistakenly think that the robot is still confused about where to stack the dishes! In this paper we formalize active preference-based learning from the human’s perspective. We hypothesize that, from the human’s point of view, the robot’s questions reveal what the robot has and has not learned. Our insight enables robots to use questions to make their learning process transparent to the human operator. We develop and test a model that robots can leverage to relate the questions they ask to the information these questions reveal. We then introduce a trade-off between informative and revealing questions that considers both human and robot perspectives: a robot that optimizes for this trade-off actively gathers information from the human while simultaneously keeping the human up to date with what it has learned. We evaluate our approach across simulations, online surveys, and in-person user studies. We find that robots which consider the human’s point of view learn just as quickly as state-of-the-art baselines while also communicating what they have learned to the human operator. Videos of our user studies and results are available here: https://youtu.be/tC6y_jHN7Vw.	en
dc.description.notes	Preprint from arXiv.org.	en
dc.identifier.uri	http://hdl.handle.net/10919/105027	en
dc.identifier.url	https://arxiv.org/abs/2107.01995	en
dc.language.iso	en_US	en
dc.publisher	Virginia Tech	en
dc.rights	Attribution 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	en
dc.subject	Computing methodologies	en
dc.subject	Active learning settings	en
dc.subject	Human-centered computing	en
dc.subject	Collaborative interaction	en
dc.subject	Human-robot interaction	en
dc.subject	reward learning	en
dc.subject	active learning	en
dc.subject	trust and interpretability	en
dc.title	Here’s What I’ve Learned: Asking Questions that Reveal Reward Learning	en
dc.type	Article	en

Files

Original bundle

Name: 2107.01995.pdf
Size: 2.76 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission

Collections

Scholarly Works, Mechanical Engineering
Scholarly Works, Center for Human-Computer Interaction (CHCI)