Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences

dc.contributor.author: Bıyık, Erdem
dc.contributor.author: Losey, Dylan P.
dc.contributor.author: Palan, Malayandi
dc.contributor.author: Landolfi, Nicholas C.
dc.contributor.author: Shevchuk, Gleb
dc.contributor.author: Sadigh, Dorsa
dc.date.accessioned: 2022-02-11T21:30:42Z
dc.date.available: 2022-02-11T21:30:42Z
dc.date.issued: 2022-01
dc.date.updated: 2022-02-11T21:30:40Z
dc.description.abstract: Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human's ability to provide data, yielding user-friendly preference queries that are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and the usability of our integrated framework.
dc.description.version: Accepted version
dc.format.extent: Pages 45-67
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1177/02783649211041652
dc.identifier.eissn: 1741-3176
dc.identifier.issn: 0278-3649
dc.identifier.issue: 1
dc.identifier.uri: http://hdl.handle.net/10919/108319
dc.identifier.volume: 41
dc.language.iso: en
dc.publisher: SAGE
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: 0801 Artificial Intelligence and Image Processing
dc.subject: 0906 Electrical and Electronic Engineering
dc.subject: 0913 Mechanical Engineering
dc.subject: Industrial Engineering & Automation
dc.title: Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences
dc.title.serial: International Journal of Robotics Research
dc.type: Article - Refereed
dc.type.dcmitype: Text
pubs.organisational-group: /Virginia Tech
pubs.organisational-group: /Virginia Tech/Engineering
pubs.organisational-group: /Virginia Tech/Engineering/Mechanical Engineering
pubs.organisational-group: /Virginia Tech/All T&R Faculty
pubs.organisational-group: /Virginia Tech/Engineering/COE T&R Faculty
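
The abstract above describes a two-stage algorithm: demonstrations first initialize a belief over the reward function, and active preference queries then refine it. The following is a minimal illustrative sketch of that idea, not the paper's actual implementation: it assumes a linear reward r(ξ) = w·φ(ξ), a Boltzmann-rational human model, a particle approximation of the belief (which sidesteps computing the posterior normalizer), and a simple uncertainty-based query heuristic; the feature dimension, rationality coefficient BETA, and all function names are hypothetical choices for this sketch.

```python
# Illustrative sketch (NOT the paper's implementation) of reward learning that
# integrates demonstrations and preference queries. Assumes a linear reward
# r(xi) = w . phi(xi); all constants and names here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
D = 4        # trajectory feature dimension (assumed)
N = 10_000   # particles approximating the belief over reward weights w
BETA = 5.0   # human rationality coefficient in both likelihoods (assumed)

# Uniform prior: particles for w on the unit sphere, with log-weights.
w = rng.normal(size=(N, D))
w /= np.linalg.norm(w, axis=1, keepdims=True)
logp = np.zeros(N)

def update_from_demo(phi_demo, phi_alts):
    """Stage 1: Boltzmann-rational demonstration likelihood. The demo's
    features phi_demo should out-score K alternative trajectories phi_alts
    (shape (K, D)) under the true w; softmax over all K+1 trajectories."""
    global logp
    scores = BETA * np.column_stack([w @ phi_demo, w @ phi_alts.T])  # (N, K+1)
    m = scores.max(axis=1)  # stabilized log-sum-exp over trajectories
    logp += scores[:, 0] - (m + np.log(np.exp(scores - m[:, None]).sum(axis=1)))

def update_from_preference(phi_a, phi_b, picked_a):
    """Stage 2: Bradley-Terry likelihood for the answer to 'a or b?'."""
    global logp
    diff = BETA * (w @ (phi_a - phi_b))
    logp += -np.log1p(np.exp(-diff)) if picked_a else -np.log1p(np.exp(diff))

def select_query(candidate_pairs):
    """Pick the pair whose answer the current belief is most split on --
    a crude stand-in for the paper's optimal query-selection criterion."""
    p = np.exp(logp - logp.max())
    p /= p.sum()
    def score(pair):
        p_a = p @ (1.0 / (1.0 + np.exp(-BETA * (w @ (pair[0] - pair[1])))))
        return p_a * (1.0 - p_a)  # maximal when P(answer = a) is near 0.5
    return max(candidate_pairs, key=score)
```

In this sketch, the robot would call update_from_demo once per kinesthetic demonstration, then repeatedly call select_query, show the chosen pair to the user, and feed the answer to update_from_preference; the reward estimate is the particle average of w under the weights exp(logp).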

Files

Original bundle
Name: biyik_ijrr2021.pdf
Size: 5.12 MB
Format: Adobe Portable Document Format
Description: Accepted version