On Kernelized Multi-Armed Bandits with Constraints

dc.contributor.authorZhou, Xingyuen
dc.contributor.authorJi, Boen
dc.date.accessioned2024-02-19T14:06:43Zen
dc.date.available2024-02-19T14:06:43Zen
dc.date.issued2022-11-30en
dc.description.abstractWe study a stochastic bandit problem with a general unknown reward function and a general unknown constraint function. Both functions can be non-linear (even non-convex) and are assumed to lie in a reproducing kernel Hilbert space (RKHS) with a bounded norm. In contrast to safety-type hard constraints studied in prior works, we consider soft constraints that may be violated in any round as long as the cumulative violations are small. Our ultimate goal is to study how to utilize the nature of soft constraints to attain a finer complexity-regret-constraint trade-off in the kernelized bandit setting. To this end, leveraging primal-dual optimization, we propose a general framework for both algorithm design and performance analysis. This framework builds upon a novel sufficient condition, which not only is satisfied under general exploration strategies, including upper confidence bound (UCB), Thompson sampling (TS), and new ones based on random exploration, but also enables a unified analysis for showing both sublinear regret and sublinear or even zero constraint violation. We demonstrate the superior performance of our proposed algorithms via numerical experiments based on both synthetic and real-world datasets. Along the way, we also make the first detailed comparison between two popular methods for analyzing constrained bandits and Markov decision processes (MDPs) by discussing the key difference and some subtleties in the analysis, which could be of independent interest to the communities.en
dc.description.notesYes, full paper (Peer reviewed?)en
dc.description.versionPublished versionen
dc.format.mimetypeapplication/pdfen
dc.identifier.isbn9781713871088en
dc.identifier.issn1049-5258en
dc.identifier.orcidJi, Bo [0000-0003-0149-7509]en
dc.identifier.urihttps://hdl.handle.net/10919/118013en
dc.identifier.volume35en
dc.language.isoenen
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.titleOn Kernelized Multi-Armed Bandits with Constraintsen
dc.title.serialAdvances in Neural Information Processing Systemsen
dc.typeConference proceedingen
dc.type.dcmitypeTexten
dc.type.otherConference Proceedingen
pubs.organisational-group/Virginia Techen
pubs.organisational-group/Virginia Tech/Engineeringen
pubs.organisational-group/Virginia Tech/Engineering/Computer Scienceen
pubs.organisational-group/Virginia Tech/All T&R Facultyen
pubs.organisational-group/Virginia Tech/Engineering/COE T&R Facultyen

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
10410_on_kernelized_multi_armed_band.pdf
Size:
589.14 KB
Format:
Adobe Portable Document Format
Description:
Published version
License bundle
Now showing 1 - 1 of 1
Name:
license.txt
Size:
1.5 KB
Format:
Plain Text
Description: