Understanding the Role of Feedback in Online Learning with Switching Costs

dc.contributor.authorCheng, Duoen
dc.contributor.authorZhou, Xingyuen
dc.contributor.authorJi, Boen
dc.date.accessioned2024-02-19T13:55:44Zen
dc.date.available2024-02-19T13:55:44Zen
dc.date.issued2023en
dc.description.abstractIn this paper, we study the role of feedback in online learning with switching costs. It has been shown that the minimax regret is Θ(equation presented)(T2/3) under bandit feedback and improves to Θ(equation presented)(√T) under full-information feedback, where T is the length of the time horizon. However, it remains largely unknown how the amount and type of feedback generally impact regret. To this end, we first consider the setting of bandit learning with extra observations; that is, in addition to the typical bandit feedback, the learner can freely make a total of Bex extra observations. We fully characterize the minimax regret in this setting, which exhibits an interesting phase-transition phenomenon: when Bex = O(T2/3), the regret remains Θ(equation presented)(T2/3), but when Bex = Ω(T2/3), it becomes Θ(equation presented)(T/√Bex), which improves as the budget Bex increases. To design algorithms that can achieve the minimax regret, it is instructive to consider a more general setting where the learner has a budget of B total observations. We fully characterize the minimax regret in this setting as well and show that it is Θ(equation presented)(T/√B), which scales smoothly with the total budget B. Furthermore, we propose a generic algorithmic framework, which enables us to design different learning algorithms that can achieve matching upper bounds for both settings based on the amount and type of feedback. One interesting finding is that while bandit feedback can still guarantee optimal regret when the budget is relatively limited, it no longer suffices to achieve optimal regret when the budget is relatively large.en
dc.description.versionPublished versionen
dc.format.extentPages 5521-5543en
dc.format.mimetypeapplication/pdfen
dc.identifier.eissn2640-3498en
dc.identifier.orcidJi, Bo [0000-0003-0149-7509]en
dc.identifier.urihttps://hdl.handle.net/10919/118011en
dc.identifier.volume202en
dc.language.isoenen
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.titleUnderstanding the Role of Feedback in Online Learning with Switching Costsen
dc.title.serialProceedings of Machine Learning Researchen
dc.typeConference proceedingen
dc.type.dcmitypeTexten
dc.type.otherConference Proceedingen
pubs.organisational-group/Virginia Techen
pubs.organisational-group/Virginia Tech/Engineeringen
pubs.organisational-group/Virginia Tech/Engineering/Computer Scienceen
pubs.organisational-group/Virginia Tech/All T&R Facultyen
pubs.organisational-group/Virginia Tech/Engineering/COE T&R Facultyen

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
cheng23f.pdf
Size:
494.06 KB
Format:
Adobe Portable Document Format
Description:
Published version
License bundle
Now showing 1 - 1 of 1
Name:
license.txt
Size:
1.5 KB
Format:
Plain Text
Description: