Sample Complexity of Incremental Policy Gradient Methods for Solving Multi-Task Reinforcement Learning

dc.contributor.author: Bai, Yitao [en]
dc.contributor.committeechair: Doan, Thinh T. [en]
dc.contributor.committeemember: Stilwell, Daniel J. [en]
dc.contributor.committeemember: Jin, Ming [en]
dc.contributor.department: Electrical and Computer Engineering [en]
dc.date.accessioned: 2024-04-30T12:20:32Z [en]
dc.date.available: 2024-04-30T12:20:32Z [en]
dc.date.issued: 2024-04-05 [en]
dc.description.abstract: We consider a multi-task learning problem, where an agent is presented with N reinforcement learning tasks. To solve this problem, we study the policy gradient approach, which iteratively updates an estimate of the optimal policy using the gradients of the value functions. The classic policy gradient method, however, may be expensive to implement in the multi-task setting, as it requires access to the gradients of all the tasks at every iteration. To circumvent this issue, in this paper we propose to study an incremental policy gradient method, where the agent uses the gradient of only one task at each iteration. Our main contribution is to provide theoretical results that characterize the performance of the proposed method. In particular, we show that incremental policy gradient methods converge to the optimal value of the multi-task reinforcement learning objectives at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments. [en]
dc.description.abstractgeneral: First, we introduce a popular machine learning technique called Reinforcement Learning (RL), where an agent, such as a robot, uses a policy to choose an action, like moving forward, based on observations from sensors like cameras. The agent receives a reward that helps judge whether the policy is good or bad. The objective of the agent is to find a policy that maximizes the cumulative reward it receives by repeating the above process. RL has many applications, including Cruise autonomous cars, Google industrial automation, training ChatGPT language models, and Walmart inventory management. However, RL suffers from task sensitivity and requires a lot of training data. For example, if the task changes slightly, the agent needs to train the policy from the beginning. This motivates the technique called Multi-Task Reinforcement Learning (MTRL), where different tasks give different rewards and the agent maximizes the sum of the cumulative rewards of all the tasks. We focus on the incremental setting, where the agent can only access the tasks one at a time, in random order. In this case, we only need one agent, and it is not required to know which task it is performing. We show that the incremental policy gradient methods we propose converge to the optimal value of the MTRL objectives at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments. [en]
dc.description.degree: Master of Science [en]
dc.format.medium: ETD [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://hdl.handle.net/10919/118699 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: CC0 1.0 Universal [en]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [en]
dc.subject: Markov decision processes [en]
dc.subject: Multi-task reinforcement learning [en]
dc.title: Sample Complexity of Incremental Policy Gradient Methods for Solving Multi-Task Reinforcement Learning [en]
dc.type: Thesis [en]
dc.type.dcmitype: Text [en]
thesis.degree.discipline: Electrical Engineering [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.name: Master of Science [en]
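
Illustrative sketch (not part of the repository record)

The incremental update described in the abstract, where only one task's gradient is used per iteration, can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the thesis's implementation: it uses a tabular softmax policy, exact gradients computed from a known model rather than sampled estimates, a step size proportional to 1/√k chosen only to mirror the stated rate, and a 5-cell corridor in place of the GridWorld experiments. All function and variable names (`corridor_task`, `value_and_gradient`, `incremental_policy_gradient`, `step0`) are hypothetical.

```python
import numpy as np

def softmax_policy(theta):
    """Row-wise softmax: pi[s, a] is the probability of action a in state s."""
    z = theta - theta.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def value_and_gradient(theta, P, r, rho, gamma):
    """Exact J(theta) = rho^T V and its gradient for one tabular task.

    P has shape (S, A, S) (transition probabilities), r has shape (S, A).
    """
    S, _ = r.shape
    pi = softmax_policy(theta)
    P_pi = np.einsum("sa,sat->st", pi, P)        # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)                  # expected one-step reward
    M = np.linalg.inv(np.eye(S) - gamma * P_pi)  # sum_t gamma^t P_pi^t
    V = M @ r_pi
    Q = r + gamma * (P @ V)                      # shape (S, A)
    adv = Q - V[:, None]
    occ = rho @ M                                # discounted state visitation
    grad = occ[:, None] * pi * adv               # softmax policy gradient
    return rho @ V, grad

def incremental_policy_gradient(tasks, rho, gamma=0.9, iters=500,
                                step0=0.5, seed=0):
    """Each iteration uses the gradient of ONE randomly drawn task."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(tasks[0][1].shape)
    for k in range(1, iters + 1):
        P, r = tasks[rng.integers(len(tasks))]   # task chosen uniformly at random
        _, grad = value_and_gradient(theta, P, r, rho, gamma)
        theta += step0 / np.sqrt(k) * grad       # assumed 1/sqrt(k) step size
    return theta

def multitask_objective(theta, tasks, rho, gamma=0.9):
    """Sum of the per-task discounted returns."""
    return sum(value_and_gradient(theta, P, r, rho, gamma)[0] for P, r in tasks)

def corridor_task(reward_by_state):
    """Toy 1-D corridor: action 0 moves left, action 1 moves right."""
    S = len(reward_by_state)
    P = np.zeros((S, 2, S))
    for s in range(S):
        P[s, 0, max(s - 1, 0)] = 1.0
        P[s, 1, min(s + 1, S - 1)] = 1.0
    r = np.repeat(np.asarray(reward_by_state, dtype=float)[:, None], 2, axis=1)
    return P, r

# Two hypothetical tasks that share dynamics but differ in rewards.
tasks = [corridor_task([0, 0, 0, 0, 1.0]), corridor_task([0, 0, 0, 0.5, 1.0])]
rho = np.full(5, 1 / 5)                          # uniform initial state
J_init = multitask_objective(np.zeros((5, 2)), tasks, rho)
theta = incremental_policy_gradient(tasks, rho)
J_final = multitask_objective(theta, tasks, rho)
print(J_init, J_final)
```

In this toy setup both tasks reward moving right, so every single-task gradient step also improves the summed objective. With conflicting tasks the individual steps would be noisier, which is the regime the convergence analysis in the thesis addresses.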

Files

Original bundle
Name: Master_Thesis_Yitao_Bai.pdf
Size: 1.69 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission