Experiences with VITIS AI for Deep Reinforcement Learning

Date

2024-09

Publisher

IEEE

Abstract

Deep reinforcement learning has found use in many applications, such as natural language processing, self-driving cars, and spacecraft control. Many of these use cases require inference with low latency and high accuracy. This work describes our experiences using the AMD Vitis AI toolchain to improve the latency and accuracy of inference in deep reinforcement learning. In particular, we evaluate a soft actor-critic (SAC) model trained to solve the MuJoCo humanoid environment, where the objective of the humanoid agent is to learn a policy that keeps it in motion for as long as possible without falling over. During training, we prune the model at different timesteps using the weight sparsity pruner from the Vitis AI optimizer. Our experimental results show that pruning improves the evaluated reinforcement learning policy: the trained agent remains balanced in the environment and accumulates higher rewards than an agent trained without pruning. Specifically, pruning the network during training delivers up to 20% better mean episode length and up to 23% higher reward (better accuracy) compared to a network without any pruning. Additionally, decision-making latency, defined as the time between the observation of the agent's state and a control decision, improves by up to 20%.
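To make the two central ideas in the abstract concrete, the following is a minimal sketch of weight sparsity pruning and of measuring decision-making latency. It is not the Vitis AI optimizer API and not the authors' code: the function names `magnitude_prune` and `decision_latency` are hypothetical, and the magnitude-based pruning criterion is an assumption, since the abstract does not state how the pruner selects weights.

```python
import time


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    Assumption: pruning by absolute magnitude. The abstract only says
    "weight sparsity pruner" and does not name the selection criterion.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def decision_latency(policy, observation, repeats=100):
    """Mean wall-clock seconds from receiving an observation to returning an action."""
    start = time.perf_counter()
    for _ in range(repeats):
        policy(observation)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Toy one-layer "policy": a dot product of weights and observation.
    weights = [0.5, -0.1, 0.03, -0.8, 0.2, 0.0]
    sparse = magnitude_prune(weights, sparsity=0.5)
    print(sparse)  # [0.5, 0.0, 0.0, -0.8, 0.2, 0.0]

    obs = [1.0] * len(weights)
    policy = lambda o: sum(w * x for w, x in zip(sparse, o))
    print(f"mean latency: {decision_latency(policy, obs):.2e} s")
```

Zeroing weights in a plain Python list does not by itself make inference faster; any latency gain depends on the runtime or hardware exploiting the sparsity, which is outside the scope of this sketch.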

Keywords

reinforcement learning, humanoid, MuJoCo, network pruning, FPGA, GPU, Vitis AI

Citation