Deep Reinforcement Learning for Multirotor Flight Control: A Comparative Study of Sim-to-Real Training and Real-World Performance
Abstract
This dissertation investigates Deep Reinforcement Learning (DRL) for low-level flight control of multirotor unmanned aerial vehicles (UAVs), focusing on the factors that most influence sim-to-real policy transfer. Policies were trained with the Proximal Policy Optimization (PPO) algorithm in a MuJoCo-based simulator and deployed via TensorFlow Lite Micro inference on PX4 flight controllers. Extensive ablation studies evaluated the effects of domain randomization, reward weighting, observation representation, and neural network architecture. Domain randomization of actuator and mass properties yielded the best balance between positional accuracy and attitude stability, reducing position error and roll–pitch oscillation by 40–50 %. Policies required only the current vehicle state and the previous commanded action as inputs while maintaining less than 0.1 m of steady-state error and less than 5 % overshoot. Among the activation functions tested, ReLU outperformed tanh and ELU, lowering steady-state error by up to 36 % and inference time by 26 %. Post-training quantization further reduced inference latency by ≈40 % with negligible performance loss. Incorporating trajectory tracking during training decreased tracking error by ≈75 % and eliminated temporal lag. The optimal training configuration generalized effectively to multiple vehicle morphologies, including quadcopter, hexacopter, and coaxial octocopter platforms, and the process was successfully extended to an omnidirectional multirotor vehicle (OMV). For the OMV, a learned Multi-Layer Perceptron (MLP) controller outperformed both adaptive and PID-based baselines when commanded to track a complex reference attitude, demonstrating stable six-degree-of-freedom trajectory tracking in experimental flights. Collectively, these results provide new insight into parameter sensitivities within the DRL training pipeline and establish a reproducible methodology for sim-to-real policy transfer in aerial robotics.
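To make the domain-randomization result concrete, below is a minimal sketch of per-episode randomization of mass and actuator properties using the MuJoCo Python bindings. The model file name and the ±20 % ranges are illustrative assumptions, not values taken from the dissertation.

```python
import numpy as np
import mujoco

# Load a vehicle model; "quadrotor.xml" is a hypothetical placeholder path.
model = mujoco.MjModel.from_xml_path("quadrotor.xml")
nominal_mass = model.body_mass.copy()
nominal_gain = model.actuator_gainprm[:, 0].copy()

def randomize(model, rng, mass_scale=0.2, gain_scale=0.2):
    """Resample body masses and actuator gains within +/- scale of nominal."""
    model.body_mass[:] = nominal_mass * rng.uniform(
        1.0 - mass_scale, 1.0 + mass_scale, size=nominal_mass.shape)
    model.actuator_gainprm[:, 0] = nominal_gain * rng.uniform(
        1.0 - gain_scale, 1.0 + gain_scale, size=nominal_gain.shape)

rng = np.random.default_rng(seed=0)
for episode in range(3):
    randomize(model, rng)            # new dynamics every episode
    data = mujoco.MjData(model)      # fresh simulation state
    mujoco.mj_setConst(model, data)  # refresh mass-derived model constants
    mujoco.mj_step(model, data)      # PPO rollout would go here
```

Resampling at episode boundaries keeps each rollout internally consistent while forcing the policy to cope with a distribution of plausible vehicles rather than one fixed model.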
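The observation and architecture findings suggest a compact policy of the following shape: an MLP whose input concatenates the current vehicle state with the previous commanded action, with ReLU hidden activations. The layer widths and dimensions below are assumptions for illustration, not the dissertation's configuration.

```python
import tensorflow as tf

STATE_DIM, ACTION_DIM = 12, 4  # hypothetical: pose/velocity state, four motors

# Hidden layers use ReLU, the activation the study found fastest and most
# accurate; the tanh on the output only bounds commands to [-1, 1] and is a
# common convention assumed here, not a finding from the abstract.
policy = tf.keras.Sequential([
    tf.keras.Input(shape=(STATE_DIM + ACTION_DIM,)),  # state + previous action
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(ACTION_DIM, activation="tanh"),
])
policy.summary()
```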
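Finally, the reported ≈40 % latency reduction comes from post-training quantization before on-board TensorFlow Lite Micro inference. A minimal sketch of that conversion step with the TensorFlow Lite converter is shown below; the saved-model path, observation width, and random calibration data are placeholders.

```python
import numpy as np
import tensorflow as tf

OBS_DIM = 16  # hypothetical width: vehicle state plus previous action

def representative_data():
    # Calibration samples for the quantizer; real usage would replay
    # recorded flight observations instead of random vectors.
    for _ in range(100):
        yield [np.random.uniform(-1, 1, size=(1, OBS_DIM)).astype(np.float32)]

# "policy_saved_model" is a placeholder path to an exported policy network.
converter = tf.lite.TFLiteConverter.from_saved_model("policy_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
tflite_model = converter.convert()

with open("policy.tflite", "wb") as f:
    f.write(tflite_model)  # artifact to embed for TFLite Micro inference
```

The representative dataset lets the converter calibrate quantization ranges, which is what keeps the accuracy loss negligible while shrinking and speeding up on-board inference.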