Causally-Aware Safe Reinforcement Learning for Long-Horizon Partially Observable Environments in IoT Systems
Abstract
Real-world decision-making problems often exhibit partial observability, shifting dynamics, and potentially high-stakes outcomes. In Internet of Things (IoT) deployments, these issues become even more pronounced due to noisy sensor data, intermittent connectivity, and the sheer scale of distributed devices. Traditional reinforcement learning (RL) methods may fail to generalize safely in these environments because they rely on correlational patterns rather than causal mechanisms. In this work, we propose a novel Causally-Aware Safe Reinforcement Learning (CAS-RL) framework that integrates causal structure learning with robust policy optimization. Our approach discovers latent causal factors and enforces safety constraints at every decision step, yielding policies that are more interpretable, safer under distribution shift, and scalable to long-horizon tasks. Empirical results on both synthetic benchmarks and a healthcare-oriented partially observable domain show that CAS-RL significantly outperforms state-of-the-art baselines in robustness, safety compliance, and sample efficiency.
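The per-step safety enforcement described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`infer_causal_factors`, `is_safe`, `choose_action`), the single "overload" factor, and the threshold-based inference are all illustrative stand-ins for the causal structure learning and constrained policy optimization the abstract refers to.

```python
def infer_causal_factors(observation):
    """Stand-in for causal structure learning: map a noisy partial
    observation to a latent causal factor (here, a simple threshold).
    The 'load' key and 0.8 threshold are hypothetical."""
    return {"overload": observation["load"] > 0.8}

def is_safe(action, factors):
    """Safety constraint: forbid the aggressive action when the
    inferred causal factor indicates overload."""
    return not (factors["overload"] and action == "increase_rate")

def choose_action(observation,
                  actions=("increase_rate", "hold", "decrease_rate")):
    """Enforce the safety constraint at every decision step by
    masking unsafe actions before the policy selects one."""
    factors = infer_causal_factors(observation)
    safe_actions = [a for a in actions if is_safe(a, factors)]
    return safe_actions[0]  # placeholder policy: first safe action
```

The key property sketched here is that the safety check runs before every action selection, so the policy can never emit an action that violates the constraint under the currently inferred causal state.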