Abstract
Effective policy optimization is a core challenge for multi-agent systems operating in environments with high temporal complexity and incomplete information. This paper investigates Multi-Agent Deep Reinforcement Learning (MADRL) under partial observability, where agents must learn to act from local, noisy observations alone. We propose a policy learning framework that uses recurrent neural networks (RNNs) for memory-based representation and follows the centralized training with decentralized execution (CTDE) paradigm. The system is evaluated on benchmark decentralized partially observable environments, where it demonstrates greater stability and better policy convergence than baseline algorithms. Our findings highlight the potential of causally aware memory policies and attention-driven coordination for solving complex sequential tasks with minimal information.
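To make the architecture described above concrete, the following is a minimal sketch (not the paper's implementation) of a recurrent per-agent actor paired with a centralized critic, illustrating CTDE under partial observability. Names such as `RecurrentActor`, `CentralizedCritic`, `obs_dim`, and `n_agents` are illustrative assumptions rather than identifiers from the paper.

```python
# Hypothetical sketch of RNN-based memory policies trained with CTDE;
# not the authors' code, only an illustration of the general pattern.
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """Per-agent policy: a GRU maintains a memory over local, noisy observations."""

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, hidden):
        x = torch.relu(self.encoder(obs))
        hidden = self.gru(x, hidden)          # update the agent's recurrent memory
        logits = self.policy_head(hidden)     # action distribution from memory state
        return logits, hidden


class CentralizedCritic(nn.Module):
    """Training-time critic that conditions on the joint observations of all agents."""

    def __init__(self, obs_dim: int, n_agents: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * n_agents, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, joint_obs):
        return self.net(joint_obs)            # scalar value of the joint observation


if __name__ == "__main__":
    n_agents, obs_dim, n_actions = 3, 10, 5
    actors = [RecurrentActor(obs_dim, n_actions) for _ in range(n_agents)]
    critic = CentralizedCritic(obs_dim, n_agents)

    obs = torch.randn(n_agents, obs_dim)                     # local observations
    hiddens = [torch.zeros(1, 64) for _ in range(n_agents)]  # per-agent memory

    # Decentralized execution: each actor acts only on its own observation and memory.
    actions = []
    for i, actor in enumerate(actors):
        logits, hiddens[i] = actor(obs[i].unsqueeze(0), hiddens[i])
        actions.append(torch.distributions.Categorical(logits=logits).sample())

    # Centralized training signal: the critic evaluates the joint observation.
    value = critic(obs.reshape(1, -1))
    print(actions, value.item())
```

The key design choice illustrated here is the asymmetry of information: the critic consumes the joint observation during training only, while each actor executes from its local observation and its recurrent hidden state, which serves as the memory-based representation under partial observability.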