Paper Title

Multi-Agent Deep Reinforcement Learning for Policy Optimization in Sequential Data Environments with Partial Observability

Authors

Keywords

  • Multi-Agent Reinforcement Learning
  • Deep RL
  • Partial Observability
  • Policy Optimization
  • Sequential Decision-Making
  • Decentralized Control
  • CTDE

Publication Info

Volume: 6 | Issue: 2 | Pages: 54-62

Published On

March 2025

Abstract

In environments characterized by high temporal complexity and incomplete information, effective policy optimization becomes a core challenge for multi-agent systems. This paper investigates Multi-Agent Deep Reinforcement Learning (MADRL) under partial observability, where agents must learn to act based only on local, noisy observations. We propose a policy learning framework that incorporates recurrent neural networks (RNNs) for memory-based representation and adopts centralized training with decentralized execution (CTDE). The system is evaluated on benchmark decentralized partially observable environments, demonstrating improved stability and policy convergence compared with baseline algorithms. Our findings highlight the potential of causally aware memory policies and attention-driven coordination for solving complex sequential tasks with minimal information.
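
To make the CTDE setup concrete, the sketch below shows one common way such a framework can be structured: a GRU-based actor per agent that conditions only on its local observation history (decentralized execution), and a centralized critic that sees the joint observation during training. This is an illustrative outline in PyTorch, not the authors' implementation; class names, network sizes, and the toy usage at the bottom are assumptions.

# Minimal sketch of a recurrent-actor / centralized-critic (CTDE) layout.
# Not the paper's implementation: names and dimensions are illustrative.
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """Per-agent policy: local observation + hidden state -> action logits."""

    def __init__(self, obs_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, obs: torch.Tensor, hidden: torch.Tensor):
        x = torch.relu(self.encoder(obs))
        hidden = self.gru(x, hidden)        # memory over the observation history
        logits = self.policy_head(hidden)
        return logits, hidden


class CentralizedCritic(nn.Module):
    """Training-time critic: scores the joint observation of all agents."""

    def __init__(self, joint_obs_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, joint_obs: torch.Tensor) -> torch.Tensor:
        return self.net(joint_obs)


if __name__ == "__main__":
    n_agents, obs_dim, action_dim = 3, 10, 5
    actors = [RecurrentActor(obs_dim, action_dim) for _ in range(n_agents)]
    critic = CentralizedCritic(joint_obs_dim=n_agents * obs_dim)

    joint_obs = torch.randn(1, n_agents * obs_dim)      # one timestep, all agents
    hiddens = [torch.zeros(1, 64) for _ in range(n_agents)]

    # Decentralized execution: each actor sees only its own observation slice.
    for i, actor in enumerate(actors):
        local_obs = joint_obs[:, i * obs_dim:(i + 1) * obs_dim]
        logits, hiddens[i] = actor(local_obs, hiddens[i])
        action = torch.distributions.Categorical(logits=logits).sample()

    # Centralized training: the critic evaluates the full joint observation.
    value = critic(joint_obs)

The key design point this illustrates is the information asymmetry: the recurrent hidden state carries each agent's memory of its own noisy observation stream, while the critic's access to the joint observation exists only at training time and is discarded at execution.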
