BayesG: Bayesian Ego-Graph Inference for Networked Multi-Agent Reinforcement Learning¶
Conference: NeurIPS 2025 arXiv: 2509.16606 Code: https://github.com/Wei9711/BayesG Area: Autonomous Driving Keywords: Bayesian inference, ego-graph, networked MARL, dynamic communication graph, decentralization
TL;DR¶
BayesG enables each agent in networked MARL to learn the dynamic structure of its local communication graph via Bayesian variational inference — sampling edge masks with Gumbel-Softmax and jointly optimizing policy and graph structure under an ELBO objective — achieving 50%+ reward improvement over the best baseline in a 167-agent New York traffic scenario.
Background & Motivation¶
Background: In networked MARL, agents exchange information through communication graphs. Existing methods rely on fixed communication graphs or require global state to learn dynamic graphs.
Limitations of Prior Work: Fixed neighbor sets are suboptimal in dynamic environments, as the informational value of different neighbors varies over time. Centralized graph learning (requiring global observability) is impractical in decentralized systems.
Key Challenge: Agents possess only local observations yet must determine "which neighbors provide the most useful information" — an inherently uncertainty-laden problem.
Goal: Enable each agent to learn task-adaptive local communication graph structures in a fully decentralized manner.
Key Insight: Model edge existence/absence as Bernoulli random variables and estimate the posterior from local data via variational Bayesian inference.
Core Idea: Each agent performs Bayesian variational inference over the edges of its ego-graph (Bernoulli + Gumbel-Softmax); an ELBO objective jointly optimizes policy and graph structure, enabling decentralized dynamic communication.
Method¶
Overall Architecture¶
Agent \(i\)'s policy is conditioned on a sampled subgraph: \(\pi_i(u_i, G_{\mathcal{V}_i} | s_{\mathcal{V}_i}) = \rho(G | s) \cdot \tilde{\pi}_i(u_i | \tilde{f}_i(s, G))\). The edge mask \(Z_i\) is sampled from the variational distribution \(q(Z_i; \phi_i) = \prod_j \text{Bern}(z_{ij}; \sigma(\phi_{ij}))\) and made differentiable via the Gumbel-Softmax relaxation, so gradients flow through the discrete sampling step.
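The differentiable edge sampling can be sketched with the binary Gumbel-Softmax (relaxed Bernoulli) trick: add logistic noise to each edge logit \(\phi_{ij}\) and squash through a temperature-scaled sigmoid. This is a minimal numpy sketch; `sample_edge_mask` is a hypothetical helper name, and the paper's exact parameterization may differ.

```python
import numpy as np

def sample_edge_mask(phi, tau=1.0, rng=None):
    """Relaxed Bernoulli (binary Gumbel-Softmax) sample of an ego-graph edge mask.

    phi : array of logits, one per candidate edge (i, j) in agent i's ego-graph.
    tau : temperature; lower values push samples toward hard {0, 1} decisions.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=np.shape(phi))
    logistic_noise = np.log(u) - np.log1p(-u)  # Logistic(0,1) = G1 - G0
    # Soft mask in (0, 1); differentiable w.r.t. phi in an autodiff framework.
    return 1.0 / (1.0 + np.exp(-(phi + logistic_noise) / tau))
```

At evaluation time the soft mask can be thresholded (or annealed via `tau`) to recover a discrete subgraph.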
Key Designs¶
- Bayesian Edge Inference: The variational approximation \(q(Z_{ij})\) is Bernoulli, and the prior \(p(Z_{ij})\) incorporates a retention bias \(\lambda\). ELBO: \(\mathcal{L}_{\text{ELBO}} = \mathbb{E}_{q}[-\mathcal{L}_{\theta,\varphi}] - \sum_{j} \mathrm{KL}\big(q(Z_{ij}; \phi_{ij}) \,\|\, p(Z_{ij})\big)\)
- GNC Message Passing: Graph neural communication is performed over the masked adjacency matrix \(A_i^* = Z_i \odot A_i\)
- Multi-Feature Input: Three categories of information — neighbor states, trajectories, and policy features
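The two computational pieces above — the per-edge Bernoulli KL penalty and message passing over the masked adjacency \(A_i^* = Z_i \odot A_i\) — can be sketched as follows. This is an illustrative mean-aggregation stand-in for the learned GNC layer, not the paper's implementation; function names are my own.

```python
import numpy as np

def bernoulli_kl(q_prob, p_prob):
    """KL(Bern(q) || Bern(p)) per candidate edge; p encodes the retention bias."""
    q = np.clip(q_prob, 1e-7, 1 - 1e-7)
    p = np.clip(p_prob, 1e-7, 1 - 1e-7)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def masked_message_passing(h, A, Z):
    """One round of neighbor aggregation over the masked adjacency A* = Z ⊙ A.

    h : (n, d) node features; A : (n, n) ego-graph adjacency; Z : (n, n) edge mask.
    """
    A_star = Z * A
    deg = A_star.sum(axis=1, keepdims=True) + 1e-8  # avoid divide-by-zero
    return (A_star @ h) / deg                        # degree-normalized mean
```

Setting an entry of `Z` to zero removes that neighbor's contribution entirely, which is exactly how a sparse learned mask prunes communication.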
Loss & Training¶
- Actor-Critic jointly optimized with ELBO
- KL regularization encourages sparse graphs (retaining only informative edges)
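A minimal sketch of how the pieces combine into one training loss: actor and critic terms plus the ELBO's KL penalty, summed over candidate edges. `kl_weight` is a hypothetical knob, not a parameter named in the paper; the KL term is what pushes edge posteriors toward the sparse prior and prunes uninformative edges.

```python
import numpy as np

def bayesg_loss(policy_loss, value_loss, q_probs, prior_prob, kl_weight=1e-3):
    """Total loss sketch: actor-critic objectives plus the ELBO's KL regularizer.

    q_probs    : posterior edge-retention probabilities sigmoid(phi_ij).
    prior_prob : prior retention probability (the bias toward sparsity).
    """
    q = np.clip(q_probs, 1e-7, 1 - 1e-7)
    p = np.clip(prior_prob, 1e-7, 1 - 1e-7)
    kl = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    return policy_loss + value_loss + kl_weight * kl.sum()
```

When the posterior matches the prior the KL term vanishes and only the actor-critic terms remain; edges whose posterior drifts from the sparse prior pay a penalty unless they improve returns.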
Key Experimental Results¶
Main Results (Adaptive Traffic Signal Control, ATSC)¶
| Environment | BayesG | NeurComm | CommNet | Gain vs. best baseline |
|---|---|---|---|---|
| Grid 5×5 | ~-15 | ~-20 | ~-30 | +25% |
| NewYork 167 agents | ~-30 | ~-45 | ~-60 | +50% |
Ablation Study¶
| Configuration | Performance |
|---|---|
| No mask | Baseline performance |
| Random mask | Severe degradation |
| Learned mask | Optimal |
| Trajectory + State + Policy | Best feature combination |
Key Findings¶
- Learned graph structure significantly outperforms fixed graphs — especially in large-scale scenarios (167 agents)
- Random masking is harmful, demonstrating the necessity of structure learning
- Faster convergence (substantial lead observed in early training stages)
Highlights & Insights¶
- Bayesian treatment of uncertainty is natural: When it is unclear which neighbors are informative, probabilistic sampling is more robust than hard selection
- KL regularization induces sparsity automatically: No manual communication budget specification is required
Limitations & Future Work¶
- The temporal evolution of learned graph structures is not analyzed
- Evaluation is limited to scenarios of at most 167 agents; scalability beyond that remains untested
- Fixed communication intervals are assumed
Related Work & Insights¶
- vs. CommNet: Uses fixed fully-connected graphs; BayesG learns sparse dynamic graphs
- vs. NeurComm: Employs centralized graph learning; BayesG is fully decentralized
Rating¶
- Novelty: ⭐⭐⭐⭐ Natural integration of Bayesian graph inference with MARL
- Experimental Thoroughness: ⭐⭐⭐⭐ 5 environments + ablation study
- Writing Quality: ⭐⭐⭐⭐ Method is clearly presented
- Value: ⭐⭐⭐⭐ Practical solution for distributed multi-agent systems
Takeaways¶
- Interaction structure should be dynamic rather than predefined — Bayesian inference enables agents to adaptively select interaction partners
- Outperforms fully-connected and fixed-graph methods in 167-agent traffic control; the learned sparse graph is more efficient
- The core contribution lies in the simplicity and effectiveness of the design
- Experimental results thoroughly validate the central hypothesis