Virne: A Comprehensive Benchmark for RL-based Network Resource Allocation in NFV¶
Paper Information¶
- Conference: ICLR 2026
- arXiv: 2507.19234
- Code: https://github.com/GeminiLight/Virne
- Area: Reinforcement Learning / Network Resource Allocation / Combinatorial Optimization
- Keywords: NFV-RA, Virtual Network Embedding, Benchmark Framework, GNN, PPO, Scalability
TL;DR¶
This paper proposes Virne, a comprehensive benchmark framework for Network Function Virtualization Resource Allocation (NFV-RA). It integrates 30+ algorithms and a gym-style environment to support systematic evaluation across cloud, edge, 5G, and other scenarios.
Background & Motivation¶
Core Problem¶
Resource allocation in Network Function Virtualization (NFV-RA) is an NP-hard combinatorial optimization problem that maps virtual network requests onto physical network infrastructure. Although deep RL has shown promise in this domain, a systematic benchmark for comprehensive simulation and rigorous evaluation is lacking.
Limitations of Prior Work¶
- Existing benchmarks cover only specific scenarios (e.g., cloud) and lack support for edge computing and 5G slicing.
- Only a small number of non-RL methods (3–5) are implemented, with no unified RL pipeline.
- Evaluation is limited to online effectiveness, neglecting practical dimensions such as feasibility, generalizability, and scalability.
- Fragmented problem definitions make fair comparison difficult.
Method¶
Overall Architecture¶
Virne consists of five core modules:
1. Simulation Configuration: Customizable network topologies, resource types, and service demands.
2. Network System: An event-driven simulator that processes online virtual network requests.
3. Algorithm Implementation: A modular pipeline integrating 30+ methods.
4. Auxiliary Tools: System controllers, solution monitors, and visualization utilities.
5. Evaluation Protocol: A multi-dimensional assessment framework.
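To make the event-driven processing of online requests concrete, here is a minimal sketch of an arrival/departure loop built on a priority queue. It is not Virne's implementation: the physical network is collapsed into a single pooled capacity, and `Event`, `simulate`, and the `(arrival_time, lifetime, demand)` request tuple are hypothetical names chosen for illustration.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical event kinds; departures sort first when timestamps tie,
# so resources are released before a simultaneous arrival is considered.
DEPARTURE, ARRIVAL = 0, 1

@dataclass(order=True)
class Event:
    time: float
    kind: int
    request_id: int = field(compare=False)
    demand: int = field(compare=False)

def simulate(requests, capacity):
    """Process requests given as (arrival_time, lifetime, demand) tuples.

    Assumption: one pooled capacity stands in for the whole physical network.
    Returns the ids of accepted requests in arrival order.
    """
    events = [Event(arrival, ARRIVAL, rid, demand)
              for rid, (arrival, _, demand) in enumerate(requests)]
    heapq.heapify(events)
    free, accepted = capacity, []
    while events:
        ev = heapq.heappop(events)
        if ev.kind == DEPARTURE:
            free += ev.demand                      # release resources
        elif ev.demand <= free:
            free -= ev.demand                      # admit the request
            accepted.append(ev.request_id)
            lifetime = requests[ev.request_id][1]
            heapq.heappush(
                events,
                Event(ev.time + lifetime, DEPARTURE, ev.request_id, ev.demand))
    return accepted
```

In a fuller simulator the admission test would be replaced by an embedding algorithm; the queue-driven loop stays the same.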
NFV-RA Problem Formulation¶
System Model: Physical network \(\mathcal{G}_p = (\mathcal{N}_p, \mathcal{L}_p)\); virtual network \(\mathcal{G}_v = (\mathcal{N}_v, \mathcal{L}_v, \omega, \varpi)\).
Embedding Constraints:
- Node mapping \(f_\mathcal{N}\): one-to-one mapping satisfying resource constraints \(C(n_v) \leq C(n_p)\).
- Link mapping \(f_\mathcal{L}\): physical paths connecting mapped node endpoints with bandwidth constraint \(B(l_v) \leq B(l_p)\).
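The two constraints can be checked mechanically. The sketch below is one possible feasibility test, assuming a single node resource and a single link resource. All names (`is_feasible`, `node_map`, `link_map`, the dictionary layouts) are hypothetical. It also sums the bandwidth of virtual links that share a physical link, which is slightly stricter than the per-link inequality stated above.

```python
def is_feasible(node_map, link_map, v_cpu, p_cpu, v_bw, p_bw):
    """Check a candidate embedding.

    node_map: virtual node -> physical node
    link_map: virtual link (u, v) -> physical path as a list of nodes
    v_cpu / p_cpu: node demand / node capacity
    v_bw: virtual link (u, v) -> bandwidth demand
    p_bw: frozenset({a, b}) -> bandwidth capacity of a physical link
    """
    # Every virtual node is mapped, and the mapping is one-to-one.
    if set(node_map) != set(v_cpu):
        return False
    if len(set(node_map.values())) != len(node_map):
        return False
    # Node resource constraint: C(n_v) <= C(n_p).
    if any(v_cpu[nv] > p_cpu[host] for nv, host in node_map.items()):
        return False
    # Link constraint: each path joins the two hosts and has enough bandwidth.
    load = {}
    for (u, v), demand in v_bw.items():
        path = link_map.get((u, v))
        if not path or path[0] != node_map[u] or path[-1] != node_map[v]:
            return False
        for a, b in zip(path, path[1:]):
            edge = frozenset((a, b))
            if edge not in p_bw:
                return False
            load[edge] = load.get(edge, 0) + demand
    return all(total <= p_bw[edge] for edge, total in load.items())
```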
Optimization Objective (Revenue-to-Cost Ratio):
\[\max \text{R2C}(S) = \frac{\varkappa \cdot \text{REV}(S)}{\text{COST}(S)}\]
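As a worked illustration of the ratio, the sketch below computes R2C for a single solution. The summary does not define REV, COST, or \(\varkappa\), so the code makes three assumptions: revenue is the sum of requested node and link resources, cost additionally multiplies each link's bandwidth by the hop count of its physical path, and \(\varkappa\) is treated as a 0/1 acceptance flag. Function and argument names are hypothetical.

```python
def revenue(v_cpu, v_bw):
    # Assumed definition: total requested node and link resources.
    return sum(v_cpu.values()) + sum(v_bw.values())

def cost(v_cpu, v_bw, link_map):
    # Assumed definition: node resources plus bandwidth times path length in hops.
    link_cost = sum(bw * (len(link_map[link]) - 1) for link, bw in v_bw.items())
    return sum(v_cpu.values()) + link_cost

def r2c(accepted, v_cpu, v_bw, link_map):
    # `accepted` plays the role of the indicator in the objective (assumption):
    # a rejected request contributes a ratio of zero.
    if not accepted:
        return 0.0
    return revenue(v_cpu, v_bw) / cost(v_cpu, v_bw, link_map)

v_cpu = {"x": 4, "y": 6}
v_bw = {("x", "y"): 5}
one_hop = r2c(True, v_cpu, v_bw, {("x", "y"): ["B", "A"]})       # 15 / 15
two_hop = r2c(True, v_cpu, v_bw, {("x", "y"): ["B", "C", "A"]})  # 15 / 20
```

Under these assumptions the ratio reaches 1 only when every virtual link is routed over a single physical hop; longer paths consume more bandwidth for the same revenue.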
MDP Formulation¶
NFV-RA is modeled as an MDP \((\mathcal{S}, \mathcal{A}, P, R, \lambda)\):
- State space \(\mathcal{S}\): embedding states of the VN and PN.
- Action space \(\mathcal{A}\): selecting a physical node to host a virtual node.
- Reward \(R\): a designed feedback signal to guide optimization.
- Per-step decision: select a physical node → attempt placement → route virtual links → update resources.
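The per-step decision can be sketched as a small environment with a classic Gym-style `reset`/`step` interface (the 4-tuple return is used here for brevity). This is an illustrative toy, not Virne's environment: `ToyEmbeddingEnv` and its fields are hypothetical, routing is a plain breadth-first search over links with enough residual bandwidth, and the terminal rewards of +1 and -1 are assumptions. Only the 0.1 intermediate reward echoes a value mentioned in this summary.

```python
from collections import deque

class ToyEmbeddingEnv:
    """Places one virtual network node by node (hypothetical sketch)."""

    def __init__(self, p_cpu, p_bw, v_cpu, v_bw):
        self.p_cpu0, self.p_bw0 = dict(p_cpu), dict(p_bw)  # p_bw: frozenset({a, b}) -> bw
        self.v_cpu, self.v_bw = dict(v_cpu), dict(v_bw)    # v_bw: (u, v) -> bw
        self.order = list(v_cpu)                           # placement order

    def reset(self):
        self.p_cpu, self.p_bw = dict(self.p_cpu0), dict(self.p_bw0)
        self.placed = {}
        return self._obs()

    def _obs(self):
        return {"p_cpu": dict(self.p_cpu), "placed": dict(self.placed)}

    def _route(self, src, dst, bw):
        # Breadth-first search restricted to links with residual bandwidth >= bw.
        parent, queue = {src: None}, deque([src])
        while queue:
            cur = queue.popleft()
            if cur == dst:
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                return path[::-1]
            for edge, cap in self.p_bw.items():
                if cur in edge and cap >= bw:
                    (nxt,) = edge - {cur}
                    if nxt not in parent:
                        parent[nxt] = cur
                        queue.append(nxt)
        return None

    def step(self, action):
        nv = self.order[len(self.placed)]
        # 1) Attempt placement on the selected physical node.
        if action in self.placed.values() or self.p_cpu[action] < self.v_cpu[nv]:
            return self._obs(), -1.0, True, {"success": False}
        self.p_cpu[action] -= self.v_cpu[nv]
        self.placed[nv] = action
        # 2) Route virtual links whose two endpoints are now both placed.
        for (u, v), bw in self.v_bw.items():
            if nv in (u, v) and u in self.placed and v in self.placed:
                path = self._route(self.placed[u], self.placed[v], bw)
                if path is None:
                    return self._obs(), -1.0, True, {"success": False}
                # 3) Update residual bandwidth along the chosen path.
                for a, b in zip(path, path[1:]):
                    self.p_bw[frozenset((a, b))] -= bw
        done = len(self.placed) == len(self.order)
        reward = 1.0 if done else 0.1  # terminal value is an assumption
        return self._obs(), reward, done, {"success": done}
```

A failed step ends the episode without rolling back resources; a fuller simulator would restore the physical network before handling the next request.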
Unified RL Pipeline¶
RL-based NFV-RA algorithms are unified into three components:
1. MDP Modeling: Reward design and feature engineering.
2. Policy Architecture: MLP, CNN, GCN, GAT, BiGCN, BiGAT, HeteroGAT, etc.
3. Training Method: PG, A3C, PPO, MCTS.
Experiments¶
Implementation Technique Analysis¶
Key implementation choices are systematically evaluated on the WX100 topology:
| Technique | Best Configuration | Finding |
|---|---|---|
| Reward Function | fixed=0.1 | Moderate fixed intermediate rewards outperform adaptive rewards |
| Feature Engineering | Status + Topological | Topological features provide valuable augmentation |
| Action Masking | Enabled | RAC improves by up to 5.3% |
| RL Algorithm | PPO | Fastest convergence and highest performance |
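
Action masking, listed in the table above, is commonly implemented by setting the logits of infeasible actions to negative infinity before the softmax, so the policy assigns them zero probability. The sketch below shows that general technique in plain Python. It is not taken from the Virne code base, and `feasible_mask` and `masked_softmax` are hypothetical helper names.

```python
import math

def feasible_mask(p_cpu, used, demand):
    """True for physical nodes that are unused and have enough capacity."""
    return [cpu >= demand and idx not in used for idx, cpu in enumerate(p_cpu)]

def masked_softmax(logits, mask):
    """Softmax over logits with masked-out entries forced to probability 0."""
    if not any(mask):
        raise ValueError("no feasible action")
    masked = [l if keep else -math.inf for l, keep in zip(logits, mask)]
    top = max(masked)                        # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Node 0 is already used and node 1 lacks capacity, so only node 2 remains.
mask = feasible_mask([10, 5, 8], used={0}, demand=6)
probs = masked_softmax([2.0, 1.0, 0.0], mask)
```

Sampling from `probs` can then never select a node that would violate the capacity or one-to-one constraint, even when the unmasked logits favor it.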
Main Results¶
| Method | WX100 RAC↑ | GEANT RAC↑ | BRAIN RAC↑ |
|---|---|---|---|
| PPO-MLP | 71.90 | 55.80 | 51.30 |
| PPO-GCN | 66.80 | - | - |
| PPO-DualGAT | 78.10 | - | - |
| D-Vine | - | - | - |
Evaluation Dimensions¶
- Effectiveness: Online request acceptance rate (the RAC metric in the tables above) and long-term revenue-to-cost ratio.
- Feasibility: Constraint satisfaction rate of generated solutions.
- Generalizability: Reliability under varying network conditions.
- Scalability: Performance variation with increasing problem size.
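The effectiveness metrics can be computed from a per-request log. The sketch below assumes each record carries an `accepted` flag plus `revenue` and `cost` values, and it assumes the long-term revenue-to-cost ratio is total revenue divided by total cost over accepted requests. The record layout and the `summarize` helper are hypothetical.

```python
def summarize(records):
    """Aggregate a list of {'accepted': bool, 'revenue': float, 'cost': float}."""
    total = len(records)
    accepted = [r for r in records if r["accepted"]]
    total_revenue = sum(r["revenue"] for r in accepted)
    total_cost = sum(r["cost"] for r in accepted)
    return {
        # Share of requests that were embedded successfully.
        "acceptance_rate": len(accepted) / total if total else 0.0,
        # Assumed definition: accumulated revenue over accumulated cost.
        "long_term_r2c": total_revenue / total_cost if total_cost else 0.0,
    }

log = [
    {"accepted": True, "revenue": 15, "cost": 15},
    {"accepted": False, "revenue": 20, "cost": 0},
    {"accepted": True, "revenue": 10, "cost": 25},
]
summary = summarize(log)
```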
Key Findings¶
- PPO-DualGAT combined with optimal implementation techniques achieves the best performance in most settings.
- Reward design: a moderate fixed intermediate reward (0.1) outperforms adaptive rewards, which in turn outperform excessively large or small fixed rewards.
- Action masking is critical for handling the complex constraints in NFV-RA, improving RAC by up to 5.3%.
- The performance advantage of GNN-based architectures correlates with scenario complexity.
Highlights & Insights¶
- Most comprehensive NFV-RA benchmark: 30+ algorithms, gym-style environment, and multi-scenario support.
- Systematic analysis of implementation techniques: Quantitative impact of key choices including reward design, feature engineering, and action masking.
- Multi-dimensional evaluation protocol: Extends beyond online effectiveness to cover feasibility, generalizability, and scalability.
- Modular design: Facilitates community extension with new methods.
Limitations & Future Work¶
- A gap remains between simulation and real network environments.
- Credit assignment challenges for RL methods at large scale are not fully resolved.
- Support for emerging scenarios (e.g., 6G network slicing) is still under development.
- Some RL methods may be sensitive to hyperparameter choices.
Related Work & Insights¶
- Traditional Benchmarks: VNE-Sim (2014), ALEVIN (2016) — cloud-only scenarios with a limited set of heuristics.
- RL-based NFV-RA: Various approaches employing neural network policies based on CNN, GCN, GAT, etc.
- RL for Combinatorial Optimization: Methodological connections to RL approaches for problems such as TSP and VRP.
Rating¶
- Novelty: ⭐⭐⭐ — The primary contribution lies in systems engineering rather than algorithmic innovation.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Highly comprehensive experiments and ablations.
- Writing Quality: ⭐⭐⭐⭐ — Clear and well-organized structure.
- Value: ⭐⭐⭐⭐⭐ — An extremely valuable benchmark tool for the research community.