V2X-Radar: A Multi-Modal Dataset with 4D Radar for Cooperative Perception

Conference: NeurIPS 2025 | arXiv: 2411.10962 | Code: GitHub | Area: Autonomous Driving | Keywords: cooperative perception, 4D radar, V2X, multi-modal dataset, 3D object detection

TL;DR

This paper presents V2X-Radar, the first large-scale real-world multi-modal vehicle-to-everything (V2X) cooperative perception dataset incorporating 4D radar, LiDAR, and multi-view camera data. The dataset covers diverse weather and lighting conditions, providing 20K LiDAR frames, 40K camera images, 20K 4D radar scans, and 350K annotated bounding boxes, along with comprehensive benchmarks across three sub-datasets.

Background & Motivation

Single-vehicle perception faces critical safety challenges due to occlusion and limited sensing range. V2X cooperative perception extends the perceptual range through information sharing; however, a significant gap remains:

Existing V2X datasets lack 4D radar: Datasets such as OPV2V, DAIR-V2X, and V2V4Real include only cameras and LiDAR, overlooking the robustness advantages of 4D radar under adverse weather conditions.

Single-vehicle datasets already incorporate 4D radar: Works such as K-Radar and Dual-Radar have demonstrated the strong adaptability of 4D radar to extreme weather including rain, snow, and fog, yet no such dataset exists for cooperative perception.

Insufficient scene diversity: Existing real-world V2X datasets are typically collected within limited time periods or under narrow weather conditions, lacking comprehensive coverage of day/night and clear/rain/snow/fog scenarios.

Method

Overall Architecture

The V2X-Radar dataset comprises three subsets:

  • V2X-Radar-C: Vehicle-infrastructure cooperative perception (40 sequences)
  • V2X-Radar-I: Infrastructure-side perception (10 sequences)
  • V2X-Radar-V: Single-vehicle perception (10 sequences)

Key Designs

  1. Sensor Platform Configuration:

    • Vehicle side: RoboSense RS-Ruby-80 LiDAR (80-beam) + Basler camera (1920×1080) + Arbe Phoenix 4D radar (77 GHz, horizontal ±50°) + GPS/IMU + C-V2X communication unit
    • Infrastructure side: RoboSense RS-Ruby-80 LiDAR + 3 Basler multi-view cameras (1536×864) + OCULI EAGLE 4D radar (79 GHz, horizontal ±56°) + GPS/IMU + C-V2X communication unit
    • The 4D radar provides sparse point clouds with Doppler information, offering inherent robustness to adverse weather.
  2. Data Synchronization and Calibration:

    • Temporal synchronization: All computer clocks are first aligned to GPS time, followed by hardware-triggered synchronization via PTP + PPS signals; the temporal offset between vehicle-side and infrastructure-side sensors is controlled within 20 ms.
    • Spatial calibration: Camera intrinsics are calibrated using a checkerboard; LiDAR–camera extrinsics are estimated by minimizing reprojection error over 100 2D–3D point correspondences (see the PnP sketch after this list); LiDAR–4D radar calibration uses high-intensity points on corner reflectors.
    • Vehicle-infrastructure registration: Initial registration is based on RTK localization, refined through the CBM algorithm and manual adjustment.
  3. Data Annotation and Quality Control:

    • A semi-automatic annotation pipeline combines automatic labeling with human refinement through multiple rounds of quality checks.
    • Five object categories: pedestrian, cyclist, car, bus, and truck.
    • Cooperative annotation: vehicle-side and infrastructure-side annotations are generated in their respective LiDAR coordinate systems, then unified into the infrastructure coordinate system and deduplicated via IoU matching (see the merging sketch after this list).
    • Privacy protection: Model-based detection combined with manual frame-by-frame inspection anonymizes all road names, localization data, license plates, and faces.
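
A minimal sketch of the LiDAR–camera extrinsic step described above, assuming OpenCV and hypothetical correspondence arrays; the paper does not specify its exact solver, so standard PnP on the 2D–3D pairs is used here as a stand-in.

```python
# Sketch: LiDAR-camera extrinsic estimation from 2D-3D correspondences
# via PnP. `pts_lidar`, `pts_image`, `K`, and `dist` are hypothetical
# inputs, not names from the paper.
import cv2
import numpy as np

def estimate_extrinsics(pts_lidar, pts_image, K, dist):
    """pts_lidar: (N, 3) points in the LiDAR frame (N >= 6).
    pts_image: (N, 2) corresponding pixels.
    K, dist: intrinsics and distortion from checkerboard calibration."""
    ok, rvec, tvec = cv2.solvePnP(
        pts_lidar.astype(np.float64), pts_image.astype(np.float64),
        K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
    assert ok, "PnP failed"
    R, _ = cv2.Rodrigues(rvec)              # rotation vector -> 3x3 matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()   # 4x4 LiDAR-to-camera transform
    # Mean reprojection error: the quantity minimized over the ~100 pairs.
    proj, _ = cv2.projectPoints(pts_lidar, rvec, tvec, K, dist)
    err = np.linalg.norm(proj.squeeze(1) - pts_image, axis=1).mean()
    return T, err
```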
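A minimal sketch of the cooperative-annotation merging step, assuming boxes encoded as (x, y, z, l, w, h, yaw) rows; axis-aligned BEV IoU stands in for the paper's full IoU matching, and the transform and threshold names are hypothetical.

```python
# Sketch: map vehicle-side boxes into the infrastructure frame, then drop
# duplicates of infrastructure boxes by BEV-overlap matching.
import numpy as np

def to_infra_frame(boxes_veh, T_veh_to_infra):
    """Transform (x, y, z, l, w, h, yaw) boxes between LiDAR frames,
    assuming the registration rotation is essentially about the z-axis."""
    out = boxes_veh.copy()
    centers = np.c_[boxes_veh[:, :3], np.ones(len(boxes_veh))]
    out[:, :3] = (T_veh_to_infra @ centers.T).T[:, :3]
    out[:, 6] += np.arctan2(T_veh_to_infra[1, 0], T_veh_to_infra[0, 0])
    return out

def bev_iou(a, b):
    """Axis-aligned BEV IoU (ignores yaw; a simplification)."""
    iw = max(0.0, min(a[0] + a[3] / 2, b[0] + b[3] / 2)
                  - max(a[0] - a[3] / 2, b[0] - b[3] / 2))
    ih = max(0.0, min(a[1] + a[4] / 2, b[1] + b[4] / 2)
                  - max(a[1] - a[4] / 2, b[1] - b[4] / 2))
    inter = iw * ih
    union = a[3] * a[4] + b[3] * b[4] - inter
    return inter / union if union > 0 else 0.0

def merge_annotations(boxes_infra, boxes_veh, T_veh_to_infra, thr=0.3):
    """Keep infra boxes; add vehicle boxes that match none of them."""
    mapped = to_infra_frame(boxes_veh, T_veh_to_infra)
    keep = [b for b in mapped
            if all(bev_iou(b, g) < thr for g in boxes_infra)]
    return np.vstack([boxes_infra] + keep) if keep else boxes_infra
```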

Loss & Training

As a dataset paper, V2X-Radar does not introduce novel training strategies. Instead, it provides benchmark experiments under three fusion paradigms (sketched after this list):

  • Early Fusion: Raw point clouds from all agents are merged prior to detection.
  • Late Fusion: Each agent performs independent detection, followed by NMS-based merging.
  • Intermediate Fusion: Each agent extracts intermediate features for transmission and fusion (F-Cooper, V2X-ViT, CoAlign, HEAL).
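A rough sketch of the first two paradigms under assumed inputs: `detector`, the per-agent transforms, and the (x, y, z, l, w, h, yaw, score) detection layout are hypothetical, and center-distance NMS stands in for the box-IoU NMS used in practice.

```python
# Sketch: early fusion merges raw points before one detection pass;
# late fusion runs per-agent detection and merges the resulting boxes.
import numpy as np

def early_fusion(clouds, transforms, detector):
    """Merge raw point clouds into a common frame, then detect once."""
    merged = []
    for pts, T in zip(clouds, transforms):     # pts: (N, 3) per agent
        homo = np.c_[pts, np.ones(len(pts))]
        merged.append((T @ homo.T).T[:, :3])   # agent frame -> ego frame
    return detector(np.vstack(merged))

def late_fusion(per_agent_dets, dist_thr=2.0):
    """Merge independent detections with a simple greedy NMS keyed on
    BEV center distance (a cheap stand-in for box IoU)."""
    dets = np.vstack(per_agent_dets)
    order = np.argsort(-dets[:, 7])            # highest score first
    keep = []
    for i in order:
        c = dets[i, :2]
        if all(np.linalg.norm(c - dets[j, :2]) > dist_thr for j in keep):
            keep.append(i)
    return dets[keep]
```

Intermediate fusion has no comparably compact sketch, since each benchmarked method (F-Cooper, V2X-ViT, CoAlign, HEAL) defines its own feature-exchange and fusion architecture.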

Key Experimental Results

Main Results

Infrastructure-side 3D object detection (V2X-Radar-I):

Method        Modality   Vehicle AP@0.7 (Easy)   Pedestrian AP@0.5 (Easy)   Cyclist AP@0.5 (Easy)
PV-RCNN       LiDAR      88.83                   77.13                      91.82
CenterPoint   LiDAR      86.44                   67.90                      90.26
BEVHeight++   Camera     48.48                   33.05                      45.19
RPFA-Net      4D Radar   64.79                   51.64                      45.86

Single-vehicle 3D object detection (V2X-Radar-V):

Method        Modality   Vehicle AP@0.7 (Easy)   Pedestrian AP@0.5 (Easy)   Cyclist AP@0.5 (Easy)
PV-RCNN       LiDAR      88.27                   67.04                      89.48
BEVHeight++   Camera     17.47                   10.43                      12.99
RPFA-Net      4D Radar   42.77                   11.51                      17.03

Ablation Study

Cooperative perception vs. single-vehicle perception (vehicle category, AP@0.5):

Configuration            Overall    0–30 m   30–50 m   50–100 m   Note
No Fusion (Camera)       6.76       9.41     5.16      2.27       Single camera is extremely weak
Late Fusion (Camera)     32.88      40.58    30.37     20.00      Cooperation yields large gains
V2X-ViT (LiDAR, Sync)    ~85+       ~90+     ~80+      ~60+       Intermediate fusion performs best
V2X-ViT (LiDAR, Async)   degraded   -        -         -          Communication latency significantly hurts performance

Key Findings

  • LiDAR remains strongest; 4D radar outperforms camera: In infrastructure-side detection, 4D radar (RPFA-Net, 64.79%) is substantially lower than LiDAR (PV-RCNN, 88.83%) but significantly surpasses camera (BEVHeight++, 48.48%).
  • Vehicle-side camera performance drops sharply: Single-view vehicle camera (17.47%) is far inferior to multi-view infrastructure camera (48.48%), underscoring the importance of viewpoint coverage.
  • Cooperative perception yields substantial improvements: All cooperative methods significantly outperform their single-vehicle counterparts.
  • Communication latency is a critical challenge: Under asynchronous settings (100 ms delay), all methods exhibit notable performance degradation, particularly under the strict IoU=0.7 threshold.
  • Annotations per sample reach up to 90: This far exceeds single-vehicle datasets such as KITTI/nuScenes (approximately 10–30), reflecting the enhanced environmental understanding enabled by cooperative perception.

Highlights & Insights

  • The dataset fills a critical gap in the V2X domain by introducing 4D radar data, which is essential for safe driving under adverse weather conditions.
  • Data collection spanning 9 months covers clear/rain/fog/snow conditions and day/dusk/night scenarios, offering superior scene diversity compared to existing datasets.
  • Rigorous temporal synchronization (<20 ms) and spatial calibration pipelines ensure high data quality.
  • The three-subset design supports multiple research directions, enhancing the dataset's general applicability.

Limitations & Future Work

  • The dataset scale is relatively limited (20K frames), still smaller than nuScenes (40K) or Waymo (200K+).
  • Only V2I scenarios involving a single vehicle and a single roadside unit are covered; V2V or multi-infrastructure cooperative settings are not included.
  • 4D radar point clouds are extremely sparse (typically only dozens of points per vehicle), limiting the direct applicability of existing methods.
  • Annotations are restricted to 3D bounding boxes, without richer labels such as semantic segmentation or tracking IDs.
  • The vehicle side is equipped with only a single front-facing camera, limiting the applicability of surround-view perception algorithms.
  • Relation to DAIR-V2X: DAIR-V2X is a real-world V2I dataset but lacks 4D radar; V2X-Radar augments this setting with the 4D radar modality.
  • Relation to K-Radar/Dual-Radar: These are single-vehicle 4D radar datasets; V2X-Radar extends 4D radar into the cooperative perception domain.
  • Insight: The potential of 4D radar in cooperative perception remains largely unexplored, with broad research opportunities in multi-modal fusion (radar + LiDAR + camera) and adverse-weather robustness.

Rating

  • Novelty: ⭐⭐⭐⭐ First V2X dataset with 4D radar, filling an important gap
  • Experimental Thoroughness: ⭐⭐⭐⭐ Three subsets × multiple detection methods × synchronous/asynchronous settings
  • Writing Quality: ⭐⭐⭐⭐ Follows standard dataset paper conventions with thorough detail
  • Value: ⭐⭐⭐⭐ Significant contribution to advancing 4D radar research in cooperative perception