ArgMatch: Adaptive Refinement Gathering for Efficient Dense Matching
Conference: ICCV 2025
arXiv: N/A
Code: GitHub
Area: 3D Vision / Dense Matching
Keywords: dense matching, coarse-to-fine, correlation volume, adaptive refinement, efficient matching
TL;DR
This paper proposes an Adaptive Refinement Gathering pipeline that substantially reduces dependence on heavy feature extractors and global matchers through a content-aware offset estimator, a local-consistency matching corrector, and a local-consistency upsampler, achieving competitive dense matching performance with a lightweight network.
Background & Motivation
Dense correspondence is fundamental to multi-view tasks but incurs high computational cost. Coarse-to-fine strategies reduce this cost, yet efficiency remains bottlenecked by heavy feature extractors (e.g., DINOv2) and complex global matchers (e.g., Gaussian processes). The authors argue that this redundancy cannot be removed from existing methods without sacrificing performance because their refiners are not effective enough. Existing refiners based on a correlation volume (CV) have three weaknesses: (1) they require high-dimensional features to keep the CV sharp; (2) they can only correct errors that fall within a local window; and (3) they fail to co-optimize effectively with other modules.
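Limitation (2) can be illustrated with a toy CV-based refiner. The sketch below is a minimal 1D illustration, not the paper's implementation: it assumes dot-product correlation and soft-argmax decoding (a common choice in CV-based refiners), and the names `local_offset`, `radius`, and `temperature` are hypothetical.

```python
import math

def local_offset(query, target_feats, center, radius, temperature=0.1):
    """Soft-argmax offset decoded from a 1D local correlation window.

    query:        feature vector of the source pixel
    target_feats: feature vectors along one row of the target image
    center:       current match estimate (index into target_feats)
    radius:       half-width of the search window (assumed hyperparameter)
    """
    offsets, scores = [], []
    for d in range(-radius, radius + 1):
        idx = center + d
        if 0 <= idx < len(target_feats):
            # One entry of the local correlation volume.
            corr = sum(q * t for q, t in zip(query, target_feats[idx]))
            offsets.append(d)
            scores.append(corr / temperature)
    # Numerically stable softmax over the window, then expected offset.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(d * w for d, w in zip(offsets, weights)) / total

# Toy target row: position i carries a one-hot feature at index i.
target = [[1.0 if k == i else 0.0 for k in range(8)] for i in range(8)]

# True match at 5, estimate at 4: the window sees the peak and recovers about +1.
near = local_offset(target[5], target, center=4, radius=2)

# True match at 7, estimate at 2: the peak lies outside the window, the
# correlation is flat, and the decoded offset carries no information.
far = local_offset(target[7], target, center=2, radius=2)
```

In the second call the true error is 5 pixels but the window only spans 2, so the refiner cannot recover; this is the failure mode the matching corrector is meant to address.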
Method
Overall Architecture
ArgMatch follows a three-stage pipeline of feature extraction → global matching → iterative refinement, with its core contribution being the Adaptive Refinement Gathering pipeline: the initial flow is progressively refined at resolutions 1/16 → 1/8 → 1/4. Each stage comprises a content-aware offset estimator (CV decoding), a matching corrector (neighborhood aggregation correction), and an upsampler (local-consistency upsampling).
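The resolution schedule can be sketched as a simple loop. This is a structural sketch only: `refine` is a hypothetical placeholder for one refinement stage, nearest-neighbour upsampling stands in for the learned local-consistency upsampler, and the choice not to upsample after the last stage is an assumption.

```python
def upsample_flow_2x(flow):
    """Nearest-neighbour 2x upsampling of a flow grid.

    flow is a list of rows of (dx, dy) tuples measured in pixels of the
    current resolution, so the values double when the grid doubles.
    """
    out = []
    for row in flow:
        up_row = []
        for dx, dy in row:
            up_row.extend([(2 * dx, 2 * dy)] * 2)
        out.append(up_row)
        out.append(list(up_row))
    return out

def coarse_to_fine(initial_flow, refine, scales=(16, 8, 4)):
    """Refine a flow at 1/16, then 1/8, then 1/4 resolution.

    refine(flow, scale) is a hypothetical callable standing in for one
    stage (offset estimator + matching corrector); it returns a flow
    grid of the same size.
    """
    flow = initial_flow
    for i, scale in enumerate(scales):
        flow = refine(flow, scale)
        if i < len(scales) - 1:
            # Hand the result to the next, finer stage.
            flow = upsample_flow_2x(flow)
    return flow

# A single 1/16-resolution flow vector with an identity "refiner".
flow_quarter = coarse_to_fine([[(1.0, -0.5)]], lambda flow, scale: flow)
```

With the identity refiner, the single coarse vector `(1.0, -0.5)` becomes a 4×4 grid of `(4.0, -2.0)` at 1/4 resolution, which shows how both grid size and flow magnitude scale between stages.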
Key Designs
- Content-Aware Offset Estimator: Adaptively scales the sampling window and exploits content information within it. A latent code \(z\) modulates the encoding and decoding of the CV, enabling it to better adapt to local content characteristics. Rather than pursuing high-dimensional features for CV sharpening, this design leverages content information for more reliable offset estimation.
- Local-Consistency Matching Corrector: Corrects matches by adaptively aggregating neighborhood information via semantic correlation and matching confidence, remaining effective even when the initial error exceeds the local window. The key insight is to propagate gradients based on semantic similarity rather than fixed geometric distance, avoiding propagation across depth discontinuities.
- Adaptive Gated Aggregation and Local-Consistency Upsampler: An adaptive gating mechanism aggregates outputs from the offset estimator and forwards gradients from large-error matches to coarser levels. The upsampler employs a similar local-consistency mechanism to accurately upsample low-resolution matches, reducing artifacts at depth-discontinuity boundaries.
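The idea behind the matching corrector can be sketched in 1D as a confidence- and similarity-weighted average over a neighbourhood. This is an illustrative assumption about the weighting form, not the paper's formulation; `correct_matches`, `radius`, and `temperature` are hypothetical names, and the real module is learned.

```python
import math

def correct_matches(flow, feats, conf, radius=1, temperature=0.1):
    """Replace each flow value with a weighted neighbourhood average.

    Weight of neighbour j for pixel i (assumed form):
        conf[j] * exp(<feats[i], feats[j]> / temperature)
    Low-confidence matches contribute little, and neighbours that look
    different (e.g. across a depth edge) are strongly down-weighted.
    """
    n = len(flow)
    corrected = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            sim = sum(a * b for a, b in zip(feats[i], feats[j]))
            w = conf[j] * math.exp(sim / temperature)
            num += w * flow[j]
            den += w
        # Keep the original value if no neighbour carries any weight.
        corrected.append(num / den if den > 0 else flow[i])
    return corrected

# Two surfaces A and B with different flows; pixel 1 is a gross outlier
# whose error (48 px) is far larger than any local search window.
A, B = (1.0, 0.0), (0.0, 1.0)
feats = [A, A, A, B, B, B]
flow = [2.0, 50.0, 2.0, 10.0, 10.0, 10.0]
conf = [1.0, 0.0, 1.0, 1.0, 1.0, 1.0]
fixed = correct_matches(flow, feats, conf)
```

In this toy case the outlier at index 1 is pulled back to 2.0 by its confident, similar neighbours, while pixels 2 and 3 on either side of the surface boundary keep values close to 2.0 and 10.0 because the dissimilar neighbour receives almost no weight.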
Loss & Training
The model is trained end-to-end. The adaptive gating and local-consistency mechanisms improve coarse-to-fine gradient propagation, addressing the optimization difficulty caused by ambiguous supervision (one-to-many issues) at the coarse stage.
Key Experimental Results
Main Results
| Method | Params/FLOPs | Dense Matching Accuracy | Geometry Estimation | Visual Localization |
|---|---|---|---|---|
| DKM | High (\(O(n^3)\)) | SOTA-level | Good | Good |
| RoMa (DINOv2) | High | SOTA | Best | Best |
| ArgMatch | Significantly lower | Competitive | Competitive | Competitive |
ArgMatch achieves performance competitive with SOTA at significantly lower computational cost.
Ablation Study
- Content-aware modulation vs. standard CV decoding: modulation yields substantial accuracy gains.
- Matching corrector: particularly effective under large initial errors.
- Local-consistency upsampling vs. bilinear interpolation: clear improvement at depth discontinuities.
- Adaptive gated gradient allocation vs. fixed allocation: more principled end-to-end optimization.
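The upsampling comparison can be made concrete with a 1D toy example. The guided variant below is in the spirit of joint bilateral upsampling and is only a stand-in for the paper's learned local-consistency upsampler; the function names, the 3-tap neighbourhood, and `temperature` are assumptions. Values are treated as flow in normalised coordinates, so no rescaling is applied.

```python
import math

def linear_upsample_2x(coarse):
    """Linear interpolation of a 1D signal to twice its length."""
    n = len(coarse)
    fine = []
    for k in range(2 * n):
        # Position of fine sample k in coarse coordinates (pixel centres).
        pos = min(max((k + 0.5) / 2 - 0.5, 0.0), n - 1.0)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        t = pos - lo
        fine.append((1 - t) * coarse[lo] + t * coarse[hi])
    return fine

def guided_upsample_2x(coarse, coarse_feats, fine_feats, temperature=0.1):
    """Upsample by averaging coarse neighbours weighted by feature similarity."""
    n = len(coarse)
    fine = []
    for k in range(2 * n):
        parent = k // 2
        num = den = 0.0
        for j in range(max(0, parent - 1), min(n, parent + 2)):
            sim = sum(a * b for a, b in zip(fine_feats[k], coarse_feats[j]))
            w = math.exp(sim / temperature)
            num += w * coarse[j]
            den += w
        fine.append(num / den)
    return fine

# A step edge between two surfaces A and B.
A, B = (1.0, 0.0), (0.0, 1.0)
coarse = [0.0, 0.0, 1.0, 1.0]
coarse_feats = [A, A, B, B]
fine_feats = [A] * 4 + [B] * 4

blurred = linear_upsample_2x(coarse)
sharp = guided_upsample_2x(coarse, coarse_feats, fine_feats)
```

Linear interpolation produces intermediate values (0.25 and 0.75) on the two fine pixels next to the edge, whereas the feature-guided version keeps them close to 0 and 1. This is the kind of boundary artifact the ablation refers to.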
Key Findings
- A lightweight backbone combined with an effective refiner can substitute for heavy features with simple refiners.
- Local consistency is critical at depth discontinuities, where it prevents erroneous gradient propagation.
- Adaptive gating resolves the gradient allocation challenge between coarse and fine stages.
Highlights & Insights
- Elevates the importance of the refiner to be on par with the feature extractor.
- The local-consistency principle unifies both the corrector and upsampler modules.
- The adaptive gradient propagation mechanism effectively addresses optimization coupling between coarse and fine stages.
- Achieves a favorable efficiency–accuracy trade-off.
Limitations & Future Work
- May still fall slightly short of RoMa with DINOv2 at peak accuracy.
- Local consistency relies on semantic correlation estimation, which may be unreliable in texture-sparse regions.
- Validation is limited to geometric matching; extension to optical flow estimation has not been explored.
Related Work & Insights
- DKM and RoMa serve as the primary SOTA baselines.
- The iterative refinement paradigm from the RAFT series is adopted.
- The low-resolution processing strategy of SEA-RAFT provides a reference.
Rating
- Novelty: ⭐⭐⭐⭐ — The three-component refiner pipeline design is novel and practical.
- Technical Depth: ⭐⭐⭐⭐ — Gradient propagation analysis and local-consistency design are well-grounded.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Multi-task evaluation, efficiency comparisons, and ablations are comprehensive.
- Writing Quality: ⭐⭐⭐⭐ — Problem analysis is thorough and pipeline description is clear.
- Value: ⭐⭐⭐⭐ — Lightweight and efficient, well-suited for resource-constrained scenarios.