Skip to content

Dense Match Summarization for Faster Two-view Estimation

Conference: CVPR 2025
arXiv: 2506.02893
Code: None
Area: Text Generation
Keywords: Dense matching, Two-view pose estimation, RANSAC acceleration, Match sparsification, Geometric constraint summarization

TL;DR

This paper proposes a dense match summarization scheme that compresses over 10,000 dense matches into approximately 1% representative matches through clustering and representative match selection. It encodes the geometric constraints of each cluster into a 9×9 matrix, achieving a 10× to 100× speedup for robust RANSAC estimation with negligible accuracy loss.

Background & Motivation

Background: Two-view relative pose estimation is a core subtask of SfM and SLAM. Recently, detector-free dense matching methods (e.g., DKM, RoMa) have significantly improved accuracy and robustness, establishing correspondences even in weakly textured areas.

Limitations of Prior Work: The massive number of correspondences (usually 10,000+) generated by dense matching leads to a dramatic increase in the runtime of the scoring and local optimization (refinement) stages in RANSAC. For instance, when DKM produces 10,000 matches, the scoring and refinement overhead of RANSAC scales linearly with the number of matches.

Key Challenge: Dense matching offers superior accuracy and robustness, but the RANSAC runtime is proportional to the number of matches. Direct random downsampling degrades accuracy, whereas the geometric constraints within a large number of matches are highly redundant.

Goal: To reduce the RANSAC runtime by 1–2 orders of magnitude while preserving the accuracy advantages of dense matching.

Key Insight: Spatially close matches provide approximately identical geometric constraints (epipolar constraints), enabling them to be clustered and replaced by a small set of representative matches. Furthermore, the geometric information within each cluster can be compressed into a compact matrix.

Core Idea: Sparsification is achieved through clustering and representative match selection. Then, a second-order Taylor approximation is employed to compress the Sampson errors of all matches within a cluster into a proxy residual represented by a 9×9 matrix, restoring the geometric accuracy of dense matching with very few matches.

Method

Overall Architecture

The input is a set of dense matches \(\{(x_i, \bar{x}_i)\}_{i=1}^N\) (N≈10000). First, matches are clustered in a 4D space (concatenation of coordinates from both images) to obtain K clusters (K≈N/80). For each cluster, the match closest to the cluster center is selected as the representative. Then, a 9×9 summary matrix \(M_k\) is computed for each cluster to encode its complete geometric constraints. Finally, RANSAC is performed using the K representative matches and their summary matrices.

Key Designs

  1. Clustering and Representative Match Selection:

    • Function: Compresses N dense matches into K representative matches (K≪N).
    • Mechanism: It is assumed that spatially close matches yield similar epipolar residuals. K-means clustering is applied in the 4D match space (concatenating 2D coordinates from both images), and the actual match closest to each cluster center is selected as the representative. These K representative matches are directly used to perform RANSAC.
    • Design Motivation: The vast majority of constraints in dense matching are redundant—even with severe downsampling (K≈N/80), excellent pose estimation can still be achieved.
  2. Dense Match Summarization (Proxy Residuals):

    • Function: Captures the geometric constraints of all matches within each cluster using a 9×9 matrix.
    • Mechanism: It is assumed that matches within each cluster are either entirely inliers or entirely outliers. A second-order Taylor expansion of the Sampson errors of intra-cluster matches is performed at the representative match, yielding a quadratic approximation with respect to the vectorized form of the essential matrix \(e = \text{vec}(E)\). Consequently, the contribution of each cluster can be represented by a 9×9 matrix independent of the number of matches. This proxy residual is then utilized in the refinement step of RANSAC.
    • Design Motivation: Using only representative matches speeds up the process but incurs accuracy loss; the proxy residual restores accuracy close to full evaluation at minimal cost (using 9 residual terms to replace the original match set).
  3. Two-Stage RANSAC Pipeline:

    • Function: Combines sparsification and summarization to achieve fast and accurate pose estimation.
    • Mechanism: The sampling and scoring stages use only the K representative matches (for fast scoring and model selection); the refinement stage utilizes the proxy residuals (9K residual terms instead of N), maintaining speed while restoring accuracy.
    • Design Motivation: The three computational overheads of RANSAC (sampling, scoring, and refinement) are all related to the number of matches. This method reduces the computational burden in all three stages simultaneously.

Loss & Training

This is a training-free geometric method and does not involve any training or loss functions.

Key Experimental Results

Main Results (MegaDepth with multiple dense matchers)

Matcher Method AUC@5° AUC@10° Runtime Speedup
DKM Full dense 73.8 84.4 98ms
DKM Ours (cluster) 73.5 84.2 1.8ms ~55×
DKM Ours (summary) 73.7 84.3 2.3ms ~43×
RoMa Full dense 74.9 85.2 105ms
RoMa Ours (summary) 74.7 85.1 2.5ms ~42×

Ablation Study

Configuration AUC@5° Runtime Description
Full dense (10K) 73.8 98ms Full matches
Random subsample 125 72.4 1.6ms Random downsampling leads to significant loss
Cluster 125 (Ours) 73.5 1.8ms Clustering selection yields high accuracy
Cluster + Summary 73.7 2.3ms Summary matrix further restores accuracy
Superpixel clustering 73.4 2.0ms Performance slightly lower than K-means

Key Findings

  • Even when downsampled to approximately 1% (125/10000), clustering-based selection loses only 0.3 AUC.
  • The proxy residuals further restore the accuracy to a level close to the full dense matching (with a gap of only 0.1 AUC).
  • The sparsified dense matches still outperform state-of-the-art sparse matchers (e.g., SuperPoint+LightGlue).
  • The method is matcher-agnostic, demonstrating effectiveness across various dense matchers such as DKM and RoMa.

Highlights & Insights

  • The core insight is simple yet powerful: 99% of the geometric constraints in dense matching are redundant. This observation may prompt the dense matching community to rethink the value of "more matches".
  • The derivation of the 9×9 summary matrix is based on the classic Taylor expansion, but it cleverly compresses a variable number of matches into a fixed-size representation. This is analogous to the 2×2 matrix of affine correspondences (ACs) but provides stronger constraints.
  • This method can be orthogonally combined with any RANSAC improvements (e.g., PROSAC, MAGSAC).

Limitations & Future Work

  • The assumption of all-inlier/all-outlier clusters may not hold in regions with poor match quality or at the boundaries where inliers and outliers mix.
  • The K-means clustering itself introduces some overhead, though it is far smaller than the saved RANSAC time.
  • It only addresses the pose estimation scenario and does not explore extensions to other geometric problems such as homography estimation or PnP.
  • The choice of the number of clusters K requires balancing accuracy and speed; K≈125 is an empirical choice in the paper.
  • vs. Standard RANSAC: Fully compatible; it is essentially a pre-processing step that can be plugged into any RANSAC variant.
  • vs. Sampling Improvements like PROSAC: PROSAC accelerates sampling but does not speed up scoring and refinement; this method accelerates all three stages simultaneously.
  • vs. Affine Correspondences (AC): ACs use a 2×2 matrix to encode geometric information under the local planar assumption; the 9×9 matrix in this paper does not assume local planarity and provides stronger constraints.

Rating

  • Novelty: ⭐⭐⭐⭐ Simple yet effective idea, systematically addressing dense match redundancy for the first time.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Multiple matchers, multiple datasets, and comprehensive ablation.
  • Writing Quality: ⭐⭐⭐⭐⭐ Clear derivation, precise problem definition, and well-designed experiments.
  • Value: ⭐⭐⭐⭐ A plug-and-play acceleration scheme with high practical value.