3D Gaussian Splatting at Arbitrary Resolutions with Compact Proxy Anchors

Conference: CVPR 2026
Paper: CVF Open Access
Code: https://github.com/JungMinKyun/ARCA-GS (Available)
Area: 3D Vision
Keywords: 3D Gaussian Splatting, Arbitrary Resolution Rendering, Anti-aliasing, Anchor Compression, FiLM Modulation

TL;DR

Building upon the anchor-based framework of Scaffold-GS, this paper employs FiLM to inject the target resolution into anchor features and introduces a "Pixel Coverage Gate" that dynamically activates Gaussians based on the pixel sampling rate, achieving aliasing-free rendering at continuous arbitrary resolutions. In addition, the method stores only the proxy anchors (approximately 30% of all anchors) and uses a residual predictor to reconstruct the remaining leaf anchors online, reducing storage to roughly half that of Scaffold-GS without compromising quality.

Background & Motivation

Background: 3D-GS represents scenes as a collection of anisotropic 3D Gaussians for real-time rendering. Scaffold-GS further organizes these Gaussians into anchors (where each anchor stores a latent feature used by an MLP to decode multiple Gaussians), significantly reducing GPU memory. This is the current mainstream approach for anchor-based Gaussian splatting.

Limitations of Prior Work: These methods are typically trained at a fixed resolution. During deployment (such as continuous zooming in AR/VR or digital twins), rendering at resolutions unseen during training leads to aliasing, jagged edges, and blurring because the projected 2D Gaussian coverage does not match the pixel sampling rate. Existing multi-resolution anti-aliasing methods either rely on test-time scale-dependent filtering (Mip-Splatting, SA-GS) or simply store separate sets of Gaussians for different resolutions (Multi-scale 3D-GS), the latter of which leads to memory explosion.

Key Challenge: There is a trade-off between fidelity and storage. Achieving clarity at arbitrary resolutions requires either scale-specific Gaussians (memory intensive) or filtering (not adaptive enough). Furthermore, compression of anchors is often fragile under continuous resolution changes, resulting in quality degradation during scaling.

Goal: Split the problem into two sub-problems: (1) Enabling a single set of anchors to adaptively generate Gaussians according to the target resolution (continuous anti-aliasing without storing multiple sets); (2) Further reducing the number of anchors without losing detail.

Key Insight: Anti-aliasing fundamentally requires matching the projected 2D Gaussian coverage to the pixel sampling rate. Since Scaffold-GS decodes Gaussians from anchor features "on-demand," encoding resolution information into anchor features and applying a coverage-based gate at the decoding end allows the same anchor to output different Gaussians for different resolutions.

Core Idea: Resolution is treated as a continuous condition to modulate anchor features via FiLM and generate "resolution-adaptive Gaussians" through pixel coverage gating. Storage is then compressed using a proxy/leaf anchor encoder-decoder structure (inspired by PointMAE).

Method

Overall Architecture

The method is built upon the "anchor → Gaussian" structure of Scaffold-GS. The inputs are all anchors obtained by discretizing the SfM point cloud, with each anchor storing a feature \(f\), a scale, and offsets. The anchors are split into two types: approximately 30% are proxy anchors (the representative set that is actually stored), and the remainder are leaf anchors (reconstructed during inference and not stored after training).

During rendering, the target resolution is injected into each anchor's feature via the FiLM Resolution Embedding, yielding a resolution-adaptive feature \(\hat f_v\). These features are fed into a Gaussian attribute MLP that decodes 3D Gaussians, which then pass through the Pixel Coverage Gate (PCG). The gate suppresses Gaussians whose projected coverage at the target resolution is too small. Storage is handled by a Residual Anchor Predictor: in the second training stage, it learns to reconstruct leaf anchor features from neighboring proxy anchors, eliminating the need to save leaf anchor features.

Training proceeds in two stages: Stage 1 follows standard Scaffold-GS, using all (proxy+leaf) anchors. Stage 2 occurs after the anchor set is frozen (growing & pruning finished), jointly training proxy anchors and the residual predictor for leaf reconstruction.

%%{init: {'flowchart': {'rankSpacing': 24, 'nodeSpacing': 28, 'padding': 6, 'wrappingWidth': 400}}}%%
flowchart TD
    A["SfM Point Cloud → All Anchors"] --> B["FiLM Resolution Embedding<br/>Inject Target Resolution into Features"]
    B --> C["Gaussian Attribute MLP<br/>Decode 3D Gaussians"]
    C --> D["Pixel Coverage Gate PCG<br/>Suppress Small Gaussians via Coverage"]
    A -->|Approx 30% Rep Set| E["Density-Aware<br/>Proxy Anchor Selection"]
    E --> F["Residual Anchor Predictor<br/>proxy→leaf Online Reconstruction"]
    F -.Reconstructed leaf features.-> B
    D --> G["Resolution-Adaptive Rendering"]

Key Designs

1. FiLM Resolution Embedding: Writing "Target Resolution" Continuously into Anchor Features

Anchors trained at a fixed resolution carry no information about the current rendering scale, which leads to a mismatch at other resolutions. This method uses Feature-wise Linear Modulation (FiLM) to inject resolution as a condition. Specifically, for an anchor at position \(x_v\) and a scaling ratio \(\xi\), a positional encoding followed by a small MLP produces a pair of FiLM parameters:

\[[\gamma_v, \beta_v] = \mathrm{MLP}\big(\mathrm{PE}(x_v, \xi)\big)\]

Then, an element-wise affine modulation is applied to the original anchor feature \(f_v\):

\[\hat f_v = (1 + \gamma_v) \odot f_v + \beta_v\]

Because the scale is written as \((1+\gamma_v)\), the modulation reduces to the identity when \(\gamma_v = \beta_v = 0\), so the network can start from the original features and learn resolution-related offsets on top of them. The modulated \(\hat f_v\) is then passed to the Gaussian MLP. Since \((\gamma_v, \beta_v)\) depend on resolution, the same opacity threshold selects different Gaussian subsets at different resolutions.
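The modulation above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the MLP is reduced to a single linear layer, and `positional_encoding`, `linear`, `film_modulate`, `zero_head`, and the choice of 4 frequencies are all assumptions made for the example. The zero-initialized head is one simple way to obtain the identity behaviour at the start of training.

```python
import math

def positional_encoding(values, num_freqs=4):
    """Sin/cos encoding of each scalar at frequencies 2^0 .. 2^(num_freqs-1)."""
    out = []
    for v in values:
        for k in range(num_freqs):
            out.append(math.sin((2.0 ** k) * v))
            out.append(math.cos((2.0 ** k) * v))
    return out

def linear(weights, bias, x):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def film_modulate(feature, position, xi, weights, bias):
    """Return (1 + gamma) * f + beta, with gamma and beta predicted from PE(x_v, xi)."""
    dim = len(feature)
    enc = positional_encoding(list(position) + [xi])
    out = linear(weights, bias, enc)      # 2 * dim outputs: gamma first, then beta
    gamma, beta = out[:dim], out[dim:]
    return [(1.0 + g) * f + b for g, f, b in zip(gamma, feature, beta)]

def zero_head(feature_dim, enc_dim):
    """Zero-initialized layer (an assumption), so gamma = beta = 0 initially."""
    return ([[0.0] * enc_dim for _ in range(2 * feature_dim)],
            [0.0] * (2 * feature_dim))

feature = [0.5, -1.0, 2.0]
# Encoding size: (3 position coords + 1 scale) * 4 frequencies * 2 (sin, cos) = 32.
weights, bias = zero_head(len(feature), enc_dim=32)
print(film_modulate(feature, (0.1, 0.2, 0.3), 0.25, weights, bias))
# → [0.5, -1.0, 2.0]
```

With all head parameters at zero the feature passes through unchanged; once the head is trained, the same anchor feature is rescaled and shifted differently for each \(\xi\).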

2. Pixel Coverage Gate (PCG): Aligning Projected Area with Pixel Sampling Rate

Anti-aliasing requires the projected 2D Gaussian coverage to match the pixel size: Gaussians that project to less than a pixel cause aliasing and flickering. PCG therefore applies a soft gate to the coverage area. The projected 2D Gaussian coverage at resolution \(\xi\) is calculated as:

\[A_{\xi,v} = \pi \rho^2 \sqrt{\det\big(\Sigma^{2D}_{\xi,v}\big)}\]
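One way to read this formula is as the area of an ellipse, assuming \(\rho\) acts as a Mahalanobis radius (the summary does not define \(\rho\), so this is an interpretation). Writing \(\Sigma = \Sigma^{2D}_{\xi,v}\) with eigenvalues \(\lambda_1, \lambda_2\), the contour \(\{u : u^\top \Sigma^{-1} u = \rho^2\}\) is an ellipse with semi-axes \(\rho\sqrt{\lambda_1}\) and \(\rho\sqrt{\lambda_2}\), so its area is

\[\pi\,(\rho\sqrt{\lambda_1})(\rho\sqrt{\lambda_2}) = \pi\rho^2\sqrt{\lambda_1\lambda_2} = \pi\rho^2\sqrt{\det\Sigma}\]

which matches \(A_{\xi,v}\). Under the further assumption that pixel coordinates scale linearly with \(\xi\), the covariance scales as \(\xi^2\,\Sigma\), its determinant as \(\xi^4\), and the coverage as \(\xi^2 A\). Rendering at 10% resolution would then shrink every footprint to 1% of its full-resolution area.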

The method uses a sigmoid gate:

\[g_{pix}(A_{\xi,v}) = \frac{1}{1 + \exp\big(-\kappa(A_{\xi,v}-\eta)\big)}\]

\(\eta\) is the tunable coverage threshold, and \(\kappa\) controls the steepness. The gate equals 0.5 at \(A_{\xi,v} = \eta\), approaches 1 as the coverage rises above \(\eta\), and approaches 0 as it falls below. The final effective opacity is \(\tilde\alpha_{\xi,v} = \alpha_{\xi,v}\cdot g_{pix}(A_{\xi,v})\). Joint training is crucial here, as it allows the model to learn a resolution-aware anti-aliasing prior.
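The coverage and gate formulas can be sketched as follows. This is an illustrative sketch rather than the paper's code: the function names, the default values `rho=3.0`, `eta=1.0`, `kappa=4.0`, and the rule that the 2D covariance scales with \(\xi^2\) are all assumptions chosen for the example.

```python
import math

def coverage_area(cov2d, rho):
    """A = pi * rho^2 * sqrt(det(Sigma_2D)) for a 2x2 covariance ((a, b), (b, c))."""
    (a, b), (_, c) = cov2d
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    return math.pi * rho * rho * math.sqrt(det)

def pixel_coverage_gate(area, eta, kappa):
    """Sigmoid gate: 0.5 at area == eta, toward 1 above it and toward 0 below it."""
    return 1.0 / (1.0 + math.exp(-kappa * (area - eta)))

def gated_opacity(alpha, cov2d_full_res, xi, rho=3.0, eta=1.0, kappa=4.0):
    """Effective opacity alpha * g_pix(A) at resolution scale xi."""
    # Assumption: pixel coordinates scale by xi, so the covariance scales by xi^2.
    scaled = tuple(tuple(xi * xi * v for v in row) for row in cov2d_full_res)
    return alpha * pixel_coverage_gate(coverage_area(scaled, rho), eta, kappa)

# A small Gaussian, covariance given in full-resolution pixels^2 (made-up values).
cov = ((0.04, 0.0), (0.0, 0.04))
for xi in (0.1, 0.5, 1.0):
    print(xi, round(gated_opacity(0.9, cov, xi), 4))
```

The printed opacity grows with \(\xi\): the same Gaussian is nearly switched off at low resolution and contributes more as the resolution increases, which is the behaviour the gate is meant to produce.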

3. Density-Aware Proxy Anchor + Residual Predictor: Online Reconstruction

To further compress storage, the method partitions anchors into grid cells and, based on density, selects approximately 30% of the anchors within each cell as proxy anchors. The residual anchor predictor (an MLP) reconstructs each leaf feature \(\tilde f^q_i\) from its nearest proxy feature \(\hat f^k_i\), using the distance \(\Delta x_i\) and direction \(\vec d_i\) between them to predict a residual \(\Delta f_i\):

\[\tilde f^q_i = \hat f^k_i + \Delta f_i\]

During training, 20% of leaf anchors are sampled for the predictor. Since leaf features are reconstructed on-the-fly, storage requirements drop significantly (e.g., 41.6MB versus 78.9MB for Scaffold-GS on Mip-NeRF360).
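The proxy/leaf split and the residual reconstruction can be sketched as below. The summary does not spell out the density criterion or the predictor's inputs, so several parts are placeholders: ranking anchors by distance to the cell centroid stands in for the real selection rule, and `residual_fn(feature, distance, direction)` is a hypothetical callable standing in for the trained MLP.

```python
import math
from collections import defaultdict

def select_proxies(positions, cell_size, ratio=0.3):
    """Bucket anchors into grid cells and keep about `ratio` of each cell as proxies."""
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        cells[tuple(math.floor(c / cell_size) for c in p)].append(idx)
    proxies = set()
    for members in cells.values():
        keep = max(1, round(ratio * len(members)))  # at least one per occupied cell
        # Placeholder ranking (an assumption): prefer anchors near the cell centroid.
        centroid = [sum(positions[i][d] for i in members) / len(members)
                    for d in range(3)]
        members.sort(key=lambda i: math.dist(positions[i], centroid))
        proxies.update(members[:keep])
    return proxies

def reconstruct_leaf(leaf_pos, proxy_positions, proxy_features, residual_fn):
    """Leaf feature = nearest proxy feature + predicted residual."""
    k = min(proxy_positions, key=lambda i: math.dist(leaf_pos, proxy_positions[i]))
    delta = [a - b for a, b in zip(leaf_pos, proxy_positions[k])]
    dist = math.hypot(*delta)
    direction = [d / dist for d in delta] if dist > 0 else [0.0] * len(delta)
    f_k = proxy_features[k]
    residual = residual_fn(f_k, dist, direction)
    return [f + r for f, r in zip(f_k, residual)]

positions = [(i * 0.1, 0.0, 0.0) for i in range(10)]
proxy_ids = select_proxies(positions, cell_size=1.0)
proxy_pos = {i: positions[i] for i in proxy_ids}
proxy_feat = {i: [float(i), 1.0] for i in proxy_ids}   # made-up features
leaf = positions[0]
print(len(proxy_ids), reconstruct_leaf(leaf, proxy_pos, proxy_feat,
                                       lambda f, d, u: [0.0, 0.0]))
```

Only `proxy_pos`, `proxy_feat`, and the predictor's weights would need to be stored in this sketch; each leaf feature is recomputed whenever it is needed.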

Loss & Training

Two-stage strategy. Stage 1 uses the standard 3D-GS rendering loss: \(L_{base} = (1-\lambda_{SSIM})L_{L1} + \lambda_{SSIM}L_{SSIM}\). Growing and pruning occur here. Stage 2 introduces the predictor and a reconstruction loss:

\[L_{recon} = \frac{1}{|\tilde F^\in_q|}\sum_i \Big(\|\tilde f^q_i - \hat f^q_i\|_2^2 + 0.1\cdot[1-\cos(\tilde f^q_i, \hat f^q_i)]\Big)\]

Total Stage 2 loss is \(L_{Stage2} = L_{base} + \lambda_{recon}L_{recon}\). Training involves random scaling (10%–100%) and 30k iterations.
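The loss terms can be written out in plain Python to make the arithmetic concrete. This is a sketch under stated assumptions: the values of \(\lambda_{SSIM}\) and \(\lambda_{recon}\) are not given in this summary and are left as required arguments, `ssim_term` stands for whatever \(L_{SSIM}\) denotes in the paper, and the small `eps` guard against zero-norm features is an addition for numerical safety.

```python
import math

def base_loss(l1, ssim_term, lambda_ssim):
    """L_base = (1 - lambda_SSIM) * L_L1 + lambda_SSIM * L_SSIM."""
    return (1.0 - lambda_ssim) * l1 + lambda_ssim * ssim_term

def recon_loss(predicted, target, cos_weight=0.1, eps=1e-8):
    """Mean over sampled leaves of ||p - t||^2 + cos_weight * (1 - cos(p, t))."""
    total = 0.0
    for p, t in zip(predicted, target):
        sq = sum((a - b) ** 2 for a, b in zip(p, t))
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        total += sq + cos_weight * (1.0 - dot / max(norm, eps))
    return total / len(predicted)

def stage2_loss(l1, ssim_term, predicted, target, lambda_ssim, lambda_recon):
    """L_Stage2 = L_base + lambda_recon * L_recon."""
    return (base_loss(l1, ssim_term, lambda_ssim)
            + lambda_recon * recon_loss(predicted, target))

# A perfect reconstruction contributes nothing to the reconstruction term.
print(recon_loss([[3.0, 4.0]], [[3.0, 4.0]]))
# → 0.0
```

The squared-distance term penalizes magnitude errors, while the cosine term penalizes direction errors even when the two feature vectors have similar norms.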

Key Experimental Results

Main Results

Evaluation across 21 scenes (Mip-NeRF360, Tanks&Temples, DeepBlending, NeRF-Synthetic). All methods were trained with 10%–100% resolution sampling.

Dataset	Resolution	Metric	3D-GS	Scaffold-GS	Mip-Splatting	Ours
Mip-NeRF360	10%	PSNR	21.781	24.681	28.044	29.742
Mip-NeRF360	100%	PSNR	26.471	26.498	26.542	26.710
Mip-NeRF360	—	MEM	220.9MB	78.9MB	185.8MB	41.6MB
DeepBlending	10%	PSNR	28.747	27.185	28.262	30.472
DeepBlending	100%	PSNR	27.613	29.276	27.861	29.332
DeepBlending	—	MEM	162.1MB	56.2MB	135.7MB	31.1MB

The most significant gains occur at low resolution (10%), where PSNR rises over Scaffold-GS by about 5.1 dB on Mip-NeRF360 and 3.3 dB on DeepBlending, while using roughly half the memory (53% and 55% of Scaffold-GS, respectively).
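The headline numbers can be checked directly from the table. The short script below only re-derives the PSNR gains over Scaffold-GS at 10% resolution and the memory ratios from the values listed above; `rows` and `summarize` are names introduced for this check.

```python
rows = {
    # dataset: (Scaffold-GS PSNR @10%, Ours PSNR @10%, Scaffold-GS MB, Ours MB)
    "Mip-NeRF360": (24.681, 29.742, 78.9, 41.6),
    "DeepBlending": (27.185, 30.472, 56.2, 31.1),
}

def summarize(rows):
    """Return {dataset: (PSNR gain in dB, memory ratio ours / Scaffold-GS)}."""
    return {name: (round(ours - base, 3), round(mem_ours / mem_base, 3))
            for name, (base, ours, mem_base, mem_ours) in rows.items()}

for name, (gain, ratio) in summarize(rows).items():
    print(f"{name}: +{gain} dB at 10% resolution, {ratio:.1%} of Scaffold-GS memory")
# → Mip-NeRF360: +5.061 dB at 10% resolution, 52.7% of Scaffold-GS memory
# → DeepBlending: +3.287 dB at 10% resolution, 55.3% of Scaffold-GS memory
```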

Ablation Study

Config	Mip-NeRF360 Avg PSNR	Description
No \(E_\xi\)	27.199	No resolution injection
+ \(E_\xi\)	28.036	Simple addition of embedding
+ FiLM\(_\xi\)	28.183	Most stable across resolutions

Config (DeepBlending)	PSNR	SSIM	LPIPS	Description
Scaffold-GS	29.465	0.886	0.201	Baseline
Scaffold-GS + PCG (post-hoc)	28.147	0.837	0.279	Post-hoc application drops quality
Ours (Joint PCG)	29.905	0.906	0.194	PCG requires joint training

Key Findings

PCG requires joint training: Applying PCG post-hoc to Scaffold-GS lowers PSNR by ~1.3 dB on DeepBlending (29.465 → 28.147), whereas joint training raises it to 29.905; its value lies in the resolution-aware anti-aliasing prior learned during training.
Dynamic Adaptivity: The number of active Gaussians scales monotonically with resolution (fewer at low, more at high).
FiLM > Addition: Modulation handles both over-smoothing at low resolutions and under-expression at high resolutions better.
Proxy/Leaf Synergy: Memory decreases while quality increases because the predictor forces proxy anchors to encode richer information.

Highlights & Insights

Resolution as a Condition: Using FiLM to modulate a single set of anchors for scale-adaptivity avoids the memory overhead of multi-scale models.
Geometric PCG Formulation: The coverage formula \(A=\pi\rho^2\sqrt{\det(\Sigma_{2D})}\) provides a clear physical link between Gaussian size and sampling rate.
Representation Compression: Adopting a proxy-residual reconstruction inspired by PointMAE reduces storage by converting "store all" into "store representatives + online inference."
Experimental Rigor: The post-hoc vs. joint training comparison shows that PCG is effective only when learned end-to-end with the rest of the system.

Limitations & Future Work

Online reconstruction of leaf anchors introduces additional inference computation overhead.
The sensitivity of the fixed ratios (30% proxy, 20% leaf sampling) to scene complexity has not been fully explored.
Performance in super-resolution scenarios (>100% scaling) remains unverified.

Comparison with Related Methods

vs. Mip-Splatting: Mip-Splatting uses integrated filtering; this method adapts the generator (anchors) and uses significantly less memory (41.6MB vs 185.8MB on Mip-NeRF360).
vs. Multi-scale 3D-GS: Avoids storing multiple discrete layers of Gaussians.
vs. Context-GS / TC-GS: These focus on anchor structure compression; this method maximizes single anchor expressivity via the predictor.

Rating

Novelty: ⭐⭐⭐⭐
Experimental Thoroughness: ⭐⭐⭐⭐
Writing Quality: ⭐⭐⭐⭐
Value: ⭐⭐⭐⭐ (High utility for AR/VR deployment).