GLINT: Modeling Scene-Scale Transparency via Gaussian Radiance Transport

Conference: CVPR 2026
arXiv: 2603.26181
Code: https://youngju-na.github.io/GLINT
Area: 3D Vision
Keywords: Gaussian splatting, transparent surface reconstruction, radiance transport decomposition, hybrid rendering, scene reconstruction

TL;DR

GLINT decomposes Gaussian representations into three components — interface, transmission, and reflection — and couples them with a hybrid rasterization+ray-tracing rendering pipeline, achieving state-of-the-art geometry and appearance reconstruction for scene-scale transparent surfaces such as glass walls and display cases.

Background & Motivation

Background: 3D Gaussian Splatting (3DGS) has become the dominant paradigm for 3D reconstruction, owing to its real-time rendering capability and high visual fidelity. Subsequent works have improved geometric accuracy (e.g., 2DGS, PGSR with planar Gaussian constraints) and non-Lambertian appearance modeling (e.g., GaussianShader, EnvGS with reflection decomposition).

Limitations of Prior Work: The core alpha-blending mechanism of 3DGS is fundamentally incapable of handling transparent surfaces. In transparent scenes — such as architectural glass, display cabinets, and windows — a single pixel receives superimposed radiance from both reflected and transmitted components originating at different physical locations. To render realistic transmission, Gaussians positioned at glass surfaces must be assigned very low opacity or be pruned, which causes the loss of geometric information for the glass. Conversely, high opacity treats the glass as an opaque occluder, blocking background transmission. This constitutes the "transparency–depth dilemma."

Key Challenge: Alpha-blending couples geometry and appearance within a single compositing stream — the same opacity parameter simultaneously governs geometric presence and radiance contribution, which is fundamentally contradictory for transparent objects (geometrically present, yet optically transmissive).

Goal: (1) Decouple the geometry and appearance of transparent surfaces within the Gaussian splatting framework. (2) Jointly reconstruct transmitted and reflected radiance without manual segmentation masks. (3) Scale to complex, scene-level transparent structures.

Key Insight: Existing methods are either object-centric and require segmentation masks (TransparentGS, TSGS), or handle only reflection without transmission (EnvGS, DeferredGS). The authors propose explicitly decomposing the scene into three functional sets: the first-visible interface, transmission geometry, and reflective environment, each represented and optimized by an independent Gaussian set.

Core Idea: Decompose the Gaussian splatting representation into interface/transmission/reflection components, and implement physically consistent transparent radiance transport via a hybrid rasterization+ray-tracing pipeline.

Method

Overall Architecture

GLINT takes multi-view images as input and outputs a complete 3D scene reconstruction (geometry and appearance) containing transparent surfaces. Scene Gaussians are explicitly partitioned into three groups: interface Gaussians \(\mathcal{G}_{\text{intr}}\) (capturing the first-visible surface, including both transparent and opaque boundaries), transmission Gaussians \(\mathcal{G}_{\text{trans}}\) (modeling background geometry visible through transparent surfaces), and reflection Gaussians \(\mathcal{G}_{\text{refl}}\) (encoding environment radiance reflected from the interface). The interface component produces a G-buffer (depth, normal, transparency, specularity) via rasterization, after which ray tracing queries the transmission and reflection components; the outgoing radiance is finally composited by weighting with transparency and Fresnel reflectance.

Key Designs

Decomposed Gaussian Representation:
- Function: Decouple multi-path radiance in transparent scenes and model the interface, transmission, and reflection separately.
- Mechanism: The interface component \(\mathcal{G}_{\text{intr}}\) produces a G-buffer \(\mathcal{B} = \{z, \mathbf{n}, t, s\}\) (depth, normal, transparency \(t \in [0,1]\), specularity \(s \in [0,1]\)) via 2DGS rasterization. The transparency \(t\) determines whether a point follows the opaque or transparent path, while the specularity \(s\) controls the ratio of diffuse to specular reflection. The transmission component \(\mathcal{G}_{\text{trans}}\) and the reflection component \(\mathcal{G}_{\text{refl}}\) are optimized independently, thereby avoiding the coupling of geometry and appearance inherent in conventional alpha-blending.
- Design Motivation: Standard alpha-blending introduces a fundamental conflict for transparent objects. By explicit decomposition, each component need only attend to its own optical role — the interface governs geometry, transmission governs background visibility, and reflection governs environment mapping.
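One way to picture the G-buffer \(\mathcal{B} = \{z, \mathbf{n}, t, s\}\) is as the per-pixel result of front-to-back alpha compositing applied to per-Gaussian attributes instead of colors. The sketch below assumes exactly that for a single pixel. The function name, argument layout, and the extra `coverage` value are hypothetical illustrations, not GLINT's actual rasterizer interface.

```python
import numpy as np

def composite_gbuffer(alphas, depths, normals, transp, spec):
    """Front-to-back alpha compositing of per-splat attributes for ONE pixel.

    Inputs are sorted near-to-far. Hypothetical helper, not GLINT's API.
    """
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance reaching splat i: T_i = prod_{j<i} (1 - alpha_j)
    trans_before = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    w = alphas * trans_before  # standard alpha-blending weights
    n = (w[:, None] * np.asarray(normals, dtype=float)).sum(axis=0)
    norm = np.linalg.norm(n)
    return {
        "z": float((w * np.asarray(depths, dtype=float)).sum()),
        "n": n / norm if norm > 0 else n,   # re-normalize the blended normal
        "t": float((w * np.asarray(transp, dtype=float)).sum()),
        "s": float((w * np.asarray(spec, dtype=float)).sum()),
        "coverage": float(w.sum()),         # accumulated alpha at this pixel
    }
```

Because \(t\) and \(s\) are stored as attributes rather than folded into opacity, the interface splats can stay opaque in the geometric sense while still marking the pixel as optically transmissive.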
Transparency-Aware Radiance Transport:
- Function: Physically consistent compositing of outgoing radiance for transparent/opaque scenes.
- Mechanism: The outgoing radiance is \(L_o = (1-t) L_{\text{opaque}} + t L_{\text{transparent}}\). Both branches share a specular weight \(k_s = s + (1-s) F(\omega_o)\), where \(F(\omega_o) = F_0 + (1-F_0)(1 - \max(0, \omega_o \cdot \mathbf{n}))^5\) is the Schlick Fresnel approximation and \(F_0\) is the reflectance at normal incidence. The opaque branch blends interface base color and reflected radiance as \(L_{\text{opaque}} = (1-k_s) L_{\text{intr}} + k_s L_{\text{refl}}\). The transparent branch is analogous but replaces the interface base color with transmitted radiance: \(L_{\text{transparent}} = (1-k_s) L_{\text{trans}} + k_s L_{\text{refl}}\). An optically thin assumption (\(\omega_t \approx \omega_o\)) is adopted, treating refractive bending as negligible. Transmitted and reflected radiance are queried from their respective Gaussian components via ray tracing: \(L_{\text{refl}} = \text{Trace}(\mathcal{G}_{\text{refl}}, \mathbf{x}, \omega_r)\), \(L_{\text{trans}} = \text{Trace}(\mathcal{G}_{\text{trans}}, \mathbf{x}, \omega_t)\), where \(\mathbf{x}\) is the interface point and \(\omega_r\) is the mirror reflection of the view direction about \(\mathbf{n}\).
- Design Motivation: This BSDF-inspired decomposition explicitly separates the reflection and transmission paths based on surface properties, eliminating the need for physically inconsistent compromises on opacity during optimization.
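The compositing rule above can be sketched per pixel as follows. This is a minimal illustration under stated assumptions: the traced radiances \(L_{\text{refl}}\) and \(L_{\text{trans}}\) are passed in as plain arrays rather than obtained from a ray tracer, and the default `f0=0.04` is a common dielectric value chosen here for illustration, not a number taken from the paper.

```python
import numpy as np

def schlick_fresnel(cos_theta, f0=0.04):
    """Schlick approximation. f0=0.04 is an assumed, typical dielectric value."""
    # For unit vectors the dot product is <= 1, so clipping to [0, 1]
    # matches max(0, w_o . n) in the formula.
    c = np.clip(cos_theta, 0.0, 1.0)
    return f0 + (1.0 - f0) * (1.0 - c) ** 5

def reflect(w_o, n):
    """Mirror direction for a unit view vector w_o pointing away from the surface."""
    return 2.0 * np.dot(w_o, n) * n - w_o

def outgoing_radiance(t, s, n, w_o, L_intr, L_trans, L_refl, f0=0.04):
    """L_o = (1 - t) * L_opaque + t * L_transparent for one pixel."""
    k_s = s + (1.0 - s) * schlick_fresnel(np.dot(w_o, n), f0)
    L_opaque = (1.0 - k_s) * L_intr + k_s * L_refl
    L_transparent = (1.0 - k_s) * L_trans + k_s * L_refl
    return (1.0 - t) * L_opaque + t * L_transparent
```

For a fully transparent, non-specular pixel viewed head-on (\(t = 1\), \(s = 0\), \(\omega_o = \mathbf{n}\)), the Fresnel term reduces to \(F_0\), so the result is \((1 - F_0) L_{\text{trans}} + F_0 L_{\text{refl}}\). At grazing angles the Fresnel term approaches 1 and reflection dominates, which matches the familiar appearance of glass.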
Transparency Bootstrapping & Geometric Priors:
- Function: Localize transparent regions without manual segmentation masks and stabilize the optimization process.
- Mechanism: Transparency bootstrapping combines two signals: (a) the interface–transmission depth discrepancy \(\Delta z = |z_{\text{intr}} - z_{\text{trans}}|\), which emerges naturally while the decomposed representation is optimized (a large depth gap indicates multiple depth layers, suggesting glass); and (b) the diffuse albedo map \(\hat{a}\) predicted by a pretrained video relighting model (low albedo indicates specular-dominant transport). A binary transparency mask \(M_{\text{trans}} = \mathbf{1}((\Delta z > \tau_d) \land (\hat{a} < \gamma_a))\) supervises the predicted transparency \(t\) via an L1 loss. Geometric regularization uses depth \(\hat{z}\) and normals \(\hat{\mathbf{n}}\) predicted by the encoder of a pretrained video relighting model, stabilizing interface geometry through a scale-invariant depth loss and a normal angular loss.
- Design Motivation: Existing segmentation modules frequently fail on scene-scale transparency due to blurry boundaries and overlapping transmitted radiance. The depth-discrepancy signal arising naturally from decomposed optimization provides a more reliable cue for transparency localization. Priors from the video relighting model are more temporally consistent across frames than monocular depth estimates.
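The mask construction and its L1 supervision can be sketched as below. The thresholds default to the values reported in the training section (\(\tau_d = 0.01\), \(\gamma_a = 0.05\)). Treating \(\hat{a}\) as a single-channel map and averaging the L1 term over pixels are assumptions made for this illustration, and the function names are hypothetical.

```python
import numpy as np

def transparency_mask(z_intr, z_trans, albedo, tau_d=0.01, gamma_a=0.05):
    """M_trans = 1[(|z_intr - z_trans| > tau_d) and (albedo < gamma_a)].

    albedo is assumed to be a single-channel map with the same shape as
    the depth maps (an assumption of this sketch).
    """
    depth_gap = np.abs(z_intr - z_trans)
    return ((depth_gap > tau_d) & (albedo < gamma_a)).astype(np.float32)

def transparency_loss(t_pred, mask):
    """L1 distance between the mask and predicted transparency, pixel-averaged."""
    return float(np.mean(np.abs(mask - t_pred)))
```

A pixel is flagged only when both cues agree, so a dark opaque wall (low albedo, no depth gap) and a bright doorway opening (depth gap, high albedo) are both left unflagged.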

Loss & Training

Total loss: \(\mathcal{L} = \mathcal{L}_{\text{photo}} + \mathcal{L}_{\text{geo}} + \mathcal{L}_{\text{trans}}\), where \(\mathcal{L}_{\text{photo}} = \lambda_1 \mathcal{L}_1 + \lambda_{\text{ssim}} \mathcal{L}_{\text{SSIM}} + \lambda_{\text{lpips}} \mathcal{L}_{\text{LPIPS}}\) is the photometric reconstruction term, \(\mathcal{L}_{\text{geo}} = \lambda_d \mathcal{L}_{\text{depth}} + \lambda_n \mathcal{L}_{\text{normal}}\) is the geometric regularization term, and \(\mathcal{L}_{\text{trans}} = \lambda_t \|M_{\text{trans}} - t\|_1\) is the transparency supervision term. The system employs a 2DGS rasterizer combined with a modified OptiX ray tracer, with adaptive densification and pruning alongside edge-aware normal smoothing. Thresholds are set to \(\tau_d = 0.01\) and \(\gamma_a = 0.05\). Training is performed on a single RTX 4090.
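The following sketch shows how the three loss groups could be assembled. Several pieces are assumptions: the summary does not give the \(\lambda\) weights, so they are passed in explicitly. The depth term uses one common scale-invariant variant (least-squares scale alignment followed by L1), and the normal term uses \(1 - \cos\) of the angle between normals. The \(\mathcal{L}_1\), SSIM, and LPIPS values are taken as precomputed scalars, since computing them needs an image pipeline outside the scope of this example.

```python
import numpy as np

def scale_invariant_depth_loss(z_pred, z_prior):
    """L1 after least-squares scale alignment of the prior (one common variant)."""
    scale = np.sum(z_pred * z_prior) / max(np.sum(z_prior * z_prior), 1e-8)
    return float(np.mean(np.abs(z_pred - scale * z_prior)))

def normal_angular_loss(n_pred, n_prior):
    """Mean (1 - cosine similarity) between unit normals of shape (..., 3)."""
    cos = np.sum(n_pred * n_prior, axis=-1)
    return float(np.mean(1.0 - cos))

def total_loss(photo, z_pred, z_prior, n_pred, n_prior, t_pred, mask, w):
    """L = L_photo + L_geo + L_trans; `w` holds hypothetical lambda weights."""
    l_photo = (w["l1"] * photo["l1"]
               + w["ssim"] * photo["ssim"]
               + w["lpips"] * photo["lpips"])
    l_geo = (w["d"] * scale_invariant_depth_loss(z_pred, z_prior)
             + w["n"] * normal_angular_loss(n_pred, n_prior))
    l_trans = w["t"] * float(np.mean(np.abs(mask - t_pred)))
    return l_photo + l_geo + l_trans
```

The scale alignment matters because a learned depth prior is typically only defined up to an unknown scale, so comparing it to rendered depth directly would penalize correct geometry.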

Key Experimental Results

Main Results: Synthetic Dataset 3D-FRONT-T (Geometry Evaluation)

Method	Normal MAE (°)↓	Normals within 11.25° (%)↑	Depth AbsRel↓	CD↓	F1↑
2DGS	25.97	52.19	0.20	0.85	0.688
EnvGS	14.37	68.22	0.13	0.87	0.640
TSGS	9.89	86.29	0.08	0.52	0.798
GLINT	7.96	86.37	0.04	0.34	0.836
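For readers unfamiliar with the column names, the sketch below gives the standard definitions of normal mean angular error, the share of normals within 11.25°, and depth absolute relative error. It is an illustration only: the paper's exact evaluation protocol (valid-pixel masking, any depth alignment) is not described in this summary, and the function names are hypothetical.

```python
import numpy as np

def normal_metrics(n_pred, n_gt, thresh_deg=11.25):
    """Return (mean angular error in degrees, % of normals under thresh_deg)."""
    cos = np.clip(np.sum(n_pred * n_gt, axis=-1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))
    return float(err.mean()), float((err < thresh_deg).mean() * 100.0)

def depth_abs_rel(z_pred, z_gt):
    """Mean of |z_pred - z_gt| / z_gt over all pixels (z_gt must be > 0)."""
    return float(np.mean(np.abs(z_pred - z_gt) / z_gt))
```

Read this way, the table says GLINT and TSGS place a similar share of normals within 11.25°, while GLINT's lower MAE and AbsRel indicate smaller errors on the remaining pixels and on depth.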

Main Results: Rendering Quality

Method	DL3DV-10K PSNR↑	DL3DV-10K SSIM↑	3D-FRONT-T PSNR↑
EnvGS	29.65	0.91	33.71
TSGS	25.94	0.85	28.80
GLINT	30.21	0.92	34.50

Ablation Study

Configuration	PSNR↑	Normal MAE (°)↓	Depth AbsRel↓
Full model	34.50	7.96	0.035
w/o \(\mathcal{G}_{\text{trans}}\)	32.26	8.11	0.038
w/o \(\mathcal{G}_{\text{refl}}\)	32.70	8.78	0.038
w/o \(\mathcal{L}_{\text{trans}}\)	33.57	8.07	0.037
w/o \(\mathcal{L}_{\text{geo}}\)	33.62	24.69	0.126
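The per-configuration changes relative to the full model follow directly from the table. The short script below recomputes them, using only the numbers listed above. The dictionary keys are ad hoc labels chosen for this example.

```python
# Values copied from the ablation table above.
full = {"psnr": 34.50, "mae": 7.96, "absrel": 0.035}
ablations = {
    "w/o G_trans": {"psnr": 32.26, "mae": 8.11, "absrel": 0.038},
    "w/o G_refl": {"psnr": 32.70, "mae": 8.78, "absrel": 0.038},
    "w/o L_trans": {"psnr": 33.57, "mae": 8.07, "absrel": 0.037},
    "w/o L_geo": {"psnr": 33.62, "mae": 24.69, "absrel": 0.126},
}

def deltas(full, ablations):
    """Difference (ablation - full) per metric, rounded to 3 decimals."""
    return {
        name: {k: round(vals[k] - full[k], 3) for k in full}
        for name, vals in ablations.items()
    }

if __name__ == "__main__":
    for name, d in deltas(full, ablations).items():
        print(f"{name:12s} dPSNR={d['psnr']:+.2f}  dMAE={d['mae']:+.2f}  "
              f"dAbsRel={d['absrel']:+.3f}")
```

The split is visible in the differences: removing either Gaussian set mainly costs PSNR, whereas removing \(\mathcal{L}_{\text{geo}}\) costs comparatively little PSNR but degrades both geometry metrics sharply.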

Key Findings

Removing the transmission component \(\mathcal{G}_{\text{trans}}\) causes the largest PSNR drop (from 34.50 to 32.26, i.e., −2.24 dB), as background content is incorrectly absorbed into the interface Gaussians, introducing geometric ambiguity.
Removing the geometric regularization \(\mathcal{L}_{\text{geo}}\) causes normal MAE to surge from 7.96 to 24.69 and depth AbsRel from 0.035 to 0.126, demonstrating that the priors are critical for stabilizing geometry in transparent regions.
Although TSGS achieves reasonable geometric results, its rendering quality is substantially inferior to GLINT (PSNR gaps of 4.27 dB on DL3DV-10K and 5.70 dB on 3D-FRONT-T), as it models only the first surface and cannot recover transmitted radiance.
Transparency bootstrapping successfully identifies glass regions across diverse scenes without any manual annotation.

Highlights & Insights

The three-component decomposition directly resolves the "transparency–depth dilemma": by explicitly separating optical roles into interface, transmission, and reflection, the opacity of each component is no longer subject to contradictory constraints. This structured decomposition may also extend to other optical phenomena such as translucency and subsurface scattering.
The transparency bootstrapping approach is notably elegant: it uses the interface–transmission depth discrepancy, a byproduct of the decomposed optimization, as the primary signal, and the albedo predicted by the pretrained video relighting model as an auxiliary signal, so no separate segmentation model or manual mask is needed.
The hybrid rendering design (rasterization + ray tracing) strikes a practical balance between efficiency and physical correctness. Rasterization handles the interface efficiently, while ray tracing handles secondary reflection/transmission paths — a sound engineering trade-off.

Limitations & Future Work

The method relies on an optically thin assumption, i.e., refractive bending is negligible. This assumption breaks down for scenes with significant refraction, such as thick glass or aquariums.
A pretrained video relighting model is required as a prior, increasing external dependencies and deployment complexity.
The computational overhead of ray tracing limits the feasibility of real-time applications, even though training remains tractable on a single RTX 4090.
Evaluation is conducted on only 5 synthetic and 8 real scenes; broader scene diversity remains to be explored.
The transparency mask thresholds \(\tau_d\) and \(\gamma_a\) are fixed across all scenes and may be suboptimal in general settings.
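To get a feel for when the optically thin assumption (\(\omega_t \approx \omega_o\)) is reasonable, the sketch below computes the sideways shift of a ray crossing a flat, parallel-sided pane using Snell's law. The refractive index of 1.5 is a typical value for glass and is an assumption of this example, as is the idealized slab geometry. None of these numbers come from the paper.

```python
import math

def lateral_shift(thickness, incidence_deg, ior=1.5):
    """Sideways displacement of a ray crossing a parallel-sided slab in air.

    thickness and the returned shift share the same unit. ior=1.5 is an
    assumed, typical refractive index for glass.
    """
    theta_i = math.radians(incidence_deg)
    theta_t = math.asin(math.sin(theta_i) / ior)  # Snell's law: sin(i) = ior * sin(t)
    return thickness * math.sin(theta_i - theta_t) / math.cos(theta_t)

if __name__ == "__main__":
    for d_mm in (6.0, 50.0):
        print(f"{d_mm:5.1f} mm pane at 45 deg: shift = {lateral_shift(d_mm, 45.0):.2f} mm")
```

The exit ray of a parallel slab is parallel to the incoming ray, and the shift scales linearly with thickness. A 6 mm window pane at 45° displaces the ray by roughly 2 mm, which is small at room scale, whereas thick glass or a water-filled tank produces offsets large enough to misplace the background.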

vs. 2DGS/PGSR: These methods improve Gaussian geometric accuracy but are entirely incapable of handling transparent surfaces, producing missing or noisy normals and depth in glass regions.
vs. EnvGS: This method specifically models reflection via independent environment Gaussians and ray tracing, but does not address transmitted radiance, leading to suboptimal rendering and geometry in transparent scenes.
vs. TSGS: This method specifically models transparent surfaces via first-surface rasterization, but handles only the first surface without transmission decomposition, resulting in blurry rendering of objects seen through glass. GLINT addresses both geometry and appearance simultaneously.
vs. TransparentGS: This method handles refractive transparency through volumetric modeling of refractive media, but is object-centric and requires segmentation masks. GLINT operates at scene scale without requiring masks.

Rating

Novelty: ⭐⭐⭐⭐⭐ The three-component decomposition coupled with a hybrid rendering pipeline constitutes a fundamental advance in transparent scene reconstruction.
Experimental Thoroughness: ⭐⭐⭐⭐ Covers both synthetic and real datasets with complete ablations, though the number of evaluated scenes is limited.
Writing Quality: ⭐⭐⭐⭐⭐ Physical modeling is rigorous, and figures are rich and intuitive.
Value: ⭐⭐⭐⭐⭐ Addresses a fundamental limitation of 3DGS; the introduced 3D-FRONT-T benchmark is of significant value to future work.