
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

Background & Motivation

3D Gaussian Splatting (3DGS) has achieved remarkable success in novel view synthesis, but its core EWA splatting pipeline has fundamental limitations. EWA splatting requires projecting 3D Gaussians onto a 2D plane, a process that relies on affine approximation (first-order Taylor expansion), which assumes the projection is locally linear within the Gaussian's support. This assumption is severely violated in the following scenarios:

Distorted Camera Models: Non-pinhole camera models such as fisheye lenses and rolling shutters introduce non-linear distortions, where the affine approximation leads to significant artifacts.

Secondary Rays: Effects like reflection and refraction require tracing secondary rays. The directions of these rays differ from the original projection direction, making them impossible to describe with a single affine transformation.

Wide-angle Lenses: When the field of view exceeds 120°, perspective distortion in the peripheral areas causes the linear approximation error to increase drastically.

Existing methods attempt to address these issues through post-processing or scene-specific patching, but they lack a unified mathematical framework. This paper proposes using the Unscented Transform (UT) to replace the affine approximation in EWA, fundamentally resolving this limitation.

Method

Unscented Transform Principle

The Unscented Transform (UT) is a technique for propagating probability distributions through non-linear transformations. Its core idea is to capture the statistical properties of a distribution using a set of carefully selected sigma points, rather than linearizing the transformation function itself.

For an \(n\)-dimensional Gaussian distribution \(\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\), UT selects \(2n+1\) sigma points:

\[\boldsymbol{\chi}_0 = \boldsymbol{\mu}, \quad \boldsymbol{\chi}_i = \boldsymbol{\mu} + \sqrt{n+\lambda} \cdot \boldsymbol{L}_i, \quad \boldsymbol{\chi}_{n+i} = \boldsymbol{\mu} - \sqrt{n+\lambda} \cdot \boldsymbol{L}_i, \quad i = 1, \dots, n\]

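where \(\boldsymbol{L}_i\) is the \(i\)-th column of the lower-triangular Cholesky factor \(\boldsymbol{L}\) of \(\boldsymbol{\Sigma}\) (so \(\boldsymbol{\Sigma} = \boldsymbol{L}\boldsymbol{L}^T\); any other matrix square root of \(\boldsymbol{\Sigma}\) works equally well), and \(\lambda = \alpha^2(n+\kappa) - n\) is a scaling parameter whose constants \(\alpha\) and \(\kappa\) control how far the sigma points spread from the mean.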
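A minimal NumPy sketch of the sigma-point construction above. It assumes the lower-triangular factor returned by `numpy.linalg.cholesky`, so `L[:, i]` plays the role of \(\boldsymbol{L}_i\); the function name `sigma_points` and the defaults \(\alpha = 1\), \(\kappa = 0\) are illustrative choices, not values taken from the paper.

```python
import numpy as np


def sigma_points(mu, sigma, alpha=1.0, kappa=0.0):
    """Return (points, lam): the 2n+1 UT sigma points of N(mu, sigma) as rows.

    alpha and kappa are illustrative defaults, not the paper's settings.
    """
    mu = np.asarray(mu, dtype=float)
    n = mu.shape[0]
    lam = alpha**2 * (n + kappa) - n          # scaling parameter lambda
    L = np.linalg.cholesky(sigma)             # lower-triangular, sigma = L @ L.T
    offsets = np.sqrt(n + lam) * L.T          # row i is sqrt(n + lam) * L[:, i]
    points = np.vstack([mu, mu + offsets, mu - offsets])
    return points, lam


mu = np.array([0.5, -1.0, 4.0])
sigma = np.array([[0.20, 0.05, 0.00],
                  [0.05, 0.10, 0.02],
                  [0.00, 0.02, 0.30]])
points, lam = sigma_points(mu, sigma)
print(points.shape)  # (7, 3): chi_0, then chi_1..chi_3, then chi_4..chi_6
```

The points come in symmetric pairs around the mean, which is why their plain average is \(\boldsymbol{\mu}\) and their scaled outer products reproduce \(\boldsymbol{\Sigma}\) exactly.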

3DGUT Splatting Pipeline

This paper integrates UT into the splatting pipeline of 3DGS, forming 3DGUT:

| Aspect | EWA Splatting | 3DGUT (Ours) |
|---|---|---|
| Projection Method | Affine Approximation (Jacobian Matrix) | Unscented Transform (sigma points) |
| Camera Model | Pinhole Only | Any Differentiable Camera Model |
| Secondary Rays | Unsupported | Native Support |
| Computational Overhead | Low | Slightly Higher (7 sigma points vs. 1 matrix multiplication) |
| Accuracy | First-order Approximation | Second-order Accuracy |
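The accuracy row can be illustrated with a toy check (our own illustration, not an experiment from the paper). A 3D Gaussian is pushed through the quadratic map \(h(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}\), whose exact mean is \(\boldsymbol{\mu}^T A \boldsymbol{\mu} + \operatorname{tr}(A\boldsymbol{\Sigma})\). The first-order estimate \(h(\boldsymbol{\mu})\) misses the trace term, while the UT mean recovers it. The map, matrix `A`, and parameter defaults are hypothetical.

```python
import numpy as np


def ut_mean(h, mu, sigma, alpha=1.0, kappa=0.0):
    """Unscented-transform estimate of E[h(x)] for x ~ N(mu, sigma)."""
    n = mu.shape[0]
    lam = alpha**2 * (n + kappa) - n
    c = np.sqrt(n + lam)
    L = np.linalg.cholesky(sigma)
    points = [mu] + [mu + c * L[:, i] for i in range(n)] + [mu - c * L[:, i] for i in range(n)]
    weights = [lam / (n + lam)] + [1.0 / (2 * (n + lam))] * (2 * n)
    return sum(w * h(p) for w, p in zip(weights, points))


def linearized_mean(h, mu):
    """First-order (affine) estimate: the mean is mapped straight through h."""
    return h(mu)


A = np.array([[1.0, 0.2, 0.0],
              [0.2, 0.5, 0.0],
              [0.0, 0.0, 2.0]])


def h(x):
    """Toy non-linear map (a quadratic form), not a camera model."""
    return x @ A @ x


mu = np.array([1.0, -0.5, 2.0])
sigma = np.diag([0.3, 0.2, 0.4])
exact = mu @ A @ mu + np.trace(A @ sigma)
# exact ≈ 10.125, ut_mean ≈ 10.125, linearized_mean ≈ 8.925
```

For a quadratic map the symmetric sigma points match the Gaussian's first and second moments, so the UT mean is exact; the affine estimate is off by exactly \(\operatorname{tr}(A\boldsymbol{\Sigma})\).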

Sigma Points Generation

For each 3D Gaussian \(\mathcal{G}(\boldsymbol{\mu}_{3D}, \boldsymbol{\Sigma}_{3D})\), 7 sigma points are generated (\(n=3\) in 3D space):
- Center point \(\boldsymbol{\chi}_0 = \boldsymbol{\mu}_{3D}\)
- 6 sample points, two per principal axis of the covariance matrix, placed symmetrically on either side of the center
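In 3DGS each covariance is stored as \(\boldsymbol{\Sigma}_{3D} = \mathbf{R}\mathbf{S}\mathbf{S}^T\mathbf{R}^T\), with a rotation \(\mathbf{R}\) and a diagonal scale \(\mathbf{S}\). Since \(\mathbf{R}\mathbf{S}\) is itself a square root of \(\boldsymbol{\Sigma}_{3D}\), its columns can replace the Cholesky factor, which puts the six non-central points exactly on the principal axes. A minimal sketch, assuming a unit quaternion in (w, x, y, z) order and illustrative \(\alpha = 1\), \(\kappa = 0\) (spread factor \(\sqrt{3}\)); the helper names are hypothetical.

```python
import numpy as np


def quat_to_rotmat(q):
    """Rotation matrix from a quaternion q = (w, x, y, z), normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def gaussian_sigma_points(mean, quat, scale, alpha=1.0, kappa=0.0):
    """7 sigma points (rows) of a 3D Gaussian with covariance R S S^T R^T."""
    n = 3
    lam = alpha**2 * (n + kappa) - n
    RS = quat_to_rotmat(np.asarray(quat, dtype=float)) * scale  # column k = scale[k] * R[:, k]
    offsets = np.sqrt(n + lam) * RS.T                            # one row per principal axis
    return np.vstack([mean, mean + offsets, mean - offsets])


mean = np.array([0.0, 0.0, 5.0])
quat = np.array([np.cos(np.pi / 8), 0.0, 0.0, np.sin(np.pi / 8)])  # 45 degrees about z
scale = np.array([0.3, 0.1, 0.05])
pts = gaussian_sigma_points(mean, quat, scale)
```

This avoids a per-Gaussian Cholesky decomposition, since \(\mathbf{R}\) and \(\mathbf{S}\) are already the optimized parameters.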

Non-linear Projection

All sigma points are fed through the full non-linear camera projection function \(\mathbf{h}(\cdot)\):

\[\boldsymbol{\mathcal{Y}}_i = \mathbf{h}(\boldsymbol{\chi}_i), \quad i = 0, \dots, 2n\]

Here, \(\mathbf{h}\) can be any differentiable projection function, including fisheye models with distortion coefficients or rolling shutter models.
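As one concrete \(\mathbf{h}\), the sketch below uses an equidistant fisheye model with a polynomial in the incidence angle, \(\theta_d = \theta(1 + k_1\theta^2 + k_2\theta^4 + k_3\theta^6 + k_4\theta^8)\), the same polynomial form as OpenCV's fisheye model. The intrinsics and the function name are hypothetical, and the paper's exact camera parameterization may differ. Because it works with the angle \(\theta = \operatorname{atan2}(r, z)\), it stays finite for a point 90° off-axis (\(z = 0\)), where a pinhole projection would divide by zero.

```python
import numpy as np


def fisheye_project(p_cam, fx, fy, cx, cy, k=(0.0, 0.0, 0.0, 0.0)):
    """Project camera-frame points (..., 3) to pixels with an equidistant fisheye
    model: theta_d = theta * (1 + k1 th^2 + k2 th^4 + k3 th^6 + k4 th^8)."""
    p_cam = np.asarray(p_cam, dtype=float)
    x, y, z = p_cam[..., 0], p_cam[..., 1], p_cam[..., 2]
    r = np.hypot(x, y)                        # distance from the optical axis
    theta = np.arctan2(r, z)                  # incidence angle, defined even for z <= 0
    th2 = theta * theta
    theta_d = theta * (1 + k[0] * th2 + k[1] * th2**2 + k[2] * th2**3 + k[3] * th2**4)
    # Radial scale theta_d / r; on the optical axis (r == 0) the point maps to (cx, cy).
    scale = np.where(r > 1e-12, theta_d / np.maximum(r, 1e-12), 0.0)
    return np.stack([fx * scale * x + cx, fy * scale * y + cy], axis=-1)


pts = np.array([[0.0, 0.0, 2.0],    # on the optical axis
                [1.0, 0.0, 1.0],    # 45 degrees off-axis
                [1.0, 0.0, 0.0]])   # 90 degrees off-axis
uv = fisheye_project(pts, fx=300.0, fy=300.0, cx=640.0, cy=480.0)
# uv ≈ [[640, 480], [875.6, 480], [1111.2, 480]]
```

Under UT, a function like this is simply called on each of the seven sigma points; no Jacobian of the distortion polynomial is required.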

2D Gaussian Recovery

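The 2D Gaussian parameters are recovered from the projected sigma points as weighted sample statistics:

\[\boldsymbol{\mu}_{2D} = \sum_{i=0}^{2n} w_i^{(m)} \boldsymbol{\mathcal{Y}}_i, \quad \boldsymbol{\Sigma}_{2D} = \sum_{i=0}^{2n} w_i^{(c)} (\boldsymbol{\mathcal{Y}}_i - \boldsymbol{\mu}_{2D})(\boldsymbol{\mathcal{Y}}_i - \boldsymbol{\mu}_{2D})^T\]

where, in the standard scaled UT, \(w_0^{(m)} = \lambda/(n+\lambda)\), \(w_0^{(c)} = \lambda/(n+\lambda) + (1 - \alpha^2 + \beta)\), and \(w_i^{(m)} = w_i^{(c)} = 1/(2(n+\lambda))\) for \(i = 1, \dots, 2n\); the extra parameter \(\beta\) encodes prior knowledge of the distribution (\(\beta = 2\) is the usual choice for Gaussians).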
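The full project-and-recover step can be sketched end to end. This assumes the standard scaled-UT weights (written out in the code) with illustrative \(\alpha = 1\), \(\beta = 2\), \(\kappa = 0\), and a hypothetical pinhole `h`; it is not the paper's CUDA implementation.

```python
import numpy as np


def unscented_project(h, mu, sigma, alpha=1.0, beta=2.0, kappa=0.0):
    """Push N(mu, sigma) through h with the UT and return (mu_2d, sigma_2d).

    alpha/beta/kappa are illustrative defaults, not the paper's settings.
    """
    n = mu.shape[0]
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky(sigma)
    offsets = np.sqrt(n + lam) * L.T
    chi = np.vstack([mu, mu + offsets, mu - offsets])   # (2n+1, n) sigma points
    Y = np.array([h(p) for p in chi])                   # projected sigma points
    w_m = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))     # mean weights
    w_c = w_m.copy()                                    # covariance weights
    w_m[0] = lam / (n + lam)
    w_c[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    mu_2d = w_m @ Y
    diff = Y - mu_2d
    sigma_2d = (w_c[:, None] * diff).T @ diff           # sum_i w_c[i] * d_i d_i^T
    return mu_2d, sigma_2d


def pinhole(p, f=500.0, cx=320.0, cy=240.0):
    """Hypothetical pinhole projection of a camera-frame point."""
    return np.array([f * p[0] / p[2] + cx, f * p[1] / p[2] + cy])


mu_3d = np.array([0.2, -0.1, 4.0])
sigma_3d = np.diag([0.04, 0.01, 0.09])
mu_2d, sigma_2d = unscented_project(pinhole, mu_3d, sigma_3d)
```

Replacing `pinhole` with any other projection, such as a fisheye or rolling-shutter model, changes nothing else. A useful sanity check: for an affine `h`, the result coincides exactly with the Jacobian-based (EWA) answer \(A\boldsymbol{\mu} + \mathbf{b}\), \(A\boldsymbol{\Sigma}A^T\).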

Rolling Shutter Support

Each pixel row of a rolling shutter camera is exposed at a slightly different time, so motion during readout produces the "jello" effect. 3DGUT incorporates the time dimension into the projection function:

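\[\mathbf{h}_{RS}(\boldsymbol{x}, t) = \pi(\boldsymbol{T}(t) \cdot \boldsymbol{x})\]

where \(\boldsymbol{T}(t)\) is the time-varying camera pose, \(\pi\) is the camera projection, and \(t\) is the exposure time of the image row the point projects to. Because \(t\) itself depends on the projected position, space and time are coupled; UT handles this non-linear projection naturally, since it only needs to evaluate \(\mathbf{h}_{RS}\) at each sigma point.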
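Since the row time depends on where the point lands, \(\mathbf{h}_{RS}\) is implicit in \(t\). One simple way to evaluate it is a short fixed-point iteration: project with the pose at the current time estimate, read off the row, update the time. The sketch below assumes a camera translating at constant velocity with fixed orientation, a pinhole \(\pi\), and a per-row readout time `t_row`; all names and values are hypothetical.

```python
import numpy as np


def pinhole(p_cam, f, cx, cy):
    """Ideal pinhole projection pi(.) of a camera-frame point."""
    return np.array([f * p_cam[0] / p_cam[2] + cx, f * p_cam[1] / p_cam[2] + cy])


def project_rolling_shutter(x_world, velocity, t_start, t_row, f, cx, cy, iters=5):
    """Hypothetical rolling-shutter projection h_RS via fixed-point iteration.

    Assumes T(t) is a pure translation: the camera centre sits at velocity * t
    (world frame, identity rotation), and image row v is exposed at t_start + v * t_row.
    """
    t = t_start
    for _ in range(iters):
        uv = pinhole(x_world - velocity * t, f, cx, cy)   # project with pose T(t)
        t = t_start + uv[1] * t_row                        # exposure time of that row
    return pinhole(x_world - velocity * t, f, cx, cy), t


x = np.array([0.5, 0.2, 5.0])
uv_gs, _ = project_rolling_shutter(x, np.zeros(3), 0.0, 3e-5, 500.0, 320.0, 240.0)
uv_rs, t = project_rolling_shutter(x, np.array([0.0, 2.0, 0.0]), 0.0, 3e-5, 500.0, 320.0, 240.0)
# static camera: uv_gs ≈ (370, 260); moving camera: row shifts to about 258.45
```

With realistic readout times the per-iteration correction shrinks by orders of magnitude, so a handful of iterations suffices. Each sigma point would go through this function in place of a global-shutter projection.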

Secondary Ray Tracing

For reflection and refraction effects, the method attaches a normal to each Gaussian, computes the reflected or refracted direction at the ray-Gaussian intersection, and spawns secondary rays. These secondary rays interact with the Gaussians in the scene again, and UT applies to them in the same way.
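The direction computation in this step reduces to the standard mirror-reflection and Snell refraction formulas. A minimal sketch, assuming the normal `n` at the hit point is already unit length and oriented against the incoming ray, with `eta` the ratio of refractive indices; how the paper derives normals from the intersected Gaussians is not shown here.

```python
import numpy as np


def reflect(d, n):
    """Mirror unit direction d about unit normal n: r = d - 2 (d . n) n."""
    return d - 2.0 * np.dot(d, n) * n


def refract(d, n, eta):
    """Refract unit direction d through a surface with unit normal n (facing d),
    eta = n_incident / n_transmitted. Returns None on total internal reflection."""
    cos_i = -np.dot(n, d)
    k = 1.0 - eta**2 * (1.0 - cos_i**2)
    if k < 0.0:
        return None                     # total internal reflection: no transmitted ray
    return eta * d + (eta * cos_i - np.sqrt(k)) * n


d = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)   # incoming ray, 45 degrees
n = np.array([0.0, 1.0, 0.0])                   # surface normal facing the ray
r = reflect(d, n)                               # (1, 1, 0) / sqrt(2)
t_glass = refract(d, n, 1.0 / 1.5)              # air into glass, bends toward n
```

The hit point plus the returned direction defines the secondary ray, which is then traced against the scene's Gaussians like a primary ray.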

Experimental Results

Standard Scene Comparison

Results on Mip-NeRF360:

| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| 3DGS (Original) | 27.21 | 0.815 | 0.214 |
| Mip-Splatting | 27.79 | 0.827 | 0.203 |
| 3DGUT (Pinhole) | 27.83 | 0.829 | 0.199 |
| 3DGUT (Distorted) | 28.14 | 0.838 | 0.188 |

Rolling Shutter Scenes

On the TUM-RS and Unreal-RS datasets, 3DGUT outperforms baseline methods that ignore the rolling shutter effect, improving PSNR by 2-4 dB.

Reflection/Refraction Scenes

In synthetic scenes containing specular reflection and glass refraction, 3DGUT is the first method able to handle secondary-ray effects directly within the 3DGS framework, avoiding earlier, more complex designs that required additional environment maps or multi-layer representations.

Ablation Study

| Variant | PSNR | Description |
|---|---|---|
| Full 3DGUT | 28.14 | Full model |
| w/o UT (fallback to EWA) | 27.21 | Degrades to standard 3DGS |
| UT + Pinhole | 27.83 | UT only, without distortion |
| UT + Distorted | 28.14 | Full distortion model |

Summary & Future Work

By replacing the affine approximation in EWA splatting with the Unscented Transform, 3DGUT provides a unified and elegant framework for distorted camera models and secondary ray effects. Its core advantage is that it makes no linearization assumption about the projection function: UT only evaluates the projection at the sigma points, so any camera model can be used as long as it is differentiable (as gradient-based training requires). Although the computational overhead increases slightly (approximately 1.3×), the resulting gains in flexibility and accuracy are valuable in practical applications.