Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos¶
Conference: NeurIPS 2025
arXiv: 2509.23492
Code: Available (OriGS)
Area: 3D Vision / 4D Reconstruction
Keywords: 4D Reconstruction, 3D Gaussian Splatting, Dynamic Scenes, Orientation Field, Hyper-Gaussian
TL;DR¶
This paper proposes OriGS (Orientation-anchored Gaussian Splatting), which achieves high-quality 4D dynamic scene reconstruction from casually captured monocular videos via a global orientation field and an orientation-aware hyper-Gaussian representation.
Background & Motivation¶
Extending 3D Gaussian Splatting (3DGS) to 4D reconstruction of dynamic scenes has become a prominent research direction. Existing methods typically rely on motion anchors such as graph nodes or spline control points for dynamic modeling, but suffer from the following limitations:
Low-rank assumption constraints: Most methods assume scene deformation can be expressed by low-rank models, making it difficult to capture complex region-specific deformations.
Difficulty in unconstrained dynamic modeling: Motion patterns in casually captured handheld videos are highly diverse and cannot be described by a unified low-dimensional transformation.
Insufficient local–global motion coordination: Effective mechanisms for aligning local deformations with global motion intent are lacking.
The core motivation of OriGS is to introduce scene orientation as a structural prior, providing stable geometric guidance for dynamic modeling.
Method¶
Overall Architecture¶
OriGS consists of two core modules:
- Global Orientation Field: Propagates the dominant forward-facing orientation across space and time to provide stable structural guidance.
- Orientation-aware Hyper-Gaussian: Unifies temporal, spatial, geometric, and orientation information into a high-dimensional probabilistic state.
Key Designs¶
Global Orientation Field Estimation:
- The dominant forward-facing orientation is first estimated for each input video frame.
- A spatiotemporal propagation mechanism diffuses orientation information across spatial and temporal dimensions to form a continuous global orientation field.
- The orientation field serves as a stable reference for global motion intent in subsequent dynamic modeling.
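The propagation step can be illustrated with a minimal temporal-only sketch: per-frame orientation estimates are smoothed along time with an exponential moving average and re-normalized to unit length. This is a hypothetical simplification of the paper's mechanism (the spatial diffusion component and the smoothing factor `alpha` are assumptions, not taken from the paper):

```python
import numpy as np

def propagate_orientation(frame_orientations, alpha=0.8):
    """Temporally diffuse per-frame orientation estimates.

    A hypothetical simplification of OriGS's spatiotemporal propagation:
    only the temporal dimension is smoothed here, via an exponential
    moving average, and each blended vector is re-normalized to unit
    length so the result remains a field of orientations.

    frame_orientations: (T, 3) array of unit direction vectors.
    """
    field = np.empty_like(frame_orientations, dtype=float)
    field[0] = frame_orientations[0]
    for t in range(1, len(frame_orientations)):
        # Blend the propagated field with the current frame's estimate.
        blended = alpha * field[t - 1] + (1 - alpha) * frame_orientations[t]
        field[t] = blended / np.linalg.norm(blended)
    return field
```

The smoothed field varies more slowly between frames than the raw per-frame estimates, which is the property the paper relies on when using orientation as a stable motion reference.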
Orientation-aware Hyper-Gaussian:
- Building on standard 3D Gaussians, temporal and orientation dimensions are added to form a hyper-dimensional representation.
- Each Gaussian primitive's state is parameterized as \((x, y, z, t, \theta)\), where \(\theta\) encodes orientation information.
- Conditioned slicing operations infer the 3D Gaussian parameters at a specific time instant from the hyper-dimensional space.
- The slicing process is conditioned on the orientation field, adaptively capturing local dynamics aligned with global motion intent.
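The probabilistic slicing operation can be sketched with standard multivariate-Gaussian conditioning: a joint Gaussian over \((x, y, z, t, \theta)\), conditioned on a given time and orientation value, yields a 3D spatial Gaussian. This is only an illustration of the slicing idea under that standard formula; the paper's actual parameterization and conditioning on the orientation field are more involved:

```python
import numpy as np

def slice_gaussian(mu, cov, cond_idx, cond_vals):
    """Condition a joint Gaussian on a subset of coordinates.

    Standard multivariate-normal conditioning: for a 5D hyper-Gaussian
    over (x, y, z, t, theta), fixing indices 3 (t) and 4 (theta) yields
    the mean and covariance of a 3D spatial Gaussian. A minimal sketch
    of the conditioned-slicing idea, not the OriGS implementation.
    """
    n = len(mu)
    keep = [i for i in range(n) if i not in cond_idx]
    mu_a, mu_b = mu[keep], mu[cond_idx]
    S_aa = cov[np.ix_(keep, keep)]
    S_ab = cov[np.ix_(keep, cond_idx)]
    S_bb = cov[np.ix_(cond_idx, cond_idx)]
    # Gaussian conditioning: mu_a + S_ab S_bb^{-1} (v - mu_b), etc.
    gain = S_ab @ np.linalg.inv(S_bb)
    mu_cond = mu_a + gain @ (np.asarray(cond_vals) - mu_b)
    cov_cond = S_aa - gain @ S_ab.T
    return mu_cond, cov_cond
```

When the spatial and temporal dimensions are correlated, slicing at a different time shifts the spatial mean, which is how a single hyper-Gaussian encodes motion over time.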
Region-adaptive Deformation:
- Deformation patterns across different spatial regions are modeled via distinct slices in the hyper-dimensional space.
- Orientation conditioning ensures global consistency of deformations across neighboring regions.
- This design requires no low-rank assumption and can express arbitrarily complex local motions.
Loss & Training¶
- Photometric reconstruction loss: L1 + SSIM combination.
- Perceptual loss (LPIPS) for enhanced visual quality.
- Orientation field regularization to enforce spatiotemporal smoothness of orientation estimates.
- Trained directly from monocular video without requiring multi-view synchronized cameras.
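The photometric term can be sketched as the common 3DGS-style mix of L1 and SSIM. The weight `lam=0.2` follows the original 3DGS paper and is an assumption, not a value reported here; the SSIM below is a simplified single-window variant (real implementations use local Gaussian windows), and the LPIPS term is omitted because it requires a pretrained perceptual network:

```python
import numpy as np

def ssim_global(a, b, c1=0.01**2, c2=0.03**2):
    """Simplified single-window SSIM over whole images in [0, 1].
    Illustration only; production SSIM uses local sliding windows."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (va + vb + c2))

def photometric_loss(pred, gt, lam=0.2):
    """(1 - lam) * L1 + lam * (1 - SSIM): the standard 3DGS-style
    photometric mix. lam and the SSIM variant are assumptions;
    the LPIPS perceptual term from the paper is not included."""
    l1 = np.abs(pred - gt).mean()
    return (1 - lam) * l1 + lam * (1 - ssim_global(pred, gt))
```

The loss is zero for a perfect reconstruction and grows with both per-pixel error (L1) and structural dissimilarity (1 − SSIM).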
Key Experimental Results¶
Main Results¶
Quantitative comparison on the DyCheck dataset (novel view synthesis, average over 7 scenes):
| Method | Camera Pose | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| T-NeRF | GT | 17.43 | 0.728 | 0.508 |
| NSFF | GT | 16.47 | 0.754 | 0.414 |
| Nerfies | GT | 17.54 | 0.750 | 0.478 |
| HyperNeRF | GT | 17.64 | 0.743 | 0.478 |
| D3DGS | GT | - | - | - |
| SoM | GT | - | - | - |
| MoSca | Estimated | - | - | - |
| OriGS (Ours) | Estimated | Best | Best | Best |
Note: OriGS outperforms baselines that use GT camera poses, despite relying only on estimated poses.
Qualitative comparison on in-the-wild videos (DAVIS, OpenAI SORA, YouTube-VOS):
| Source | Evaluation | OriGS Performance |
|---|---|---|
| DAVIS | Qualitative | Sharper geometry and more coherent motion |
| OpenAI SORA | Qualitative | Effectively handles complex non-rigid motion |
| YouTube-VOS | Qualitative | Remains stable under occlusion and fast motion |
Ablation Study¶
Contribution analysis of individual components:
| Configuration | PSNR | Note |
|---|---|---|
| Base 3DGS | Baseline | No dynamic modeling capability |
| + Temporal dimension | +1.2 | Basic dynamic representation |
| + Orientation field | +0.8 | Global structural guidance |
| + Hyper-dimensional slicing | +0.6 | Region-adaptive deformation |
| Full OriGS | Best | All components combined |
Key Findings¶
- The introduction of the orientation field enables high-quality reconstruction even without GT camera poses.
- Region-adaptive deformation via conditioned slicing in the hyper-Gaussian significantly outperforms models built on global low-rank assumptions.
- OriGS outperforms state-of-the-art methods on complex real-world dynamic scenes in both quantitative and qualitative evaluations.
Highlights & Insights¶
- Orientation field as dynamic prior: This work is the first to systematically introduce scene orientation into 4D reconstruction, offering a novel modeling perspective.
- Unified hyper-dimensional representation: Integrating spatiotemporal, geometric, and orientation information into a single probabilistic framework yields a theoretically elegant formulation.
- Conditioned slicing mechanism: Explicit motion model parameterization is avoided; deformations are implicitly inferred via cross-sections in high-dimensional space.
- No GT pose required: Operating from estimated poses makes the approach more practical to deploy and lowers data-acquisition requirements.
Limitations & Future Work¶
- Orientation field estimation relies on inter-frame motion cues and may fail under extreme motion blur or occlusion.
- The hyper-dimensional representation incurs additional memory and computational overhead, leaving room for optimization toward real-time applications.
- Validation is limited to monocular video; performance under multi-view input remains unexplored.
- Orientation field drift in long-sequence videos warrants further investigation.
Related Work & Insights¶
- 3DGS extensions: D3DGS, Marbles, and related works extending Gaussian splatting to dynamic scenes.
- Implicit dynamic representations: Nerfies and HyperNeRF model dynamics via deformation fields.
- Motion anchor methods: MoSca and SoM guide motion using graph structures or shape priors.
- Insight: The orientation field concept may also benefit other tasks requiring global–local motion coordination.
Rating¶
- Novelty: ⭐⭐⭐⭐ (Orientation field + hyper-Gaussian is a novel combination)
- Technical Depth: ⭐⭐⭐⭐ (Complete theoretical framework)
- Experimental Thoroughness: ⭐⭐⭐⭐ (Multi-dataset, quantitative + qualitative)
- Value: ⭐⭐⭐⭐ (Monocular video reconstruction without GT poses)