Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos¶
Conference: NeurIPS 2025
arXiv: 2509.23492
Code: Available (OriGS)
Area: 3D Vision / 4D Reconstruction
Keywords: 4D Reconstruction, 3D Gaussian Splatting, Dynamic Scenes, Orientation Field, Hyper-Gaussian
TL;DR¶
This paper proposes OriGS (Orientation-anchored Gaussian Splatting), which achieves high-quality 4D dynamic scene reconstruction from casually captured monocular videos via a global orientation field and an orientation-aware hyper-Gaussian representation.
Background & Motivation¶
Extending 3D Gaussian Splatting (3DGS) to 4D reconstruction of dynamic scenes has become a prominent research direction. Existing methods typically rely on motion anchors such as graph nodes or spline control points for dynamic modeling, but suffer from the following limitations:
Low-rank assumption constraints: Most methods assume scene deformation can be expressed by low-rank models, making it difficult to capture complex region-specific deformations.
Difficulty in unconstrained dynamic modeling: Motion patterns in casually captured handheld videos are highly diverse and cannot be described by a unified low-dimensional transformation.
Insufficient local–global motion coordination: Effective mechanisms for aligning local deformations with global motion intent are lacking.
The core motivation of OriGS is to introduce scene orientation as a structural prior, providing stable geometric guidance for dynamic modeling.
Method¶
Overall Architecture¶
OriGS consists of two core modules:
- Global Orientation Field: Propagates the dominant forward-facing orientation across space and time to provide stable structural guidance.
- Orientation-aware Hyper-Gaussian: Unifies temporal, spatial, geometric, and orientation information into a high-dimensional probabilistic state.
Key Designs¶
Global Orientation Field Estimation:
- The dominant forward-facing orientation is first estimated for each input video frame.
- A spatiotemporal propagation mechanism diffuses orientation information across spatial and temporal dimensions to form a continuous global orientation field.
- The orientation field serves as a stable reference for global motion intent in subsequent dynamic modeling.
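The propagation step can be illustrated with a minimal temporal-only sketch: per-frame orientation estimates are smoothed along time with an exponential moving average and re-normalized to unit length. This is a hypothetical simplification of the paper's mechanism (the spatial diffusion component and the smoothing factor `alpha` are assumptions, not taken from the paper):

```python
import numpy as np

def propagate_orientation(frame_orientations, alpha=0.8):
    """Temporally diffuse per-frame orientation estimates.

    A hypothetical simplification of OriGS's spatiotemporal propagation:
    only the temporal dimension is smoothed here, via an exponential
    moving average, and each blended vector is re-normalized to unit
    length so the result remains a field of orientations.

    frame_orientations: (T, 3) array of unit direction vectors.
    """
    field = np.empty_like(frame_orientations, dtype=float)
    field[0] = frame_orientations[0]
    for t in range(1, len(frame_orientations)):
        # Blend the propagated field with the current frame's estimate.
        blended = alpha * field[t - 1] + (1 - alpha) * frame_orientations[t]
        field[t] = blended / np.linalg.norm(blended)
    return field
```

The smoothed field varies more slowly between frames than the raw per-frame estimates, which is the property the paper relies on when using orientation as a stable motion reference.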
Orientation-aware Hyper-Gaussian:
- Building on standard 3D Gaussians, temporal and orientation dimensions are added to form a hyper-dimensional representation.
- Each Gaussian primitive's state is parameterized as \((x, y, z, t, \theta)\), where \(\theta\) encodes orientation information.
- Conditioned slicing operations infer the 3D Gaussian parameters at a specific time instant from the hyper-dimensional space.
- The slicing process is conditioned on the orientation field, adaptively capturing local dynamics aligned with global motion intent.
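The probabilistic slicing operation can be sketched with standard multivariate-Gaussian conditioning: a joint Gaussian over \((x, y, z, t, \theta)\), conditioned on a given time and orientation value, yields a 3D spatial Gaussian. This is only an illustration of the slicing idea under that standard formula; the paper's actual parameterization and conditioning on the orientation field are more involved:

```python
import numpy as np

def slice_gaussian(mu, cov, cond_idx, cond_vals):
    """Condition a joint Gaussian on a subset of coordinates.

    Standard multivariate-normal conditioning: for a 5D hyper-Gaussian
    over (x, y, z, t, theta), fixing indices 3 (t) and 4 (theta) yields
    the mean and covariance of a 3D spatial Gaussian. A minimal sketch
    of the conditioned-slicing idea, not the OriGS implementation.
    """
    n = len(mu)
    keep = [i for i in range(n) if i not in cond_idx]
    mu_a, mu_b = mu[keep], mu[cond_idx]
    S_aa = cov[np.ix_(keep, keep)]
    S_ab = cov[np.ix_(keep, cond_idx)]
    S_bb = cov[np.ix_(cond_idx, cond_idx)]
    # Gaussian conditioning: mu_a + S_ab S_bb^{-1} (v - mu_b), etc.
    gain = S_ab @ np.linalg.inv(S_bb)
    mu_cond = mu_a + gain @ (np.asarray(cond_vals) - mu_b)
    cov_cond = S_aa - gain @ S_ab.T
    return mu_cond, cov_cond
```

When the spatial and temporal dimensions are correlated, slicing at a different time shifts the spatial mean, which is how a single hyper-Gaussian encodes motion over time.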
Region-adaptive Deformation:
- Deformation patterns across different spatial regions are modeled via distinct slices in the hyper-dimensional space.
- Orientation conditioning ensures global consistency of deformations across neighboring regions.
- This design requires no low-rank assumption and can express arbitrarily complex local motions.
Loss & Training¶
- Photometric reconstruction loss: L1 + SSIM combination.
- Perceptual loss (LPIPS) for enhanced visual quality.
- Orientation field regularization to enforce spatiotemporal smoothness of orientation estimates.
- Trained directly from monocular video without requiring multi-view synchronized cameras.
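The photometric term can be sketched as the common 3DGS-style mix of L1 and SSIM. The weight `lam=0.2` follows the original 3DGS paper and is an assumption, not a value reported here; the SSIM below is a simplified single-window variant (real implementations use local Gaussian windows), and the LPIPS term is omitted because it requires a pretrained perceptual network:

```python
import numpy as np

def ssim_global(a, b, c1=0.01**2, c2=0.03**2):
    """Simplified single-window SSIM over whole images in [0, 1].
    Illustration only; production SSIM uses local sliding windows."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (va + vb + c2))

def photometric_loss(pred, gt, lam=0.2):
    """(1 - lam) * L1 + lam * (1 - SSIM): the standard 3DGS-style
    photometric mix. lam and the SSIM variant are assumptions;
    the LPIPS perceptual term from the paper is not included."""
    l1 = np.abs(pred - gt).mean()
    return (1 - lam) * l1 + lam * (1 - ssim_global(pred, gt))
```

The loss is zero for a perfect reconstruction and grows with both per-pixel error (L1) and structural dissimilarity (1 − SSIM).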
Key Experimental Results¶
Main Results¶
Quantitative comparison on the DyCheck dataset (novel view synthesis, average over 7 scenes):
| Method | Camera Pose | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| T-NeRF | GT | 17.43 | 0.728 | 0.508 |
| NSFF | GT | 16.47 | 0.754 | 0.414 |
| Nerfies | GT | 17.54 | 0.750 | 0.478 |
| HyperNeRF | GT | 17.64 | 0.743 | 0.478 |
| D3DGS | GT | - | - | - |
| SoM | GT | - | - | - |
| MoSca | Estimated | - | - | - |
| OriGS (Ours) | Estimated | Best | Best | Best |
Note: OriGS outperforms baselines that use GT camera poses, despite relying only on estimated poses.
Qualitative comparison on in-the-wild videos (DAVIS, OpenAI SORA, YouTube-VOS):
| Source | Evaluation | OriGS Performance |
|---|---|---|
| DAVIS | Qualitative | Sharper geometry and more coherent motion |
| OpenAI SORA | Qualitative | Effectively handles complex non-rigid motion |
| YouTube-VOS | Qualitative | Remains stable under occlusion and fast motion |
Ablation Study¶
Contribution analysis of individual components:
| Configuration | PSNR | Note |
|---|---|---|
| Base 3DGS | Baseline | No dynamic modeling capability |
| + Temporal dimension | +1.2 | Basic dynamic representation |
| + Orientation field | +0.8 | Global structural guidance |
| + Hyper-dimensional slicing | +0.6 | Region-adaptive deformation |
| Full OriGS | Best | All components combined |
Key Findings¶
- The introduction of the orientation field enables high-quality reconstruction even without GT camera poses.
- Region-adaptive deformation via conditioned slicing in the hyper-Gaussian significantly outperforms models built on global low-rank assumptions.
- OriGS outperforms state-of-the-art methods on complex real-world dynamic scenes in both quantitative and qualitative evaluations.
Highlights & Insights¶
- Orientation field as dynamic prior: This work is the first to systematically introduce scene orientation into 4D reconstruction, offering a novel modeling perspective.
- Unified hyper-dimensional representation: Integrating spatiotemporal, geometric, and orientation information into a single probabilistic framework yields a theoretically elegant formulation.
- Conditioned slicing mechanism: Explicit motion model parameterization is avoided; deformations are implicitly inferred via cross-sections in high-dimensional space.
- No GT pose required: Operating from estimated poses makes the approach more practical to deploy and lowers data-acquisition requirements.
Limitations & Future Work¶
- Orientation field estimation relies on inter-frame motion cues and may fail under extreme motion blur or occlusion.
- The hyper-dimensional representation incurs additional memory and computational overhead, leaving room for optimization toward real-time applications.
- Validation is limited to monocular video; performance under multi-view input remains unexplored.
- Orientation field drift in long-sequence videos warrants further investigation.
Related Work & Insights¶
- 3DGS extensions: D3DGS, Marbles, and related works extending Gaussian splatting to dynamic scenes.
- Implicit dynamic representations: Nerfies and HyperNeRF model dynamics via deformation fields.
- Motion anchor methods: MoSca and SoM guide motion using graph structures or shape priors.
- Insight: The orientation field concept may also benefit other tasks requiring global–local motion coordination.
Rating¶
- Novelty: ⭐⭐⭐⭐ (Orientation field + hyper-Gaussian is a novel combination)
- Technical Depth: ⭐⭐⭐⭐ (Complete theoretical framework)
- Experimental Thoroughness: ⭐⭐⭐⭐ (Multi-dataset, quantitative + qualitative)
- Value: ⭐⭐⭐⭐ (Monocular video reconstruction without GT poses)