
Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos

Conference: NeurIPS 2025
arXiv: 2509.23492
Code: Available (OriGS)
Area: 3D Vision / 4D Reconstruction
Keywords: 4D Reconstruction, 3D Gaussian Splatting, Dynamic Scenes, Orientation Field, Hyper-Gaussian

TL;DR

This paper proposes OriGS (Orientation-anchored Gaussian Splatting), which achieves high-quality 4D dynamic scene reconstruction from casually captured monocular videos via a global orientation field and an orientation-aware hyper-Gaussian representation.

Background & Motivation

Extending 3D Gaussian Splatting (3DGS) to 4D reconstruction of dynamic scenes has become a prominent research direction. Existing methods typically rely on motion anchors such as graph nodes or spline control points for dynamic modeling, but suffer from the following limitations:

Low-rank assumption constraints: Most methods assume scene deformation can be expressed by low-rank models, making it difficult to capture complex region-specific deformations.

Difficulty in unconstrained dynamic modeling: Motion patterns in casually captured handheld videos are highly diverse and cannot be described by a unified low-dimensional transformation.

Insufficient local–global motion coordination: Effective mechanisms for aligning local deformations with global motion intent are lacking.

The core motivation of OriGS is to introduce scene orientation as a structural prior, providing stable geometric guidance for dynamic modeling.

Method

Overall Architecture

OriGS consists of two core modules:

  1. Global Orientation Field: Propagates the dominant forward-facing orientation across space and time to provide stable structural guidance.
  2. Orientation-aware Hyper-Gaussian: Unifies temporal, spatial, geometric, and orientation information into a high-dimensional probabilistic state.

Key Designs

Global Orientation Field Estimation:

  • The dominant forward-facing orientation is first estimated for each input video frame.
  • A spatiotemporal propagation mechanism diffuses orientation information across spatial and temporal dimensions to form a continuous global orientation field.
  • The orientation field serves as a stable reference for global motion intent in subsequent dynamic modeling.
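
To make the propagation step concrete, here is a minimal sketch of temporal diffusion of per-frame orientation estimates. It is a simplified stand-in for the paper's mechanism: spatial diffusion is omitted, and the neighbour-blending weight `alpha` and iteration count are assumed hyperparameters, not values from the paper.

```python
import numpy as np

def propagate_orientations(per_frame_dirs, num_iters=10, alpha=0.5):
    """Temporally diffuse per-frame unit orientation vectors.

    Each frame's orientation is repeatedly blended with its temporal
    neighbours and renormalised, yielding a smooth orientation field
    along the time axis (a 1D analogue of spatiotemporal propagation).
    """
    dirs = np.asarray(per_frame_dirs, dtype=float)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    for _ in range(num_iters):
        # Average each frame with its neighbours (sequence ends clamp).
        left = np.vstack([dirs[:1], dirs[:-1]])
        right = np.vstack([dirs[1:], dirs[-1:]])
        dirs = (1 - alpha) * dirs + alpha * 0.5 * (left + right)
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return dirs
```

After a few iterations, neighbouring frames become strongly aligned, which is the property the field needs to serve as a stable motion reference.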

Orientation-aware Hyper-Gaussian:

  • Building upon standard 3D Gaussians, temporal and orientation dimensions are added to form a hyper-dimensional representation.
  • Each Gaussian primitive's state is parameterized as \((x, y, z, t, \theta)\), where \(\theta\) encodes orientation information.
  • Conditioned slicing operations infer the 3D Gaussian parameters at a specific time instant from the hyper-dimensional space.
  • The slicing process is conditioned on the orientation field, adaptively capturing local dynamics aligned with global motion intent.
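
The slicing operation can be illustrated with the textbook Gaussian conditioning formula: fixing the time/orientation coordinates of a joint Gaussian over \((x, y, z, t, \theta)\) yields a 3D spatial Gaussian whose mean shifts with the conditioned values. This is a generic sketch of the idea, not the paper's exact parameterisation.

```python
import numpy as np

def slice_hyper_gaussian(mu, Sigma, cond_idx, cond_val):
    """Condition a joint Gaussian on a subset of its dimensions.

    Standard result: for x ~ N(mu, Sigma) partitioned into kept and
    conditioned blocks, the conditional over the kept block is
      mu_k + S_kc S_cc^{-1} (v - mu_c),  S_kk - S_kc S_cc^{-1} S_ck.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    keep_idx = [i for i in range(len(mu)) if i not in cond_idx]
    S_kk = Sigma[np.ix_(keep_idx, keep_idx)]
    S_kc = Sigma[np.ix_(keep_idx, cond_idx)]
    S_cc = Sigma[np.ix_(cond_idx, cond_idx)]
    gain = S_kc @ np.linalg.inv(S_cc)
    mu_cond = mu[keep_idx] + gain @ (np.asarray(cond_val, float) - mu[cond_idx])
    Sigma_cond = S_kk - gain @ S_kc.T
    return mu_cond, Sigma_cond
```

For a 5D hyper-Gaussian, conditioning on dimensions 3 and 4 (time and orientation) returns the 3D spatial mean and covariance at that instant; the cross-covariance blocks are what let the spatial Gaussian "move" as \(t\) and \(\theta\) vary.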

Region-adaptive Deformation:

  • Deformation patterns across different spatial regions are modeled via distinct slices in the hyper-dimensional space.
  • Orientation conditioning ensures global consistency of deformations across neighboring regions.
  • This design requires no low-rank assumption and can express arbitrarily complex local motions.

Loss & Training

  • Photometric reconstruction loss: L1 + SSIM combination.
  • Perceptual loss (LPIPS) for enhanced visual quality.
  • Orientation field regularization to enforce spatiotemporal smoothness of orientation estimates.
  • Trained directly from monocular video without requiring multi-view synchronized cameras.
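
The photometric part of the objective can be sketched as follows. The blend weight `lam = 0.2` is the convention from the original 3DGS paper, assumed here rather than taken from OriGS; the LPIPS term and the orientation-field regulariser are omitted because they require a pretrained network and the field itself.

```python
import numpy as np

def l1_loss(pred, gt):
    """Mean absolute error between rendered and ground-truth images."""
    return np.mean(np.abs(pred - gt))

def ssim_global(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window (global) SSIM; real pipelines use local windows."""
    mu_p, mu_g = pred.mean(), gt.mean()
    var_p, var_g = pred.var(), gt.var()
    cov = ((pred - mu_p) * (gt - mu_g)).mean()
    return ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))

def photometric_loss(pred, gt, lam=0.2):
    """L1 + (1 - SSIM) combination, the common 3DGS photometric term."""
    return (1 - lam) * l1_loss(pred, gt) + lam * (1 - ssim_global(pred, gt))
```

An identical prediction gives zero loss; any photometric discrepancy raises both terms, with `lam` trading off pixel accuracy against structural similarity.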

Key Experimental Results

Main Results

Quantitative comparison on the DyCheck dataset (novel view synthesis, average over 7 scenes):

| Method       | Camera Pose | PSNR↑ | SSIM↑ | LPIPS↓ |
|--------------|-------------|-------|-------|--------|
| T-NeRF       | GT          | 17.43 | 0.728 | 0.508  |
| NSFF         | GT          | 16.47 | 0.754 | 0.414  |
| Nerfies      | GT          | 17.54 | 0.750 | 0.478  |
| HyperNeRF    | GT          | 17.64 | 0.743 | 0.478  |
| D3DGS        | GT          | –     | –     | –      |
| SoM          | GT          | –     | –     | –      |
| MoSca        | Estimated   | –     | –     | –      |
| OriGS (Ours) | Estimated   | Best  | Best  | Best   |

Note: OriGS outperforms baselines that use GT camera poses, despite relying only on estimated poses.

Qualitative comparison on in-the-wild videos (DAVIS, OpenAI SORA, YouTube-VOS):

| Source      | Evaluation  | OriGS Performance                              |
|-------------|-------------|------------------------------------------------|
| DAVIS       | Qualitative | Sharper geometry and more coherent motion      |
| OpenAI SORA | Qualitative | Effectively handles complex non-rigid motion   |
| YouTube-VOS | Qualitative | Remains stable under occlusion and fast motion |

Ablation Study

Contribution analysis of individual components:

| Configuration               | ΔPSNR    | Note                            |
|-----------------------------|----------|---------------------------------|
| Base 3DGS                   | baseline | No dynamic modeling capability  |
| + Temporal dimension        | +1.2     | Basic dynamic representation    |
| + Orientation field         | +0.8     | Global structural guidance      |
| + Hyper-dimensional slicing | +0.6     | Region-adaptive deformation     |
| Full OriGS                  | Best     | All components combined         |

Key Findings

  1. The introduction of the orientation field enables high-quality reconstruction even without GT camera poses.
  2. Region-adaptive deformation via conditioned slicing in the hyper-Gaussian significantly outperforms global low-rank assumptions.
  3. OriGS demonstrates comprehensive advantages over state-of-the-art methods on complex real-world dynamic scenes.

Highlights & Insights

  1. Orientation field as dynamic prior: This work is the first to systematically introduce scene orientation into 4D reconstruction, offering a novel modeling perspective.
  2. Unified hyper-dimensional representation: Integrating spatiotemporal, geometric, and orientation information into a single probabilistic framework yields a theoretically elegant formulation.
  3. Conditioned slicing mechanism: Explicit motion model parameterization is avoided; deformations are implicitly inferred via cross-sections in high-dimensional space.
  4. No GT pose required: The approach is practically more applicable, reducing requirements on data acquisition.

Limitations & Future Work

  1. Orientation field estimation relies on inter-frame motion cues and may fail under extreme motion blur or occlusion.
  2. The hyper-dimensional representation incurs additional memory and computational overhead, leaving room for optimization toward real-time applications.
  3. Validation is limited to monocular video; performance under multi-view input remains unexplored.
  4. Orientation field drift in long-sequence videos warrants further investigation.

Related Work

  • 3DGS extensions: D3DGS, Marbles, and related works extending Gaussian splatting to dynamic scenes.
  • Implicit dynamic representations: Nerfies and HyperNeRF model dynamics via deformation fields.
  • Motion anchor methods: MoSca and SoM guide motion using graph structures or shape priors.
  • Insight: The orientation field concept may also benefit other tasks requiring global–local motion coordination.

Rating

  • Novelty: ⭐⭐⭐⭐ (Orientation field + hyper-Gaussian is a novel combination)
  • Technical Depth: ⭐⭐⭐⭐ (Complete theoretical framework)
  • Experimental Thoroughness: ⭐⭐⭐⭐ (Multi-dataset, quantitative + qualitative)
  • Value: ⭐⭐⭐⭐ (Monocular video reconstruction without GT poses)