HyperGS: Hyperspectral 3D Gaussian Splatting
Conference: CVPR 2025
arXiv: 2412.12849
Code: Unreleased
Area: 3D Vision
Keywords: hyperspectral imaging, 3D Gaussian Splatting, novel view synthesis, latent space, spectral reconstruction
TL;DR
This paper is the first to extend 3DGS to hyperspectral novel view synthesis (HNVS). It renders in a learned latent space and combines this with adaptive density control and pixel-level spectral pruning, reconstructing high-dimensional spectral data efficiently and accurately.
Background & Motivation
Background: 3DGS has been highly successful in RGB novel view synthesis, delivering high-quality rendering in real time. However, RGB imaging captures only three channels and cannot represent the spectral characteristics of materials. Hyperspectral imaging (HSI) records a dense spectrum over many narrow bands (128 and 141 channels in the datasets used here). This is valuable in fields such as remote sensing, medical diagnosis, and robotics.
Limitations of Prior Work: (1) NeRF-based HNVS methods such as HS-NeRF train unstably and render slowly. (2) Naively extending 3DGS to high-dimensional spectral data makes optimization hard, and its threshold-based heuristics become difficult to tune. (3) There is no systematic HNVS benchmark.
Key Challenge: The high dimensionality of hyperspectral data (128–141 channels) makes direct optimization computationally expensive. In addition, the signal-to-noise ratio (SNR) varies widely across channels, so training with the standard L1+SSIM loss is unstable.
Key Insight: Run 3DGS in a learned latent space, using an autoencoder to compress the high-dimensional spectra. Pair it with depth-aware density control and pixel-level spectral pruning.
Method
Overall Architecture
- Preprocessing: A convolutional autoencoder (AE) compresses high-dimensional spectral images into a low-dimensional latent space.
- SfM Initialization: An SfM point cloud is estimated from grayscale single-channel slices. Each 3D point is then reprojected into the hyperspectral views, and the corresponding latent spectral features serve as its initial spectral signature.
- Latent Space 3DGS: Gaussians carry latent features and are rendered in the latent space. An MLP predicts view-dependent effects.
- Decoding & Training: A frozen decoder maps the rendered latent image back to a full spectral image, and the spectral loss is computed on that image.
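The render-then-decode step above can be illustrated with a minimal NumPy sketch. It assumes that standard 3DGS front-to-back alpha compositing is applied to latent vectors instead of RGB colors. The decoder here is a hypothetical single affine map standing in for the paper's 1D-convolutional decoder. `composite_latent` and `decode` are illustrative names, not the authors' code.

```python
import numpy as np


def composite_latent(features, alphas):
    """Front-to-back alpha compositing of latent features along one pixel's ray.

    features: (N, D) latent vectors of the N Gaussians hit by the ray, sorted near to far.
    alphas:   (N,) effective opacities (opacity times 2D Gaussian falloff), in [0, 1].
    Returns the composited (D,) latent vector and the per-Gaussian weights alpha_i * T_i.
    """
    # T_i = prod_{j < i} (1 - alpha_j): light remaining after the Gaussians in front.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return weights @ features, weights


def decode(latent, W, b):
    """Hypothetical frozen decoder: one affine map from D latent dims to C bands.

    Stands in for the paper's 1D-convolutional decoder. Its weights stay fixed
    (not updated) while the Gaussians are optimized.
    """
    return W @ latent + b


# Usage: two Gaussians with 2-D latents, decoded to 3 hypothetical bands.
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
latent, w = composite_latent(feats, np.array([0.5, 0.5]))
spectrum = decode(latent, W=np.ones((3, 2)), b=np.zeros(3))
```

Because the decoder is frozen, gradients from the spectral loss flow through it into the latent features and geometry of the Gaussians, while the decoder itself is left unchanged.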
Key Designs
1. Hyperspectral Compression Autoencoder
   - Function: A symmetric AE built from 1D convolutions and Squeeze-and-Excitation (SE) blocks compresses high-dimensional spectra into low-dimensional latent representations.
   - Mechanism: The encoder shrinks the spectral dimension with max-pooling, and the decoder restores it by upsampling. There are no skip connections, so the decoder can run on its own.
   - Loss Function: The Huber loss \(L_{ae} = L_{Huber}(C^*(p), Dec(Enc(C^*(p))))\) is robust to outliers and copes with the different SNR profiles of different cameras.
   - Design Motivation: Working in the latent space cuts the cost of 3DGS optimization and encodes the camera's spectral sensitivity. It also bounds the error introduced by compressing the spectra.
2. Depth-Aware Adaptive Density Control
   - Function: Modifies the 3DGS split/clone criterion with a depth scaling function \(h(d,i) = (|\mathbf{E}_d \mathbf{X}_i| / (\beta_{field} \times R))^2\), which modulates how strongly each view's gradient counts. Here \(\mathbf{E}_d \mathbf{X}_i\) is the center of Gaussian \(i\) expressed in the camera frame of view \(d\).
   - Mechanism: NDC positional gradients are multiplied by this squared, normalized depth. This suppresses the spuriously large gradients of Gaussians close to the camera and stabilizes density control under the wide dynamic range of hyperspectral data.
   - Design Motivation: With so many channels and such a wide range of values, the threshold-based density control of standard 3DGS breaks down. Near-field Gaussians also tend to split inconsistently across views.
3. Pixel-Level Spectral Gaussian Pruning
   - Function: Computes a pixel-level spectral importance score \(\mathcal{I}[g_i, p, d] = (1 - |C^*_d(p) - Dec(f_i)|) \alpha_i T_i\) for each Gaussian. In this score, \(C^*_d(p)\) is the ground-truth spectrum at pixel \(p\) of view \(d\), \(f_i\) is the Gaussian's latent feature, and \(\alpha_i T_i\) is its compositing weight. A Gaussian is kept only if it ranks in the Top-K for at least one pixel.
   - Mechanism: Pruning on scores averaged over pixels or views over-prunes, so this rule avoids averaging. A Gaussian survives as long as some pixel counts it among its K most important contributors.
   - Design Motivation: Pruning on aggregated cross-view scores can remove Gaussians that matter for only a few pixels, losing fine spectral detail. The per-pixel rule guarantees that every pixel keeps enough Gaussians to express its spectrum.
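The building blocks of the compression autoencoder (design 1) can be sketched in NumPy: an SE block, max-pooling along the spectral axis, and the Huber reconstruction loss. This is a forward-pass illustration under assumed shapes, with features laid out as (channels, spectral length). The SE reduction ratio and the Huber `delta` are hypothetical. The 1D convolutions, the upsampling decoder, and the training loop are not shown.

```python
import numpy as np


def squeeze_excite(x, W1, W2):
    """Squeeze-and-Excitation on a 1D feature map x of shape (channels, spectral_len).

    Squeeze: average each channel over the spectral axis.
    Excite: dense + ReLU, then dense + sigmoid, giving one gate per channel.
    W1: (channels // r, channels), W2: (channels, channels // r) for a reduction ratio r.
    """
    s = x.mean(axis=1)
    z = np.maximum(W1 @ s, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(W2 @ z)))
    return x * gate[:, None]  # rescale each channel by its gate


def max_pool_1d(x, k=2):
    """Downsample the spectral axis by a factor k, taking the max over non-overlapping windows."""
    c, n = x.shape
    n_trim = n - n % k  # drop trailing samples that do not fill a window
    return x[:, :n_trim].reshape(c, n_trim // k, k).max(axis=2)


def huber(target, recon, delta=1.0):
    """Mean Huber loss: 0.5 r^2 for |r| <= delta, delta * (|r| - 0.5 delta) beyond."""
    r = np.abs(np.asarray(target) - np.asarray(recon))
    return float(np.mean(np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))))
```

The Huber loss grows only linearly for residuals above `delta`. A few very noisy bands therefore do not dominate the gradient as they would under a squared loss, which matches the robustness argument given for the AE.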
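The depth-aware density control (design 2) can be sketched as follows. Two assumptions apply: \(|\mathbf{E}_d \mathbf{X}_i|\) is read as the Euclidean distance of the Gaussian center from camera \(d\), and \(\beta_{field}\) and \(R\) are given scalars. The per-view averaging over visible views follows the usual 3DGS densification bookkeeping. The helper names, the visibility mask, and the threshold `tau` are hypothetical, and the split-versus-clone choice (based on Gaussian size) is left out.

```python
import numpy as np


def depth_scale(E_d, X_i, beta_field, R):
    """h(d, i) = (|E_d X_i| / (beta_field * R))^2 for one view d and Gaussian centre X_i.

    E_d: (4, 4) world-to-camera extrinsic matrix; X_i: (3,) world-space centre.
    |E_d X_i| is taken as the Euclidean distance of the centre from the camera
    (an assumption); beta_field and R are treated as given scalars.
    """
    x_cam = E_d @ np.append(X_i, 1.0)
    return (np.linalg.norm(x_cam[:3]) / (beta_field * R)) ** 2


def densify_mask(ndc_grad_norms, h, visible, tau):
    """Flag Gaussians for densification from depth-scaled NDC gradients.

    ndc_grad_norms, h, visible: (V, N) arrays over V views and N Gaussians;
    visible is 1 where Gaussian i was rendered in view d, else 0.
    Each view's gradient norm is multiplied by h(d, i), averaged over the views
    where the Gaussian was visible, and compared with the threshold tau.
    """
    scaled = ndc_grad_norms * h * visible
    counts = np.maximum(visible.sum(axis=0), 1)  # avoid division by zero
    return scaled.sum(axis=0) / counts > tau
```

For example, with \(\beta_{field} = 1\) and \(R = 4\), a Gaussian 2 units from the camera has its gradient scaled by \((2/4)^2 = 0.25\). A near-camera Gaussian therefore needs a much larger raw gradient before it is densified.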
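The pixel-level Top-K rule (design 3) can be sketched in NumPy. This assumes a rasterizer (not shown) has already collected, for every pixel of every view, the contributing Gaussians and their \(\alpha_i T_i\) weights. The norm \(|\cdot|\) in the score is taken as the mean absolute error over bands, which is an assumption. The helper names and the record format are hypothetical.

```python
import numpy as np


def importance(gt_spectrum, decoded, alphas, transmittance):
    """I[g_i, p, d] = (1 - |C*_d(p) - Dec(f_i)|) * alpha_i * T_i for the Gaussians on one pixel.

    gt_spectrum: (C,) ground-truth spectrum C*_d(p).
    decoded: (N, C) decoded spectra Dec(f_i) of the N Gaussians covering the pixel.
    alphas, transmittance: (N,) compositing opacity and transmittance of each Gaussian.
    |.| is taken here as the mean absolute error over bands (an assumption).
    """
    err = np.abs(decoded - gt_spectrum[None, :]).mean(axis=1)
    return (1.0 - err) * alphas * transmittance


def pixel_topk_keep(pixel_records, num_gaussians, k):
    """Return a boolean keep-mask: True if a Gaussian is in some pixel's top-k list.

    pixel_records: iterable of (gaussian_ids, scores) pairs, one per pixel per view.
    """
    keep = np.zeros(num_gaussians, dtype=bool)
    for ids, scores in pixel_records:
        ids, scores = np.asarray(ids), np.asarray(scores)
        top = np.argsort(-scores)[:k]  # indices of the k highest scores
        keep[ids[top]] = True
    return keep


# Usage: two pixels, five Gaussians, K = 1.
records = [([0, 1, 2], [0.9, 0.1, 0.5]), ([2, 3], [0.2, 0.8])]
mask = pixel_topk_keep(records, num_gaussians=5, k=1)
```

Here Gaussians 0 and 3 survive because each is the top contributor for one pixel. Gaussian 2 covers both pixels but is never ranked first, so it is pruned, and Gaussian 4 contributes to no pixel.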
Loss & Training
- \(L_{CB}\): Charbonnier loss (a smooth variant of L1 that is differentiable at zero and, like L1, limits the influence of extreme errors in noisy bands)
- \(L_{CS}\): Cosine similarity loss (penalizes the angle between predicted and ground-truth spectral vectors, which suits comparisons of spectral shape)
- \(L_{SSIM}\): Maintains spatial and geometric consistency
- \(\beta\) balances the Charbonnier and Cosine terms in the spectral loss, while \(\lambda\) controls the SSIM weight
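The summary names the loss terms and their weights but does not give the exact formula. The sketch below therefore assumes a spectral term \((1-\beta) L_{CB} + \beta L_{CS}\) and a total \((1-\lambda) L_{spec} + \lambda (1 - \mathrm{SSIM})\), the latter mirroring the standard 3DGS L1/D-SSIM mix. The default `beta` and `lam` values are hypothetical. SSIM is simplified to a single global window per band instead of the usual Gaussian-windowed version.

```python
import numpy as np


def charbonnier(pred, gt, eps=1e-3):
    """Charbonnier loss: sqrt(r^2 + eps^2), averaged; smooth near 0, L1-like for large r."""
    return float(np.mean(np.sqrt((pred - gt) ** 2 + eps ** 2)))


def cosine_loss(pred, gt, eps=1e-8):
    """Mean of 1 - cos(angle) between per-pixel spectra; inputs have shape (H, W, C)."""
    dot = np.sum(pred * gt, axis=-1)
    norms = np.maximum(np.linalg.norm(pred, axis=-1) * np.linalg.norm(gt, axis=-1), eps)
    return float(np.mean(1.0 - dot / norms))


def global_ssim(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM per band, averaged over bands (simplified: no Gaussian window)."""
    mu_p, mu_g = pred.mean(axis=(0, 1)), gt.mean(axis=(0, 1))
    var_p, var_g = pred.var(axis=(0, 1)), gt.var(axis=(0, 1))
    cov = ((pred - mu_p) * (gt - mu_g)).mean(axis=(0, 1))
    num = (2 * mu_p * mu_g + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2)
    return float(np.mean(num / den))


def total_loss(pred, gt, beta=0.5, lam=0.2):
    """Hypothetical combination of the three terms; the form and default weights are assumptions."""
    spectral = (1.0 - beta) * charbonnier(pred, gt) + beta * cosine_loss(pred, gt)
    return (1.0 - lam) * spectral + lam * (1.0 - global_ssim(pred, gt))
```

The cosine term is invariant to the overall brightness of a spectrum (scaling `gt` by 2 leaves it at zero). It constrains spectral shape, while the Charbonnier term constrains absolute values.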
Key Experimental Results
Main Results
BaySpec Dataset (141 channels, high noise, ~360 images/scene):
| Method | PSNR↑ | SSIM↑ | SAM↓ | RMSE↓ |
|---|---|---|---|---|
| MipNeRF360 | 26.53 | 0.7442 | 0.0280 | 0.0476 |
| HS-NeRF | 19.82 | 0.6714 | 0.0534 | 0.1071 |
| 3DGS | 22.91 | 0.6321 | 0.1335 | 0.0810 |
| HyperGS | 27.11 | 0.7804 | 0.0254 | 0.0440 |
SOP Dataset (128 channels, low noise, ~40 images/scene):
| Method | PSNR↑ | SSIM↑ | SAM↓ | RMSE↓ |
|---|---|---|---|---|
| MipNeRF360 | 12.28 | 0.6824 | 0.1369 | 0.2658 |
| 3DGS | 28.58 | 0.9627 | 0.0301 | 0.0478 |
| HyperGS | 30.51 | 0.9756 | 0.00415 | 0.0354 |
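The PSNR, SAM, and RMSE columns can be computed as in the sketch below. It assumes spectra normalized to [0, 1] (so the PSNR peak is 1) and SAM reported in radians. Neither convention is stated in this summary, and the function names are illustrative.

```python
import numpy as np


def rmse(pred, gt):
    """Root-mean-square error over all pixels and bands."""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))


def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB, assuming values lie in [0, max_val]."""
    mse = np.mean((pred - gt) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))


def sam(pred, gt, eps=1e-8):
    """Mean spectral angle (radians) between per-pixel spectra of shape (H, W, C)."""
    dot = np.sum(pred * gt, axis=-1)
    norms = np.maximum(np.linalg.norm(pred, axis=-1) * np.linalg.norm(gt, axis=-1), eps)
    cos = np.clip(dot / norms, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    return float(np.mean(np.arccos(cos)))
```

SAM ignores per-pixel intensity and measures only the shape of each spectrum. RMSE and PSNR measure absolute error. This is why a method can improve markedly on SAM while gaining less on PSNR.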
Ablation Study
| Ablation Step | PSNR↑ | SSIM↑ | SAM↓ | RMSE↓ | N.Prim↓ |
|---|---|---|---|---|---|
| Base 3DGS | 22.91 | 0.6320 | 0.1335 | 0.0810 | 440k |
| + Spec. SFM | 23.05 | 0.6331 | 0.1310 | 0.0799 | 421k |
| + Latent AE | 24.87 | 0.7101 | 0.0548 | 0.0602 | 500k |
| + Densification | 25.25 | 0.7356 | 0.0365 | 0.0548 | 1.3M |
| + Pruning | 25.17 | 0.7199 | 0.0374 | 0.0555 | 412k |
| + View MLP | 27.05 | 0.7792 | 0.0253 | 0.0443 | 309k |
| + Custom Loss | 27.11 | 0.7804 | 0.0254 | 0.0440 | 309k |
Key Findings
- Latent space modeling is the core contribution: adding the latent AE (after spectral SfM) raises PSNR by 1.82 dB (1.96 dB over base 3DGS). It also cuts SAM from 0.1310 to 0.0548, the largest spectral-accuracy gain of any step.
- Density control and pruning keep the model compact: densification grows the Gaussians from 500k to 1.3M, and pruning brings them down to 412k at a small cost (0.08 dB PSNR). After the view MLP is added, the final model has 309k Gaussians, fewer than base 3DGS (440k).
- Results differ across cameras and capture density. On BaySpec (densely captured, ~360 images/scene, high noise), the NeRF-based MipNeRF360 clearly beats vanilla 3DGS (26.53 vs 22.91 dB PSNR). On the sparsely captured, low-noise SOP dataset (~40 images/scene), 3DGS-based methods far outperform MipNeRF360 (28.58 and 30.51 vs 12.28 dB). The likely reason is that the explicit 3DGS representation interpolates better between sparse views.
- Consistent superiority: HyperGS achieves the best results across all scenes and metrics, including the best average on every metric in both tables above.
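The findings above can be checked directly against the ablation table. This short script recomputes the change contributed by each step, using only the table's numbers. It confirms that the latent AE gives the largest SAM reduction. It also shows that the view MLP gives the largest single-step PSNR gain (+1.88 dB, versus +1.82 dB for the latent AE).

```python
# Ablation rows copied from the table: (step, PSNR, SAM, number of primitives)
rows = [
    ("Base 3DGS", 22.91, 0.1335, 440_000),
    ("+ Spec. SFM", 23.05, 0.1310, 421_000),
    ("+ Latent AE", 24.87, 0.0548, 500_000),
    ("+ Densification", 25.25, 0.0365, 1_300_000),
    ("+ Pruning", 25.17, 0.0374, 412_000),
    ("+ View MLP", 27.05, 0.0253, 309_000),
    ("+ Custom Loss", 27.11, 0.0254, 309_000),
]


def step_deltas(rows):
    """Per-step change in PSNR (dB), SAM, and primitive count relative to the previous row."""
    out = []
    for (_, p0, s0, n0), (name, p1, s1, n1) in zip(rows, rows[1:]):
        out.append((name, round(p1 - p0, 2), round(s1 - s0, 4), n1 - n0))
    return out


for name, dpsnr, dsam, dprim in step_deltas(rows):
    print(f"{name:16s} dPSNR={dpsnr:+.2f}  dSAM={dsam:+.4f}  dPrims={dprim:+,}")
```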
Highlights & Insights
- Systematically adapts 3DGS to the hyperspectral field for the first time, establishing a complete HNVS benchmark.
- The latent-space AE strategy is general and could be extended to 3DGS modeling of other high-dimensional signals (e.g., multispectral or infrared imagery).
- A view-dependent MLP simultaneously predicts spectral effects and anisotropic opacity, elegantly handling view-dependent spectral variations.
- The losses are chosen for spectral data: Huber loss trains the AE, and Charbonnier plus cosine similarity trains the 3DGS stage.
- The pixel-level Top-K pruning strategy is better suited for high-dimensional data compared to traditional average- or total-volume-based pruning.
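The view-dependent MLP is described only at a high level here. The sketch below is a one-hidden-layer version that adjusts both a Gaussian's latent feature and its opacity per view. It assumes the input is the latent feature concatenated with the viewing direction, and that the output acts as residual corrections (a latent offset and an opacity-logit offset). `view_mlp` and its parameter layout are hypothetical, not the authors' architecture.

```python
import numpy as np


def view_mlp(latent, opacity, view_dir, params):
    """Hypothetical view-dependent head for one Gaussian.

    latent: (D,) base latent feature; opacity: scalar base opacity in (0, 1);
    view_dir: (3,) unit vector from the camera centre to the Gaussian.
    params: dict with W1 (H, D + 3), b1 (H,), W2 (D + 1, H), b2 (D + 1,).
    Returns the view-adjusted latent feature and the view-adjusted opacity.
    """
    x = np.concatenate([latent, view_dir])
    h = np.maximum(params["W1"] @ x + params["b1"], 0.0)  # ReLU hidden layer
    out = params["W2"] @ h + params["b2"]
    new_latent = latent + out[:-1]  # residual latent offset (assumed design)
    logit = np.log(opacity / (1.0 - opacity)) + out[-1]  # shift opacity in logit space
    return new_latent, 1.0 / (1.0 + np.exp(-logit))
```

With the output layer initialized to zero, the head starts as the identity, leaving both the latent feature and the opacity unchanged. It then learns view-dependent corrections during training.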
Limitations & Future Work
- Requires a pre-trained AE, which introduces an extra step.
- Relies on COLMAP for grayscale SfM, which may fail in textureless scenes.
- Real-world experiments use only the two small datasets released with HS-NeRF, and large-scale scenes are not evaluated.
- The synthetic datasets are built by assigning spectral signatures according to semantic labels, which departs from real hyperspectral scenes.
- Online inference speed and real-time rendering capabilities are not discussed.
Related Work & Insights
- HS-NeRF: The first HNVS method. It is not end-to-end and trains unstably, and it supplies the datasets and motivation for this work.
- Scaffold-GS / Mip-Splatting: Representative improvements of 3DGS (focusing on compression and anti-aliasing), but they do not consider high-dimensional extensions.
- VDGS: Uses a hybrid NeRF-MLP to predict color/opacity for multispectral reconstruction, though it still relies on a 3DGS backbone.
- Insight: The combined paradigm of latent space compression and explicit 3D representation can be generalized to multi-modal extensions in radiance fields.
Rating
⭐⭐⭐⭐: The first hyperspectral novel view synthesis method built on the 3DGS framework. The design is well reasoned and the ablations are solid, but the experiments are small in scale and real-time rendering is not discussed.