HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars¶
Conference: CVPR 2026 · arXiv: 2507.02803 · Code: https://gserifi.github.io/HyperGaussians · Area: 3D Vision · Keywords: Gaussian splatting, face avatars, high-dimensional Gaussians, facial animation, conditional distribution
TL;DR¶
This paper proposes HyperGaussians, which extends 3DGS to high-dimensional multivariate Gaussians. Expression-dependent attribute variations are modeled via conditional distributions, and an inverse covariance trick enables efficient conditioning. Integrated as a plug-and-play module into FlashAvatar and GaussianHeadAvatar, the method significantly improves high-frequency detail quality.
Background & Motivation¶
- Background: 3D Gaussian splatting (3DGS) has become the standard approach for face avatar modeling. Current SOTA methods bind Gaussians to 3DMM meshes and use MLPs to predict expression-dependent offsets for handling dynamics.
- Limitations of Prior Work: Existing methods still struggle with nonlinear deformations (eye closure, mouth opening), complex lighting effects (specular reflections on glasses), and fine-grained structures (tooth gaps, glasses frames): precisely the high-frequency details that drive the uncanny valley effect.
- Key Challenge: The expressiveness of 3D Gaussian primitives is inherently limited. Each Gaussian has only a few attributes (position, rotation, scale, color), and even with MLP-predicted expression-dependent offsets, linear combinations of attributes remain limited in expressiveness.
- Goal: How can the expressiveness of 3D Gaussian primitives be enhanced to capture high-frequency dynamic effects without redesigning the entire pipeline?
- Key Insight: Inspired by HyperNeRF, which handles topological changes by modeling the deformation field in a higher-dimensional space and slicing it. The same idea is applied to Gaussian primitives: each Gaussian is defined as a multivariate Gaussian distribution in a high-dimensional space, and conditioning (the analogue of slicing) yields expression-dependent 3D Gaussians.
- Core Idea: Standard 3D Gaussians are extended into \((m+n)\)-dimensional HyperGaussians. Conditioning on an \(n\)-dimensional latent embedding yields dynamically adaptive 3D attributes, and an inverse covariance trick maintains efficiency.
Method¶
Overall Architecture¶
HyperGaussians is a plug-and-play representation enhancement module. In the original pipeline (e.g., FlashAvatar), the MLP outputs expression-dependent offsets \(\Delta\mu, \Delta r, \Delta s\). With HyperGaussians, the MLP instead outputs a latent vector \(z_\psi\), from which MAP-estimated offsets are computed via the conditional distribution of the high-dimensional Gaussian. Only the Gaussian representation is replaced; all other components (loss functions, hyperparameters) remain unchanged.
Key Designs¶
- HyperGaussian High-Dimensional Extension:
- Function: Extends standard 3D Gaussians to \((m+n)\)-dimensional multivariate Gaussians with an additional \(n\)-dimensional latent space, enhancing representational capacity.
- Mechanism: A joint distribution \(\gamma = (\gamma_a, \gamma_b)^\top \sim \mathcal{N}(\mu, \Sigma)\) is defined, where \(\gamma_a \in \mathbb{R}^m\) corresponds to Gaussian attributes (position, rotation, scale) and \(\gamma_b \in \mathbb{R}^n\) corresponds to latent embeddings. Conditioning \(p(\gamma_a|\gamma_b)\) on a given latent embedding yields an ordinary 3D Gaussian: \(\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(\gamma_b - \mu_b)\). From a Bayesian perspective, this is equivalent to MAP estimation, where the prior \(p(\gamma_a)\) acts as implicit regularization.
- Design Motivation: Compared to NDGS, which only models the conditional distribution of position (rotation and scale are independent of the latent code), HyperGaussians also models \(p(\Delta r|z)\) and \(p(\Delta s|z)\), allowing Gaussians to rotate and scale according to the latent code, yielding strictly greater expressiveness.
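The conditioning step above is the standard conditional-Gaussian formula and can be sketched in NumPy. The joint covariance, means, and latent code below are random placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8  # attribute dim (e.g. position) and latent dim

# Random symmetric positive-definite joint covariance (placeholder values).
A = rng.normal(size=(m + n, m + n))
Sigma = A @ A.T + (m + n) * np.eye(m + n)
mu = rng.normal(size=m + n)

# Block structure: a = Gaussian attributes, b = latent embedding.
Sigma_ab, Sigma_bb = Sigma[:m, m:], Sigma[m:, m:]
mu_a, mu_b = mu[:m], mu[m:]

def condition(gamma_b):
    """Conditional mean mu_{a|b} = mu_a + Sigma_ab Sigma_bb^{-1} (gamma_b - mu_b)."""
    return mu_a + Sigma_ab @ np.linalg.solve(Sigma_bb, gamma_b - mu_b)

z = rng.normal(size=n)  # latent code, e.g. produced by the deformation MLP
attr = condition(z)     # expression-dependent 3D attribute (here: a position)
print(attr.shape)       # (3,)
```

Note that `condition(mu_b)` returns `mu_a`: when the latent code sits at the prior mean, the attribute falls back to its static value, which is the implicit-regularization behavior the MAP reading predicts.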
- Inverse Covariance Trick:
- Function: Reduces the computational complexity of high-dimensional conditioning from \(O(n^3 + mn^2)\) to \(O(m^3 + m^2n)\)—from cubic in \(n\) to linear in \(n\).
- Mechanism: Reparameterizing via the precision matrix \(\Lambda = \Sigma^{-1}\), the conditional mean becomes \(\mu_{a|b} = \mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(\gamma_b - \mu_b)\) and the conditional covariance becomes \(\Sigma_{a|b} = \Lambda_{aa}^{-1}\). Only the small matrix \(\Lambda_{aa} \in \mathbb{R}^{m \times m}\) (where \(m=3\) or \(4\)) needs to be stored and inverted, reducing the parameter count from \(O((m+n)^2)\) to \(O(m^2 + mn)\).
- Design Motivation: The naïve implementation is prohibitively slow and memory-intensive at large latent dimensions (\(n > 8\)). At \(n=8\), this yields a 150% speedup; at \(n=128\), a 15000% speedup with 48–90% memory reduction, making large latent dimensions practical.
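The two parameterizations agree because of the block-inverse identity \(\Lambda_{aa}^{-1}\Lambda_{ab} = -\Sigma_{ab}\Sigma_{bb}^{-1}\). A quick numerical check with random placeholder values (the naïve path inverts an \(n \times n\) block, the trick only an \(m \times m\) one):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 128

A = rng.normal(size=(m + n, m + n))
Sigma = A @ A.T + (m + n) * np.eye(m + n)  # random SPD joint covariance
mu_a, mu_b = rng.normal(size=m), rng.normal(size=n)
z = rng.normal(size=n)

# Naive: solve against the n x n block Sigma_bb -> O(n^3).
Sigma_ab, Sigma_bb = Sigma[:m, m:], Sigma[m:, m:]
mu_naive = mu_a + Sigma_ab @ np.linalg.solve(Sigma_bb, z - mu_b)

# Trick: store only the precision blocks Lambda_aa (m x m) and Lambda_ab (m x n);
# conditioning then needs a single m x m solve -> O(m^3 + m^2 n).
Lam = np.linalg.inv(Sigma)
Lam_aa, Lam_ab = Lam[:m, :m], Lam[:m, m:]
mu_trick = mu_a - np.linalg.solve(Lam_aa, Lam_ab @ (z - mu_b))

print(np.allclose(mu_naive, mu_trick))  # True
```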
- Plug-and-Play Integration Strategy:
- Function: Improves performance by directly replacing 3DGS primitives without modifying architecture or hyperparameters.
- Mechanism: Using FlashAvatar as an example, the deformation MLP output is changed from direct offsets to a latent code \(z_\psi\), from which offsets are computed via the HyperGaussian conditional distribution. All other components (losses, training strategy, learning rates) remain unchanged. A latent dimension of \(n=8\) achieves the best performance. GaussianHeadAvatar is integrated in the same manner.
- Design Motivation: As an improvement orthogonal to architectural design, HyperGaussians enhances quality without touching the model architecture, meaning it can be freely combined with any future architectural improvements.
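A minimal sketch of the modified forward path, with a toy single-layer stand-in for the deformation MLP; all shapes, values, and the MLP itself are illustrative, not FlashAvatar's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
G, n, e = 5, 8, 16  # num Gaussians, latent dim, expression dim (toy sizes)

# Per-Gaussian HyperGaussian parameters in precision form (placeholder values).
mu_a   = rng.normal(size=(G, 3))
mu_b   = rng.normal(size=(G, n))
Lam_aa = np.broadcast_to(np.eye(3), (G, 3, 3)).copy()  # PD placeholder blocks
Lam_ab = 0.1 * rng.normal(size=(G, 3, n))

# Toy stand-in for the deformation MLP: expression -> one latent code per Gaussian.
W = 0.1 * rng.normal(size=(e, G * n))

def deform(expression):
    z = np.tanh(expression @ W).reshape(G, n)            # latent codes z_psi
    rhs = np.einsum('gmn,gn->gm', Lam_ab, z - mu_b)
    # mu_{a|b} = mu_a - Lam_aa^{-1} Lam_ab (z - mu_b), batched over Gaussians.
    return mu_a - np.linalg.solve(Lam_aa, rhs[..., None])[..., 0]

pos = deform(rng.normal(size=e))  # expression-dependent positions
print(pos.shape)                  # (5, 3)
```

The only change relative to the baseline is what the MLP emits: a latent code instead of direct offsets, with the offsets recovered by conditioning. Everything downstream (rasterization, losses) is untouched.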
Loss & Training¶
All loss functions and hyperparameters are inherited from the base methods (FlashAvatar / GaussianHeadAvatar); the learning rate for the HyperGaussian parameters is set to \(10^{-4}\). FlashAvatar trains ~15k Gaussians for 30k iterations (15–20 minutes on an RTX 4090); GaussianHeadAvatar trains for 600k iterations (~2 days). HyperGaussians add only ~1% training overhead (about 30 minutes in the GaussianHeadAvatar setting). Key design decisions: three independent HyperGaussian distributions are maintained per primitive, one each for position, rotation, and scale; a Cholesky parameterization ensures positive definiteness of the covariance matrix; the latent dimension \(n=8\) is determined via ablation as the optimal trade-off; and parameterizing the precision matrix requires only the two blocks \(\Lambda_{aa}\) and \(\Lambda_{ab}\), significantly reducing memory usage. Code is publicly released to facilitate integration into other pipelines.
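The Cholesky parameterization mentioned above can be sketched as follows. The unconstrained-parameter layout (strict lower triangle plus a log-diagonal) is an assumption for illustration; the paper only states that a Cholesky factorization guarantees positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 3  # attribute block size (e.g. position)

# Unconstrained learnable parameters (hypothetical layout).
raw = rng.normal(size=(m, m))

# Lower-triangular factor with a strictly positive diagonal (via exp),
# so Lam_aa = L L^T is symmetric positive definite by construction.
L = np.tril(raw, k=-1) + np.diag(np.exp(np.diag(raw)))
Lam_aa = L @ L.T

print((np.linalg.eigvalsh(Lam_aa) > 0).all())  # True
```

Because positivity is built into the parameterization, the optimizer can update `raw` freely without any projection step to keep the precision block valid.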
Key Experimental Results¶
Main Results (29 subjects, 6 datasets)¶
| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| SplattingAvatar | 28.58 | 0.9396 | 0.0902 |
| MonoGaussianAvatar | 29.94 | 0.9456 | 0.0655 |
| FlashAvatar | 29.43 | 0.9466 | 0.0511 |
| Ours (FA) | 29.99 | 0.9510 | 0.0498 |
| GaussianHeadAvatar | 24.10 | 0.8819 | 0.2027 |
| Ours (GHA) | 24.38 | 0.8819 | 0.1977 |
Ablation Study¶
| Configuration | LPIPS↓ | FPS |
|---|---|---|
| HyperGaussians (\(n=8\)) | 0.0498 | 300 |
| Deeper MLP (same param count) | 0.0572 (+15%) | 158 (−47%) |
| Wider MLP (same param count) | 0.0512 (+3%) | 178 (−41%) |
Key Findings¶
- HyperGaussians significantly outperforms increased MLP capacity under the same parameter budget, achieving better LPIPS without sacrificing speed (300 FPS vs. 158 FPS).
- Improvements are most prominent in high-frequency details: specular reflections, glasses frames, tooth gaps, and skin wrinkles—precisely the areas where standard 3DGS is weakest—confirming that high-dimensional conditioning genuinely enhances local expressiveness.
- Even \(n=1\) (the minimal extension) already outperforms the baseline; \(n=8\) is the optimal trade-off; \(n=128\) remains feasible with the inverse covariance trick but yields diminishing returns.
- The inverse covariance trick is critical for practicality: without it, \(n=128\) is nearly infeasible (15000% speedup, >90% memory reduction with the trick).
- The method is also effective in cross-reenactment scenarios, indicating that the learned capability is general high-frequency modeling rather than overfitting to specific expressions.
- In multi-view settings (GaussianHeadAvatar), HyperGaussians likewise improve wrinkle and reflection quality with only 1% additional training overhead.
- Inverse covariance efficiency: at \(n=8\), naïve implementation uses 42MB per conditioning step vs. 22MB with the trick (−48%); at \(n=128\), memory is reduced by >90%.
- Cross-dataset consistency: consistent improvements across 6 different datasets including INSTA, NerFace, IMAvatar, and NeRSemble.
- Training convergence is also accelerated—HyperGaussians reach the baseline's final quality in fewer iterations.
Highlights & Insights¶
- Elegant Bayesian Unification: NDGS, 4DGS, 6DGS, 7DGS, and other multi-dimensional Gaussian extensions are unified as special cases of conditional Gaussian distributions, with their expressiveness limitations (inability to condition rotation/scale) clearly identified. This theoretical unification is itself a significant contribution.
- Improvement Orthogonal to Architecture: Quality is enhanced without modifying any model design, meaning HyperGaussians can be freely combined with any future architectural improvements—this "free upgrade" property is highly attractive.
- Generality of the Inverse Covariance Trick: The trick is not limited to face avatars; any scenario requiring high-dimensional Gaussian conditioning can benefit. The reduction from \(O(n^3)\) to \(O(n)\) is highly significant for real-time applications.
Limitations & Future Work¶
- Validation is currently limited to face avatars; effectiveness on full-body dynamic scenes (clothing, hair motion, and other large-scale nonlinear deformations) remains unknown.
- The conditional covariance matrix \(\Sigma_{a|b} = \Lambda_{aa}^{-1}\), which encodes uncertainty in 3D Gaussian parameters, is not utilized in the current implementation—it could be leveraged for active learning or rendering quality assessment.
- Storing and inverting the precision matrix blocks \(\Lambda_{aa}\) and \(\Lambda_{ab}\) is required, and while the parameter count is reduced from \(O((m+n)^2)\) to \(O(m^2+mn)\), the implementation is more complex than standard 3DGS.
- For highly out-of-distribution expressions, the extrapolation capability of the conditional distribution may be limited, since the conditional mean of a multivariate Gaussian is linear in the latent code.
- Quantitative improvements, while consistent, are modest in magnitude (PSNR +0.56, LPIPS −0.0013); visual improvements are concentrated in high-frequency detail regions.
- The choice of latent dimension \(n=8\) is determined by ablation, but different scenarios may require different optimal values. An adaptive dimension selection mechanism remains to be developed.
Related Work & Insights¶
- vs. NDGS/6DGS/7DGS: These methods are special cases of HyperGaussians—they cannot condition rotation/scale and are computationally infeasible at large latent dimensions. HyperGaussians addresses both limitations and eliminates the Gaussian vanishing problem under large deformations.
- vs. FlashAvatar: Replacing only the Gaussian representation reduces LPIPS from 0.0511 to 0.0498 and improves PSNR from 29.43 to 29.99, demonstrating the value of improving the underlying representation. FlashAvatar's MLP-predicted offsets constitute "external modulation," whereas HyperGaussians internalize modulation within the representation itself.
- vs. HyperNeRF: The concept of modeling deformations in a high-dimensional space is inherited from HyperNeRF, but HyperGaussians adapts it to the Gaussian splatting framework and achieves real-time rendering (300 FPS) via the inverse covariance trick. HyperNeRF's "slicing" and HyperGaussians' "conditioning" are formally analogous but differ entirely in implementation and theoretical framework.
- vs. MonoGaussianAvatar: MonoGaussianAvatar requires 100k+ Gaussians and 12 hours of training, whereas FlashAvatar + HyperGaussians uses only ~15k Gaussians and 20 minutes, with superior performance—demonstrating that better representations matter more than more parameters.
- vs. SplattingAvatar: SplattingAvatar does not account for the effect of expression changes on appearance (e.g., specular reflection shifts), leading to blurry rendering. HyperGaussians naturally models this dependency through conditional distributions.
- Future Outlook: The HyperGaussians framework can be applied to dynamic scenes beyond faces (full body, animals, deformable objects); the inverse covariance trick makes higher latent dimensions feasible.
- vs. 4DGS: 4DGS adds a temporal dimension but requires custom CUDA kernels; HyperGaussians adds arbitrary dimensions and can use standard 3DGS rasterizers via the inverse covariance trick.
- Gaussian Mixture Model Perspective: NDGS is essentially a multi-dimensional extension of Gaussian mixture models, but density functions that depend on the joint distribution become unstable under large deformations; HyperGaussians avoids this by using conditional distributions. Since the conditional mean computation is equivalent to MAP estimation, the prior \(p(\mathcal{A})\) acts as implicit regularization, helping preserve fine details at test time.
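The MAP equivalence invoked above follows directly from the standard conditional-Gaussian identities; as a sketch, consistent with the formulas used in the Method section:

```latex
% Conditioning the joint Gaussian on the latent code \gamma_b gives
p(\gamma_a \mid \gamma_b) = \mathcal{N}\big(\mu_{a|b},\, \Sigma_{a|b}\big), \qquad
\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(\gamma_b - \mu_b), \qquad
\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} = \Lambda_{aa}^{-1}.
% A Gaussian density is maximized at its mean, so
\hat{\gamma}_a = \arg\max_{\gamma_a}\, p(\gamma_a \mid \gamma_b) = \mu_{a|b},
% i.e. the conditional mean used at render time is exactly the MAP estimate,
% with the marginal prior p(\gamma_a) = \mathcal{N}(\mu_a, \Sigma_{aa})
% entering through the joint and acting as implicit regularization.
```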
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ Triple innovation of high-dimensional Gaussians + Bayesian perspective + inverse covariance trick, with outstanding theoretical depth and unification of multiple existing methods
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ 29 subjects across 6 datasets, including ablations, speed comparisons, and both monocular and multi-view settings
- Writing Quality: ⭐⭐⭐⭐⭐ Mathematical derivations are clear and elegant; the Bayesian unification analysis is particularly well executed
- Value: ⭐⭐⭐⭐⭐ Plug-and-play representation upgrade with strong generality; the inverse covariance trick has broad application prospects