HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars¶
Conference: CVPR 2026 · arXiv: 2507.02803 · Code: https://gserifi.github.io/HyperGaussians · Area: 3D Vision · Keywords: Gaussian splatting, face avatars, high-dimensional Gaussians, facial animation, conditional distribution
TL;DR¶
This paper proposes HyperGaussians, which extends 3DGS to high-dimensional multivariate Gaussians. Expression-dependent attribute variations are modeled via conditional distributions, and an inverse covariance trick enables efficient conditioning. Integrated as a plug-and-play module into FlashAvatar and GaussianHeadAvatar, the method significantly improves high-frequency detail quality.
Background & Motivation¶
- Background: 3D Gaussian splatting (3DGS) has become the standard approach for face avatar modeling. Current SOTA methods bind Gaussians to 3DMM meshes and use MLPs to predict expression-dependent offsets for handling dynamics.
- Limitations of Prior Work: Existing methods still struggle with nonlinear deformations (eye closure, mouth opening), complex lighting effects (specular reflections on glasses), and fine-grained structures (tooth gaps, glasses frames): precisely the high-frequency details that drive the uncanny valley effect.
- Key Challenge: The expressiveness of 3D Gaussian primitives is inherently limited. Each Gaussian has only a few attributes (position, rotation, scale, color), and even with MLP-predicted expression-dependent offsets, linear combinations of attributes remain limited in expressiveness.
- Goal: How can the expressiveness of 3D Gaussian primitives be enhanced to capture high-frequency dynamic effects without redesigning the entire pipeline?
- Key Insight: Inspired by HyperNeRF, which handles topological changes by modeling the deformation field in a higher-dimensional space and slicing it. The same idea is applied to Gaussian primitives: each Gaussian is defined as a multivariate Gaussian distribution in a high-dimensional space, and conditioning (the analogue of slicing) yields expression-dependent 3D Gaussians.
- Core Idea: Standard 3D Gaussians are extended into \((m+n)\)-dimensional HyperGaussians. Conditioning on an \(n\)-dimensional latent embedding yields dynamically adaptive 3D attributes, and an inverse covariance trick maintains efficiency.
Method¶
Overall Architecture¶
HyperGaussians is a plug-and-play representation enhancement module. In the original pipeline (e.g., FlashAvatar), the MLP outputs expression-dependent offsets \(\Delta\mu, \Delta r, \Delta s\). With HyperGaussians, the MLP instead outputs a latent vector \(z_\psi\), from which MAP-estimated offsets are computed via the conditional distribution of the high-dimensional Gaussian. Only the Gaussian representation is replaced; all other components (loss functions, hyperparameters) remain unchanged.
Key Designs¶
- HyperGaussian High-Dimensional Extension:
- Function: Extends standard 3D Gaussians to \((m+n)\)-dimensional multivariate Gaussians with an additional \(n\)-dimensional latent space, enhancing representational capacity.
- Mechanism: A joint distribution \(\gamma = (\gamma_a, \gamma_b)^\top \sim \mathcal{N}(\mu, \Sigma)\) is defined, where \(\gamma_a \in \mathbb{R}^m\) corresponds to Gaussian attributes (position, rotation, scale) and \(\gamma_b \in \mathbb{R}^n\) corresponds to latent embeddings. Conditioning \(p(\gamma_a|\gamma_b)\) on a given latent embedding yields an ordinary 3D Gaussian: \(\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(\gamma_b - \mu_b)\). From a Bayesian perspective, this is equivalent to MAP estimation, where the prior \(p(\gamma_a)\) acts as implicit regularization.
- Design Motivation: Compared to NDGS, which only models the conditional distribution of position (rotation and scale are independent of the latent code), HyperGaussians also models \(p(\Delta r|z)\) and \(p(\Delta s|z)\), allowing Gaussians to rotate and scale according to the latent code, yielding strictly greater expressiveness.
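The conditioning step above is the standard conditional-Gaussian formula and can be sketched in NumPy. The joint covariance, means, and latent code below are random placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8  # attribute dim (e.g. position) and latent dim

# Random symmetric positive-definite joint covariance (placeholder values).
A = rng.normal(size=(m + n, m + n))
Sigma = A @ A.T + (m + n) * np.eye(m + n)
mu = rng.normal(size=m + n)

# Block structure: a = Gaussian attributes, b = latent embedding.
Sigma_ab, Sigma_bb = Sigma[:m, m:], Sigma[m:, m:]
mu_a, mu_b = mu[:m], mu[m:]

def condition(gamma_b):
    """Conditional mean mu_{a|b} = mu_a + Sigma_ab Sigma_bb^{-1} (gamma_b - mu_b)."""
    return mu_a + Sigma_ab @ np.linalg.solve(Sigma_bb, gamma_b - mu_b)

z = rng.normal(size=n)  # latent code, e.g. produced by the deformation MLP
attr = condition(z)     # expression-dependent 3D attribute (here: a position)
print(attr.shape)       # (3,)
```

Note that `condition(mu_b)` returns `mu_a`: when the latent code sits at the prior mean, the attribute falls back to its static value, which is the implicit-regularization behavior the MAP reading predicts.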
- Inverse Covariance Trick:
- Function: Reduces the computational complexity of high-dimensional conditioning from \(O(n^3 + mn^2)\) to \(O(m^3 + m^2n)\)—from cubic in \(n\) to linear in \(n\).
- Mechanism: Reparameterizing via the precision matrix \(\Lambda = \Sigma^{-1}\), the conditional mean becomes \(\mu_{a|b} = \mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(\gamma_b - \mu_b)\) and the conditional covariance becomes \(\Sigma_{a|b} = \Lambda_{aa}^{-1}\). Only the small matrix \(\Lambda_{aa} \in \mathbb{R}^{m \times m}\) (where \(m=3\) or \(4\)) needs to be stored and inverted, reducing the parameter count from \(O((m+n)^2)\) to \(O(m^2 + mn)\).
- Design Motivation: The naïve implementation is prohibitively slow and memory-intensive at large latent dimensions (\(n > 8\)). At \(n=8\), this yields a 150% speedup; at \(n=128\), a 15000% speedup with 48–90% memory reduction, making large latent dimensions practical.
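The two parameterizations agree because of the block-inverse identity \(\Lambda_{aa}^{-1}\Lambda_{ab} = -\Sigma_{ab}\Sigma_{bb}^{-1}\). A quick numerical check with random placeholder values (the naïve path inverts an \(n \times n\) block, the trick only an \(m \times m\) one):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 128

A = rng.normal(size=(m + n, m + n))
Sigma = A @ A.T + (m + n) * np.eye(m + n)  # random SPD joint covariance
mu_a, mu_b = rng.normal(size=m), rng.normal(size=n)
z = rng.normal(size=n)

# Naive: solve against the n x n block Sigma_bb -> O(n^3).
Sigma_ab, Sigma_bb = Sigma[:m, m:], Sigma[m:, m:]
mu_naive = mu_a + Sigma_ab @ np.linalg.solve(Sigma_bb, z - mu_b)

# Trick: store only the precision blocks Lambda_aa (m x m) and Lambda_ab (m x n);
# conditioning then needs a single m x m solve -> O(m^3 + m^2 n).
Lam = np.linalg.inv(Sigma)
Lam_aa, Lam_ab = Lam[:m, :m], Lam[:m, m:]
mu_trick = mu_a - np.linalg.solve(Lam_aa, Lam_ab @ (z - mu_b))

print(np.allclose(mu_naive, mu_trick))  # True
```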
- Plug-and-Play Integration Strategy:
- Function: Improves performance by directly replacing 3DGS primitives without modifying architecture or hyperparameters.
- Mechanism: Using FlashAvatar as an example, the deformation MLP output is changed from direct offsets to a latent code \(z_\psi\), from which offsets are computed via the HyperGaussian conditional distribution. All other components (losses, training strategy, learning rates) remain unchanged. A latent dimension of \(n=8\) achieves the best performance. GaussianHeadAvatar is integrated in the same manner.
- Design Motivation: As an improvement orthogonal to architectural design, HyperGaussians enhances quality without touching the model architecture, meaning it can be freely combined with any future architectural improvements.
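A minimal sketch of the modified forward path, with a toy single-layer stand-in for the deformation MLP; all shapes, values, and the MLP itself are illustrative, not FlashAvatar's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
G, n, e = 5, 8, 16  # num Gaussians, latent dim, expression dim (toy sizes)

# Per-Gaussian HyperGaussian parameters in precision form (placeholder values).
mu_a   = rng.normal(size=(G, 3))
mu_b   = rng.normal(size=(G, n))
Lam_aa = np.broadcast_to(np.eye(3), (G, 3, 3)).copy()  # PD placeholder blocks
Lam_ab = 0.1 * rng.normal(size=(G, 3, n))

# Toy stand-in for the deformation MLP: expression -> one latent code per Gaussian.
W = 0.1 * rng.normal(size=(e, G * n))

def deform(expression):
    z = np.tanh(expression @ W).reshape(G, n)            # latent codes z_psi
    rhs = np.einsum('gmn,gn->gm', Lam_ab, z - mu_b)
    # mu_{a|b} = mu_a - Lam_aa^{-1} Lam_ab (z - mu_b), batched over Gaussians.
    return mu_a - np.linalg.solve(Lam_aa, rhs[..., None])[..., 0]

pos = deform(rng.normal(size=e))  # expression-dependent positions
print(pos.shape)                  # (5, 3)
```

The only change relative to the baseline is what the MLP emits: a latent code instead of direct offsets, with the offsets recovered by conditioning. Everything downstream (rasterization, losses) is untouched.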
Loss & Training¶
All loss functions and hyperparameters are inherited from the base methods (FlashAvatar / GaussianHeadAvatar); the learning rate for the HyperGaussian parameters is set to \(10^{-4}\). FlashAvatar trains ~15k Gaussians for 30k iterations (15–20 minutes on an RTX 4090); GaussianHeadAvatar trains for 600k iterations (~2 days). HyperGaussians add only ~1% training overhead (about 30 minutes in the GaussianHeadAvatar setting). Key design decisions: three independent HyperGaussian distributions are maintained per primitive, one each for position, rotation, and scale; a Cholesky parameterization ensures positive definiteness of the covariance matrix; the latent dimension \(n=8\) is determined via ablation as the optimal trade-off; and parameterizing the precision matrix requires only the two blocks \(\Lambda_{aa}\) and \(\Lambda_{ab}\), significantly reducing memory usage. Code is publicly released to facilitate integration into other pipelines.
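The Cholesky parameterization mentioned above can be sketched as follows. The unconstrained-parameter layout (strict lower triangle plus a log-diagonal) is an assumption for illustration; the paper only states that a Cholesky factorization guarantees positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 3  # attribute block size (e.g. position)

# Unconstrained learnable parameters (hypothetical layout).
raw = rng.normal(size=(m, m))

# Lower-triangular factor with a strictly positive diagonal (via exp),
# so Lam_aa = L L^T is symmetric positive definite by construction.
L = np.tril(raw, k=-1) + np.diag(np.exp(np.diag(raw)))
Lam_aa = L @ L.T

print((np.linalg.eigvalsh(Lam_aa) > 0).all())  # True
```

Because positivity is built into the parameterization, the optimizer can update `raw` freely without any projection step to keep the precision block valid.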
Key Experimental Results¶
Main Results (29 subjects, 6 datasets)¶
| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| SplattingAvatar | 28.58 | 0.9396 | 0.0902 |
| MonoGaussianAvatar | 29.94 | 0.9456 | 0.0655 |
| FlashAvatar | 29.43 | 0.9466 | 0.0511 |
| Ours (FA) | 29.99 | 0.9510 | 0.0498 |
| GaussianHeadAvatar | 24.10 | 0.8819 | 0.2027 |
| Ours (GHA) | 24.38 | 0.8819 | 0.1977 |
Ablation Study¶
| Configuration | LPIPS↓ | FPS |
|---|---|---|
| HyperGaussians (\(n=8\)) | 0.0498 | 300 |
| Deeper MLP (same param count) | 0.0572 (+15%) | 158 (−47%) |
| Wider MLP (same param count) | 0.0512 (+3%) | 178 (−41%) |
Key Findings¶
- HyperGaussians significantly outperforms increased MLP capacity under the same parameter budget, achieving better LPIPS without sacrificing speed (300 FPS vs. 158 FPS).
- Improvements are most prominent in high-frequency details: specular reflections, glasses frames, tooth gaps, and skin wrinkles—precisely the areas where standard 3DGS is weakest—confirming that high-dimensional conditioning genuinely enhances local expressiveness.
- Even \(n=1\) (the minimal extension) already outperforms the baseline; \(n=8\) is the optimal trade-off; \(n=128\) remains feasible with the inverse covariance trick but yields diminishing returns.
- The inverse covariance trick is critical for practicality: without it, \(n=128\) is nearly infeasible (15000% speedup, >90% memory reduction with the trick).
- The method is also effective in cross-reenactment scenarios, indicating that the learned capability is general high-frequency modeling rather than overfitting to specific expressions.
- In multi-view settings (GaussianHeadAvatar), HyperGaussians likewise improve wrinkle and reflection quality with only 1% additional training overhead.
- Inverse covariance efficiency: at \(n=8\), naïve implementation uses 42MB per conditioning step vs. 22MB with the trick (−48%); at \(n=128\), memory is reduced by >90%.
- Cross-dataset consistency: consistent improvements across 6 different datasets including INSTA, NerFace, IMAvatar, and NeRSemble.
- Training convergence is also accelerated—HyperGaussians reach the baseline's final quality in fewer iterations.
Highlights & Insights¶
- Elegant Bayesian Unification: NDGS, 4DGS, 6DGS, 7DGS, and other multi-dimensional Gaussian extensions are unified as special cases of conditional Gaussian distributions, with their expressiveness limitations (inability to condition rotation/scale) clearly identified. This theoretical unification is itself a significant contribution.
- Improvement Orthogonal to Architecture: Quality is enhanced without modifying any model design, meaning HyperGaussians can be freely combined with any future architectural improvements—this "free upgrade" property is highly attractive.
- Generality of the Inverse Covariance Trick: The trick is not limited to face avatars; any scenario requiring high-dimensional Gaussian conditioning can benefit. The reduction from \(O(n^3)\) to \(O(n)\) is highly significant for real-time applications.
Limitations & Future Work¶
- Validation is currently limited to face avatars; effectiveness on full-body dynamic scenes (clothing, hair motion, and other large-scale nonlinear deformations) remains unknown.
- The conditional covariance matrix \(\Sigma_{a|b} = \Lambda_{aa}^{-1}\), which encodes uncertainty in 3D Gaussian parameters, is not utilized in the current implementation—it could be leveraged for active learning or rendering quality assessment.
- Storing and inverting the precision matrix blocks \(\Lambda_{aa}\) and \(\Lambda_{ab}\) is required, and while the parameter count is reduced from \(O((m+n)^2)\) to \(O(m^2+mn)\), the implementation is more complex than standard 3DGS.
- For highly out-of-distribution expressions, the extrapolation capability of the conditional distribution may be limited, since the conditional mean of a multivariate Gaussian is linear in the latent code.
- Quantitative improvements, while consistent, are modest in magnitude (PSNR +0.56, LPIPS −0.0013); visual improvements are concentrated in high-frequency detail regions.
- The choice of latent dimension \(n=8\) is determined by ablation, but different scenarios may require different optimal values. An adaptive dimension selection mechanism remains to be developed.
Related Work & Insights¶
- vs. NDGS/6DGS/7DGS: These methods are special cases of HyperGaussians—they cannot condition rotation/scale and are computationally infeasible at large latent dimensions. HyperGaussians addresses both limitations and eliminates the Gaussian vanishing problem under large deformations.
- vs. FlashAvatar: Replacing only the Gaussian representation reduces LPIPS from 0.0511 to 0.0498 and improves PSNR from 29.43 to 29.99, demonstrating the value of improving the underlying representation. FlashAvatar's MLP-predicted offsets constitute "external modulation," whereas HyperGaussians internalize modulation within the representation itself.
- vs. HyperNeRF: The concept of modeling deformations in a high-dimensional space is inherited from HyperNeRF, but HyperGaussians adapts it to the Gaussian splatting framework and achieves real-time rendering (300 FPS) via the inverse covariance trick. HyperNeRF's "slicing" and HyperGaussians' "conditioning" are formally analogous but differ entirely in implementation and theoretical framework.
- vs. MonoGaussianAvatar: MonoGaussianAvatar requires 100k+ Gaussians and 12 hours of training, whereas FlashAvatar + HyperGaussians uses only ~15k Gaussians and 20 minutes, with superior performance—demonstrating that better representations matter more than more parameters.
- vs. SplattingAvatar: SplattingAvatar does not account for the effect of expression changes on appearance (e.g., specular reflection shifts), leading to blurry rendering. HyperGaussians naturally models this dependency through conditional distributions.
- Future Outlook: The HyperGaussians framework can be applied to dynamic scenes beyond faces (full body, animals, deformable objects); the inverse covariance trick makes higher latent dimensions feasible.
- vs. 4DGS: 4DGS adds a temporal dimension but requires custom CUDA kernels; HyperGaussians adds arbitrary dimensions and can use standard 3DGS rasterizers via the inverse covariance trick.
- Gaussian Mixture Model Perspective: NDGS is essentially a multi-dimensional extension of Gaussian mixture models, but density functions that depend on the joint distribution become unstable under large deformations; HyperGaussians avoids this by using conditional distributions. Since the conditional mean computation is equivalent to MAP estimation, the prior \(p(\mathcal{A})\) acts as implicit regularization, helping preserve fine details at test time.
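The MAP equivalence invoked above follows directly from the standard conditional-Gaussian identities; as a sketch, consistent with the formulas used in the Method section:

```latex
% Conditioning the joint Gaussian on the latent code \gamma_b gives
p(\gamma_a \mid \gamma_b) = \mathcal{N}\big(\mu_{a|b},\, \Sigma_{a|b}\big), \qquad
\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(\gamma_b - \mu_b), \qquad
\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} = \Lambda_{aa}^{-1}.
% A Gaussian density is maximized at its mean, so
\hat{\gamma}_a = \arg\max_{\gamma_a}\, p(\gamma_a \mid \gamma_b) = \mu_{a|b},
% i.e. the conditional mean used at render time is exactly the MAP estimate,
% with the marginal prior p(\gamma_a) = \mathcal{N}(\mu_a, \Sigma_{aa})
% entering through the joint and acting as implicit regularization.
```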
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ Triple innovation of high-dimensional Gaussians + Bayesian perspective + inverse covariance trick, with outstanding theoretical depth and unification of multiple existing methods
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ 29 subjects across 6 datasets, including ablations, speed comparisons, and both monocular and multi-view settings
- Writing Quality: ⭐⭐⭐⭐⭐ Mathematical derivations are clear and elegant; the Bayesian unification analysis is particularly well executed
- Value: ⭐⭐⭐⭐⭐ Plug-and-play representation upgrade with strong generality; the inverse covariance trick has broad application prospects