CUBE: Representing 3D Faces with Learnable B-Spline Volumes¶
Conference: CVPR 2026 arXiv: 2604.12894 Code: None Area: 3D Vision / Face Reconstruction Keywords: B-spline volumes, face representation, scan registration, local control, geometry editing
TL;DR¶
This paper proposes CUBE (Control-based Unified B-spline Encoding), a hybrid geometric representation combining B-spline volumes with learnable high-dimensional control features. Through a two-stage decoding pipeline (B-spline basis interpolation followed by a lightweight MLP residual), CUBE enables editable, high-fidelity 3D face reconstruction and scan registration.
Background & Motivation¶
Background: 3D face representation is dominated by three paradigms: 3D Morphable Models (3DMMs) provide compact, disentangled linear spaces but lack fine detail; nonlinear neural models improve flexibility but sacrifice interpretability and local control; implicit representations offer high detail but lack semantic correspondence and require costly isosurface extraction.
Limitations of Prior Work: 3DMMs are constrained by fixed topology and low-dimensional parameter spaces, making them unable to capture subject-specific high-frequency details. Neural models lack local editing capability. Implicit models are incompatible with standard graphics pipelines.
Key Challenge: Local controllability, geometric expressiveness, and computational efficiency are inherently difficult to achieve simultaneously in a single representation.
Goal: To design a hybrid face representation that combines the local control properties of B-splines with the expressive power of neural networks.
Key Insight: Replace the conventional 3D control points of B-spline volumes with high-dimensional learnable control features, and supplement high-frequency detail via a lightweight MLP.
Core Idea: A high-dimensional control feature lattice (e.g., \(8\times8\times8\)) defines a continuous mapping from the parametric domain to Euclidean space; the B-spline basis provides local support, enabling local editing.
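The local-support property is easiest to see in 1D: with a uniform cubic B-spline basis, each control coefficient influences only four knot spans, so editing one coefficient perturbs the curve only locally. A minimal sketch (the cubic basis and array sizes here are illustrative, not from the paper):

```python
import numpy as np

def cubic_bspline_basis(t):
    """Uniform cubic B-spline blending weights for fractional position t in [0, 1).
    Returns the weights of the 4 control coefficients surrounding the query."""
    return np.array([
        (1 - t) ** 3,
        3 * t**3 - 6 * t**2 + 4,
        -3 * t**3 + 3 * t**2 + 3 * t + 1,
        t**3,
    ]) / 6.0

def eval_spline_1d(coeffs, u):
    """Evaluate a 1D uniform cubic B-spline at parameter u in [0, len(coeffs) - 3)."""
    i = int(np.floor(u))             # knot-span index
    w = cubic_bspline_basis(u - i)   # 4 local blending weights
    return w @ coeffs[i : i + 4]

# Local support: perturbing one coefficient changes the curve only near it.
coeffs = np.zeros(10)
us = np.linspace(0, 6.99, 100)
before = np.array([eval_spline_1d(coeffs, u) for u in us])
coeffs[8] += 1.0                     # edit a single "control feature"
after = np.array([eval_spline_1d(coeffs, u) for u in us])
changed = np.flatnonzero(np.abs(after - before) > 1e-12)
print(us[changed.min()])             # curve only moves for u in the spans touching coeff 8
```

CUBE applies the same blending per axis in 3D (a tensor product over a 4×4×4 neighborhood), which is what makes swapping a single control feature a local edit.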
Method¶
Overall Architecture¶
CUBE is parameterized by a high-dimensional control feature lattice. Given 3D coordinates on a fixed template mesh, B-spline bases locally blend the control features to produce high-dimensional vectors, whose first three dimensions define a base mesh position; the full feature vector is then passed to a lightweight MLP to predict a residual displacement. The output is a 3D surface with dense semantic correspondence.
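The pipeline above can be sketched end to end. The lattice size (8×8×8×32) matches the paper's example; the tensor-product cubic basis, the parameter-to-lattice mapping, and the randomly initialized ReLU MLP are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

L, C = 8, 32                                   # lattice resolution, feature width
lattice = rng.normal(scale=0.1, size=(L, L, L, C))

def basis(t):
    """Uniform cubic B-spline weights for the 4 control slots along one axis."""
    return np.array([(1 - t)**3, 3*t**3 - 6*t**2 + 4,
                     -3*t**3 + 3*t**2 + 3*t + 1, t**3]) / 6.0

def blend(uvw):
    """Stage 1: blend the 4x4x4 neighboring control features at a parametric point."""
    u, v, w = uvw * (L - 3)                    # map [0,1)^3 into the valid knot spans
    iu, iv, iw = int(u), int(v), int(w)
    bu, bv, bw = basis(u - iu), basis(v - iv), basis(w - iw)
    weights = np.einsum('i,j,k->ijk', bu, bv, bw)      # tensor-product basis
    patch = lattice[iu:iu+4, iv:iv+4, iw:iw+4]         # local support: 64 features
    return np.einsum('ijk,ijkc->c', weights, patch)    # blended 32-D feature

# Stage 2: a lightweight MLP (random weights here) predicts a residual displacement.
W1, b1 = rng.normal(size=(C, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(64, 3)) * 0.1, np.zeros(3)

def decode(uvw):
    f = blend(uvw)
    base = f[:3]                                       # first 3 dims: base position
    residual = np.maximum(f @ W1 + b1, 0) @ W2 + b2    # ReLU MLP residual
    return base + residual

print(decode(np.array([0.3, 0.5, 0.7])))               # a 3D surface point
```

Querying `decode` at every template-mesh vertex yields the output surface, so dense semantic correspondence comes for free from the fixed template parameterization.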
Key Designs¶
- High-Dimensional Control Feature Lattice:
  - Function: Parameterize 3D face shape with a compact set of lattice points in place of a dense mesh.
  - Mechanism: Whereas conventional B-spline volumes use 3D control points, CUBE replaces them with high-dimensional (e.g., 32-dimensional) control features. B-spline bases locally blend the neighboring control features at each query point to produce a high-dimensional feature vector, preserving the local support property of B-splines — modifying a single control feature affects only its local region.
  - Design Motivation: Standard B-spline 3D control points lack the expressive capacity to represent complex face shapes with a small number of lattice points.
- Two-Stage Decoding:
  - Function: Capture both global shape and local detail.
  - Mechanism: The first three dimensions of the blended high-dimensional feature vector directly define a coarse base mesh (global shape), while the full feature vector is fed to a lightweight MLP to predict residual displacements from the base shape (high-frequency detail).
  - Design Motivation: B-spline bases are inherently smooth and ill-suited to representing high-frequency geometry. The MLP compensates for this limitation while preserving local support, since its inputs are derived from locally blended features.
- Transformer-Based Encoder:
  - Function: Predict CUBE control features from unstructured point clouds or monocular images.
  - Mechanism: A Transformer encoder is trained to map unstructured 3D head scans (or monocular images) to CUBE control feature lattices, enabling feed-forward scan registration and image-based reconstruction.
  - Design Motivation: The CUBE parameter space is compact (e.g., \(8^3 \times 32 = 16\text{K}\) parameters), making it amenable to direct regression.
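Because the target is just a 16K-dimensional tensor, the encoder reduces to set-to-vector regression. A toy single-head self-attention encoder over point tokens illustrates the shapes involved; the real model's depth, heads, and image branch are not specified here, and every weight below is a random stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)

N, D, P = 256, 64, 8**3 * 32          # points, token width, lattice parameters (16,384)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

W_in = rng.normal(size=(3, D)) * 0.1                   # lift xyz to tokens
Wq, Wk, Wv = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
W_out = rng.normal(size=(D, P)) * 0.01                 # regression head

def encode(points):
    tokens = points @ W_in
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(D))               # (N, N) self-attention
    tokens = tokens + attn @ v                         # residual attention update
    pooled = tokens.mean(axis=0)                       # permutation-invariant pooling
    return (pooled @ W_out).reshape(8, 8, 8, 32)       # the control-feature lattice

cloud = rng.normal(size=(N, 3))                        # a stand-in head scan
print(encode(cloud).shape)                             # (8, 8, 8, 32)
```

Mean pooling keeps the prediction invariant to point ordering, which is what makes unstructured scans viable inputs.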
Loss & Training¶
Vertex-to-vertex \(\ell_2\) loss + normal consistency loss + Laplacian smoothing regularization. The encoder and CUBE decoder are trained end-to-end.
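The three terms compose as a weighted sum; a minimal sketch, assuming a uniform (neighbor-mean) Laplacian and illustrative loss weights `w_n`, `w_lap` that the notes do not specify:

```python
import numpy as np

def laplacian(verts, neighbors):
    """Uniform Laplacian: each vertex minus the mean of its mesh neighbors."""
    return np.stack([v - verts[n].mean(axis=0) for v, n in zip(verts, neighbors)])

def training_loss(pred, gt, pred_normals, gt_normals, neighbors,
                  w_n=0.1, w_lap=0.01):
    l_v2v = np.mean(np.sum((pred - gt) ** 2, axis=-1))            # vertex-to-vertex l2
    l_nrm = np.mean(1.0 - np.sum(pred_normals * gt_normals, -1))  # normal consistency
    l_lap = np.mean(np.sum(laplacian(pred, neighbors) ** 2, -1))  # Laplacian smoothing
    return l_v2v + w_n * l_nrm + w_lap * l_lap
```

Vertex-to-vertex supervision is well defined here because the decoder outputs vertices in fixed template correspondence, so no chamfer-style matching step is needed.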
Key Experimental Results¶
Main Results¶
| Method | Type | Scan Registration Error↓ | Correspondence Accuracy↑ |
|---|---|---|---|
| BPS | Basis Point Set | 2.85 | 82.3% |
| Shape-my-face | PointNet | 2.42 | 85.1% |
| ImFace | Implicit | 2.15 | 87.5% |
| CUBE | B-Spline | 1.89 | 91.2% |
Ablation Study¶
| Configuration | Scan Error↓ | Notes |
|---|---|---|
| Full CUBE | 1.89 | High-dim features + MLP residual |
| w/o MLP residual | 2.35 | B-spline basis only |
| 3D control points (conventional) | 2.78 | No high-dim features |
| Lattice \(16^3\) | 1.85 | More control points |
| Lattice \(4^3\) | 2.45 | Fewer control points |
Key Findings¶
- The MLP residual contributes significantly (removal increases error by 24%), underscoring the importance of high-frequency detail modeling.
- High-dimensional control features vs. conventional 3D control points: with the MLP residual removed in both cases, error drops from 2.78 to 2.35 (−15%), isolating the expressiveness gained by the feature lattice alone.
- An \(8^3\) lattice is already sufficient: scaling to \(16^3\) yields only marginal improvement.
Highlights & Insights¶
- Adapting B-spline volumes, a classical CAD representation, to face modeling and augmenting them with learnable features constitutes an elegant hybrid design.
- Preservation of the local support property enables interactive editing: local face regions can be manipulated by swapping or modifying individual control features.
- The two-stage decoding strategy (coarse B-spline + fine MLP) is generalizable to other geometric representations.
Limitations & Future Work¶
- The model is face-specific; hair and accessories are not modeled.
- Fine detail under extreme expressions may be inferior to implicit representations.
- The choice of lattice resolution requires balancing expressiveness and efficiency.
- The approach is extendable to other body parts such as the full body or hands.
Related Work & Insights¶
- vs. 3DMM (FLAME): 3DMMs rely on linear PCA bases, whereas CUBE employs B-spline volumes combined with an MLP, achieving greater expressiveness while retaining local controllability.
- vs. ImFace: ImFace is an implicit SDF representation that requires Marching Cubes for mesh extraction; CUBE directly outputs a mesh via template-based query.
Rating¶
- Novelty: ⭐⭐⭐⭐ The hybrid combination of B-spline volumes and high-dimensional features is creative.
- Experimental Thoroughness: ⭐⭐⭐⭐ Validated on two applications: scan registration and image-based reconstruction.
- Writing Quality: ⭐⭐⭐⭐ The representation design is described with clarity.
- Value: ⭐⭐⭐⭐ Practically valuable for editable face modeling.