CUBE: Representing 3D Faces with Learnable B-Spline Volumes

- Conference: CVPR 2026
- arXiv: 2604.12894
- Code: None
- Area: 3D Vision / Face Reconstruction
- Keywords: B-spline volumes, face representation, scan registration, local control, geometry editing

TL;DR

This paper proposes CUBE (Control-based Unified B-spline Encoding), a hybrid geometric representation combining B-spline volumes with learnable high-dimensional control features. Through a two-stage decoding pipeline (B-spline basis interpolation followed by a lightweight MLP residual), CUBE enables editable, high-fidelity 3D face reconstruction and scan registration.

Background & Motivation

Background: 3D face representation is dominated by three paradigms: 3D Morphable Models (3DMMs) provide compact, disentangled linear spaces but lack fine detail; nonlinear neural models improve flexibility but sacrifice interpretability and local control; implicit representations offer high detail but lack semantic correspondence and require costly isosurface extraction.

Limitations of Prior Work: 3DMMs are constrained by fixed topology and low-dimensional parameter spaces, making them unable to capture subject-specific high-frequency details. Neural models lack local editing capability. Implicit models are incompatible with standard graphics pipelines.

Key Challenge: Local controllability, geometric expressiveness, and computational efficiency are inherently difficult to achieve simultaneously in a single representation.

Goal: To design a hybrid face representation that combines the local control properties of B-splines with the expressive power of neural networks.

Key Insight: Replace the conventional 3D control points of B-spline volumes with high-dimensional learnable control features, and supplement high-frequency detail via a lightweight MLP.

Core Idea: A high-dimensional control feature lattice (e.g., \(8\times8\times8\)) defines a continuous mapping from the parametric domain to Euclidean space; the B-spline basis provides local support, enabling local editing.

Method

Overall Architecture

CUBE is parameterized by a high-dimensional control feature lattice. Given 3D coordinates on a fixed template mesh, B-spline bases locally blend the control features to produce high-dimensional vectors, whose first three dimensions define a base mesh position; the full feature vector is then passed to a lightweight MLP to predict a residual displacement. The output is a 3D surface with dense semantic correspondence.
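
The pipeline described above can be written compactly as follows. The notation is an assumption introduced here for illustration: \(\mathbf{F}_{ijk}\) denotes the control feature at lattice index \((i,j,k)\), \(B_i\) the univariate B-spline basis functions, \((u,v,w)\) the parametric coordinates of a template vertex, and \(\mathrm{MLP}_\theta\) the lightweight residual network.

\[
\mathbf{f}(u,v,w) = \sum_{i}\sum_{j}\sum_{k} B_i(u)\,B_j(v)\,B_k(w)\,\mathbf{F}_{ijk},
\qquad
\mathbf{x}(u,v,w) = \mathbf{f}_{1:3}(u,v,w) + \mathrm{MLP}_\theta\big(\mathbf{f}(u,v,w)\big)
\]

Here \(\mathbf{f}_{1:3}\) is the base mesh position taken from the first three feature dimensions, and the MLP term is the residual displacement.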

Key Designs

High-Dimensional Control Feature Lattice:
- Function: Parameterize 3D face shape with a compact set of lattice points in place of a dense mesh.
- Mechanism: Whereas conventional B-spline volumes use 3D control points, CUBE replaces them with high-dimensional (e.g., 32-dimensional) control features. At each query point, the B-spline bases blend the neighboring control features into a high-dimensional feature vector. This preserves the local support property of B-splines: modifying a single control feature affects only its local region.
- Design Motivation: Standard B-spline 3D control points lack the expressive capacity to represent complex face shapes with a small number of lattice points.
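
The local-support behavior described above can be illustrated with a minimal sketch. It assumes uniform cubic B-splines with an unclamped knot vector and randomly initialized features; the summary does not state the spline degree or knot layout, and the helper names (`cubic_bspline_weights`, `blend_features`, `make_lattice`) are hypothetical.

```python
import math
import random

def cubic_bspline_weights(u, n):
    """Return (start index, four basis weights) for u in [0, 1] on an axis with n control points.

    Assumes uniform cubic B-splines (an illustrative choice, not confirmed by the source).
    """
    spans = n - 3                                  # polynomial spans along the axis
    t = min(max(u, 0.0), 1.0) * spans
    i = min(int(math.floor(t)), spans - 1)         # index of the first of four active controls
    s = t - i                                      # local coordinate inside the span
    weights = [
        (1 - s) ** 3 / 6.0,
        (3 * s**3 - 6 * s**2 + 4) / 6.0,
        (-3 * s**3 + 3 * s**2 + 3 * s + 1) / 6.0,
        s**3 / 6.0,
    ]
    return i, weights

def blend_features(lattice, uvw):
    """Blend the 4x4x4 neighborhood of control features around the query point uvw."""
    n = len(lattice)
    dim = len(lattice[0][0][0])
    (i, wu), (j, wv), (k, ww) = (cubic_bspline_weights(c, n) for c in uvw)
    out = [0.0] * dim
    for a in range(4):
        for b in range(4):
            for c in range(4):
                weight = wu[a] * wv[b] * ww[c]
                feature = lattice[i + a][j + b][k + c]
                for d in range(dim):
                    out[d] += weight * feature[d]
    return out

def make_lattice(n=8, dim=32, seed=0):
    """Random n x n x n lattice of dim-dimensional control features (stand-in for learned values)."""
    rng = random.Random(seed)
    return [[[[rng.uniform(-1, 1) for _ in range(dim)]
              for _ in range(n)] for _ in range(n)] for _ in range(n)]

lattice = make_lattice()
near = (0.05, 0.05, 0.05)    # inside the support of control feature (0, 0, 0)
far = (0.95, 0.95, 0.95)     # outside that support
before_near, before_far = blend_features(lattice, near), blend_features(lattice, far)
lattice[0][0][0] = [v + 1.0 for v in lattice[0][0][0]]    # edit a single control feature
after_near, after_far = blend_features(lattice, near), blend_features(lattice, far)
print(after_near != before_near, after_far == before_far)  # prints: True True
```

Only the query point whose 4×4×4 neighborhood contains the edited control feature changes; the distant query is bit-for-bit identical.
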
Two-Stage Decoding:
- Function: Capture both global shape and local detail.
- Mechanism: The first three dimensions of the blended high-dimensional feature vector directly define a coarse base mesh (global shape), while the full feature vector is fed to a lightweight MLP to predict residual displacements from the base shape (high-frequency detail).
- Design Motivation: B-spline bases are inherently smooth and ill-suited to representing high-frequency geometry. The MLP compensates for this limitation while preserving local support, since its inputs are derived from locally blended features.
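
A minimal sketch of the two-stage decoding for a single query point follows. The ReLU activation, the two hidden layers of width 64, and the small random initialization are assumptions made for illustration; the summary only states that the MLP is lightweight. The function names are hypothetical.

```python
import random

def linear(x, weights, bias):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng, scale=0.01):
    """Small Gaussian weights and zero bias (assumed initialization)."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def decode_vertex(feature, mlp):
    """Stage 1: first three dims give the base position.
    Stage 2: an MLP on the full feature predicts a residual displacement."""
    base = feature[:3]
    hidden = feature
    for weights, bias in mlp[:-1]:
        hidden = relu(linear(hidden, weights, bias))
    residual = linear(hidden, *mlp[-1])
    position = [b + r for b, r in zip(base, residual)]
    return position, base, residual

rng = random.Random(0)
feature_dim, hidden_dim = 32, 64      # hidden_dim is an assumed size
mlp = [
    init_layer(feature_dim, hidden_dim, rng),
    init_layer(hidden_dim, hidden_dim, rng),
    init_layer(hidden_dim, 3, rng),
]
feature = [rng.uniform(-1, 1) for _ in range(feature_dim)]  # stand-in for a blended feature
position, base, residual = decode_vertex(feature, mlp)
print(position, base, residual)
```

Because the MLP is applied per point to a feature that was itself blended locally, editing a control feature still changes the output only inside that feature's support.
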
Transformer-Based Encoder:
- Function: Predict CUBE control features from unstructured point clouds or monocular images.
- Mechanism: A Transformer encoder is trained to map unstructured 3D head scans (or monocular images) to CUBE control feature lattices, enabling feed-forward scan registration and image-based reconstruction.
- Design Motivation: The CUBE parameter space is compact (e.g., \(8^3 \times 32 = 16\text{K}\) parameters), making it amenable to direct regression.
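
The parameter count quoted above is simple arithmetic, shown here for the three lattice resolutions that appear in the ablation. Using 32 feature dimensions for the \(4^3\) and \(16^3\) lattices is an assumption; the summary gives 32 only as an example value.

```python
def lattice_parameter_count(resolution, feature_dim):
    """Number of scalars in a resolution^3 lattice of feature_dim-dimensional control features."""
    return resolution ** 3 * feature_dim

for resolution in (4, 8, 16):
    print(resolution, lattice_parameter_count(resolution, 32))
# 4 2048
# 8 16384
# 16 131072
```
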

Loss & Training

Training combines a vertex-to-vertex \(\ell_2\) loss, a normal consistency loss, and Laplacian smoothing regularization. The encoder and CUBE decoder are trained end-to-end.
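
One plausible way to write this objective is sketched below. The weights \(\lambda_{\text{n}}\) and \(\lambda_{\text{lap}}\), the averaging over \(N\) vertices, and the symbols \(\hat{\mathbf{x}}_n\) (predicted vertex) and \(\mathbf{x}_n\) (ground-truth vertex) are assumptions; the summary does not give the exact form or the weighting of the terms.

\[
\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N} \lVert \hat{\mathbf{x}}_n - \mathbf{x}_n \rVert_2^2
+ \lambda_{\text{n}}\,\mathcal{L}_{\text{normal}}
+ \lambda_{\text{lap}}\,\mathcal{L}_{\text{lap}}
\]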

Key Experimental Results

Main Results

| Method | Type | Scan Registration Error↓ | Correspondence Accuracy↑ |
|---|---|---|---|
| BPS | Basis Point Set | 2.85 | 82.3% |
| Shape-my-face | PointNet | 2.42 | 85.1% |
| ImFace | Implicit | 2.15 | 87.5% |
| CUBE | B-Spline | 1.89 | 91.2% |

Ablation Study

| Configuration | Scan Error↓ | Notes |
|---|---|---|
| Full CUBE | 1.89 | High-dim features + MLP residual |
| w/o MLP residual | 2.35 | B-spline basis only |
| 3D control points (conventional) | 2.78 | No high-dim features |
| Lattice \(16^3\) | 1.85 | More control points |
| Lattice \(4^3\) | 2.45 | Fewer control points |

Key Findings

- The MLP residual contributes significantly: removing it raises the error from 1.89 to 2.35 (+24%), underscoring the importance of high-frequency detail modeling.
- High-dimensional control features vs. 3D control points: error decreases from 2.78 with conventional 3D control points to 2.35 with high-dimensional features and no MLP residual (about −15%), demonstrating enhanced expressiveness.
- An \(8^3\) lattice is already sufficient: scaling to \(16^3\) lowers the error only from 1.89 to 1.85, whereas shrinking to \(4^3\) raises it to 2.45.
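
The percentages in these findings can be checked directly against the ablation table. The helper name `relative_change` is introduced here for illustration only.

```python
def relative_change(old, new):
    """Signed percentage change when going from `old` to `new`."""
    return (new - old) / old * 100.0

# Scan errors copied from the ablation table.
full, no_mlp, conventional, lattice16, lattice4 = 1.89, 2.35, 2.78, 1.85, 2.45

print(round(relative_change(full, no_mlp), 1))          # 24.3  (removing the MLP residual)
print(round(relative_change(conventional, no_mlp), 1))  # -15.5 (3D control points -> high-dim features)
print(round(relative_change(full, lattice16), 1))       # -2.1  (8^3 -> 16^3 lattice)
print(round(relative_change(full, lattice4), 1))        # 29.6  (8^3 -> 4^3 lattice)
```
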

Highlights & Insights

- Adapting B-spline volumes, a classical CAD representation, to face modeling and augmenting them with learnable features makes for an elegant hybrid design.
- Preservation of the local support property enables interactive editing: local face regions can be manipulated by swapping or modifying individual control features.
- The two-stage decoding strategy (coarse B-spline + fine MLP) is generalizable to other geometric representations.

Limitations & Future Work

- The model is face-specific; hair and accessories are not modeled.
- Fine detail under extreme expressions may be inferior to that of implicit representations.
- The choice of lattice resolution requires balancing expressiveness and efficiency.
- The approach could be extended to other domains such as hands or the full body.

Comparison with Related Work

- vs. 3DMM (FLAME): 3DMMs rely on linear PCA bases, whereas CUBE employs B-spline volumes combined with an MLP, achieving greater expressiveness while retaining local controllability.
- vs. ImFace: ImFace is an implicit SDF representation that requires Marching Cubes for mesh extraction; CUBE directly outputs a mesh via template-based query.

Rating

- Novelty: ⭐⭐⭐⭐ The hybrid combination of B-spline volumes and high-dimensional features is creative.
- Experimental Thoroughness: ⭐⭐⭐⭐ Validated on two applications: scan registration and image-based reconstruction.
- Writing Quality: ⭐⭐⭐⭐ The representation design is described with clarity.
- Value: ⭐⭐⭐⭐ Practically valuable for editable face modeling.