ReWeaver: Towards Simulation-Ready and Topology-Accurate Garment Reconstruction¶
Conference: CVPR 2026 arXiv: 2601.16672 Authors: Ming Li, Hui Shan, Kai Zheng, Chentao Shen, Siyu Liu, Yanwei Fu, Zhen Chen, Xiangru Huang Institutions: Zhejiang University, Shanghai Innovation Institute, Westlake University, Fudan University, Adobe, Xidian University Code: To be confirmed Area: 3D Vision Keywords: Garment Reconstruction, Sewing Pattern, Topology Reconstruction, Multi-View Reconstruction, Physical Simulation
TL;DR¶
This paper proposes ReWeaver, a framework that jointly reconstructs 3D garment geometry and 2D sewing patterns from as few as four multi-view RGB images. A dual-path Transformer predicts 3D patches/curves and their topological connectivity, after which an intra-group attention module unfolds the 3D structure into 2D panel edges. ReWeaver is the first method to produce topology-accurate garment assets that are directly usable in physical simulation.
Background & Motivation¶
High-quality 3D garment reconstruction is critical for applications such as virtual try-on, digital humans, gaming, and robotic manipulation. Existing approaches suffer from two major limitations:
- Limitations of unstructured representations: Methods based on point clouds, SDFs, 3D Gaussian splatting, and similar representations can approximate garment geometry but lack explicit seam/panel structure, making them incompatible with physical simulation, garment editing, or retargeting. Such representations are inherently misaligned with industry-standard garment design pipelines, which are centered on 2D sewing patterns.
- Limitations of existing sewing pattern methods:
- Methods relying on predefined topologies (e.g., DiffAvatar) are restricted to simple garments and cannot handle unseen layouts.
- Vision-language-model-based methods (e.g., ChatGarment, AIpparel) generate 2D patterns via tokenized JSON descriptions, offering stronger topology generalization but insufficient geometric accuracy.
- Most methods focus exclusively on 2D patterns, neglecting accurate 3D geometric understanding.
Core Goal: Simultaneously reconstruct accurate garment topology (which panels/seams are connected) and geometry (precise 3D shape of each element), yielding outputs suitable for both 3D perception and high-fidelity physical simulation.
Method¶
Overall Architecture¶
ReWeaver adopts an encoder–decoder architecture consisting of four stages:
- Multi-view visual encoding (Section 3.2): A VGGT-based multi-view encoder extracts unified features.
- 3D curve and patch prediction (Section 3.3): A dual-path Transformer predicts 3D geometry and topology.
- 2D pattern prediction (Section 3.4): An intra-group attention module unfolds the 3D structure into 2D panel edges.
- Topology/geometry refinement: Post-processing enforces panel closure and topological consistency.
Terminology (dual 2D/3D space):
| Space | Surface Region | Boundary Line |
|---|---|---|
| 3D | Patch | Curve / Seam |
| 2D | Panel | Edge |
Multi-View Visual Encoder¶
Following the design of VGGT, the processing pipeline proceeds as follows:
- Each input image is divided into non-overlapping \(16\times16\) patches and embedded as tokens via a DINOv2 backbone.
- Intra-frame self-attention (refining single-view features) and inter-frame self-attention (aggregating cross-view information) are stacked in alternation.
- The final intra-frame and inter-frame outputs are concatenated, and tokens from all frames are flattened to form a sequence \(T_i \in \mathbb{R}^{N_i \times D}\), where \(D=768\).
This alternating attention design progressively integrates local texture cues and global geometric information, adapting to sparse, arbitrarily distributed multi-view inputs.
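The alternation between the two attention scopes can be illustrated with a minimal sketch (NumPy, single-head, no learned projections — all names here are illustrative, not the paper's implementation): intra-frame attention restricts each token to its own view, while inter-frame attention flattens all views into one sequence.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention (toy: no learned projections)."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def alternating_block(tokens):
    """One intra-frame + inter-frame round over tokens of shape (F, N, D)."""
    F, N, D = tokens.shape
    # Intra-frame: each frame's tokens attend only within the same frame.
    intra = attention(tokens, tokens, tokens)
    # Inter-frame: flatten the frames so every token attends across all views.
    flat = intra.reshape(1, F * N, D)
    inter = attention(flat, flat, flat).reshape(F, N, D)
    return inter

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 8))    # 4 views, 6 tokens per view, D = 8
y = alternating_block(x)
print(y.shape)                    # (4, 6, 8)
```

Stacking several such blocks, as VGGT does, lets local texture features and cross-view geometric cues mix progressively.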
3D Curve and Patch Prediction (Dual-Path Transformer)¶
This is the core module of ReWeaver, responsible for predicting 3D geometric elements and their topological connectivity.
Input: Visual tokens \(T_i\), learnable patch queries \(Q_p \in \mathbb{R}^{N_p \times D}\) (\(N_p=200\)), and curve queries \(Q_c \in \mathbb{R}^{N_c \times D}\) (\(N_c=70\)). Query counts are set to approximately twice the maximum count observed in training data.
Dual-Path Transformer Architecture (inspired by ComplexGen):
At each layer, the patch path and curve path each perform:
- Intra-group self-attention: elements of the same type exchange information (patch–patch or curve–curve).
- Cross-group cross-attention: context is retrieved from image tokens and elements of the other type.
- LayerNorm + FFN + residual connection.
After multiple layers, refined tokens are obtained: \(T_p \in \mathbb{R}^{N_p \times D}\) and \(T_c \in \mathbb{R}^{N_c \times D}\).
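One layer of the dual-path update described above can be sketched as follows (a toy NumPy version with single-head attention and a tanh stand-in for the FFN; the function names and shapes are assumptions for illustration, not the paper's code):

```python
import numpy as np

def attn(q, kv):
    """Toy single-head attention: queries q over keys/values kv."""
    s = q @ kv.T / np.sqrt(q.shape[-1])
    s -= s.max(axis=-1, keepdims=True)
    w = np.exp(s)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

def layer_norm(x, eps=1e-5):
    m = x.mean(-1, keepdims=True)
    v = x.var(-1, keepdims=True)
    return (x - m) / np.sqrt(v + eps)

def dual_path_layer(Tp, Tc, Ti):
    """One layer: intra-group self-attention, then cross-attention to the
    image tokens plus the other path, then an FFN stand-in, all residual."""
    Tp = layer_norm(Tp + attn(Tp, Tp))        # patch-patch self-attention
    Tc = layer_norm(Tc + attn(Tc, Tc))        # curve-curve self-attention
    ctx_p = np.concatenate([Ti, Tc])          # context for the patch path
    ctx_c = np.concatenate([Ti, Tp])          # context for the curve path
    Tp = layer_norm(Tp + attn(Tp, ctx_p))
    Tc = layer_norm(Tc + attn(Tc, ctx_c))
    Tp = layer_norm(Tp + np.tanh(Tp))         # stand-in for the FFN
    Tc = layer_norm(Tc + np.tanh(Tc))
    return Tp, Tc

rng = np.random.default_rng(1)
Tp, Tc = dual_path_layer(rng.normal(size=(5, 16)),   # 5 patch queries
                         rng.normal(size=(3, 16)),   # 3 curve queries
                         rng.normal(size=(10, 16)))  # 10 image tokens
print(Tp.shape, Tc.shape)   # (5, 16) (3, 16)
```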
Three decoding heads:
(1) Probability Prediction Head¶
A 3-layer MLP followed by a Sigmoid predicts whether each query corresponds to a valid element, e.g. \(\sigma_p^i = \mathrm{Sigmoid}(\mathrm{MLP}(T_p^i))\) for patches, and analogously \(\sigma_c^j\) for curves.
Low-probability elements are filtered by thresholds \(\epsilon_p\) and \(\epsilon_c\), and topology refinement produces binary validity masks \(\boldsymbol{\sigma}_p^{\star}\) and \(\boldsymbol{\sigma}_c^{\star}\).
(2) Geometry Prediction Head (HyperNetwork)¶
A key innovation: rather than directly regressing point coordinates, a HyperNetwork generates the weights of a parameterized mapping MLP.
For curves, the hypernetwork \(f_c^{\text{geo}}\) generates, conditioned on token \(T_c^j\), the weights of a 3-layer MLP \(g_j: [0,1] \to \mathbb{R}^3\) that traces the curve.
For patches, the hypernetwork \(f_p^{\text{geo}}\) analogously generates an MLP \(g_i: [0,1]^2 \to \mathbb{R}^3\) that parameterizes the patch surface.
Advantages of the HyperNetwork:
- During training, uniform sampling at arbitrary density is possible without compromising geometric smoothness or continuity.
- At inference, adaptive sampling density is supported — small patches are sampled sparsely, large patches densely — yielding a near-uniform 3D point distribution.
- Each token parameterizes an independent continuous mapping, encoding rich shape information.
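The mechanism can be made concrete with a toy sketch (NumPy; the layer sizes and the linear hypernetwork are illustrative assumptions, not the paper's architecture): a token is mapped to the full weight vector of a small curve MLP, which can then be evaluated at any sampling density.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 8                      # token dim, hidden width of the generated MLP

# The generated curve MLP has layers 1 -> H -> 3, so the hypernetwork must
# emit H*1 + H (first layer) + 3*H + 3 (second layer) scalars per token.
n_weights = H + H + 3 * H + 3
hyper = rng.normal(scale=0.1, size=(D, n_weights))   # toy linear hypernetwork

def curve_points(token, ts):
    """Evaluate the token-conditioned curve MLP at parameters ts in [0, 1]."""
    w = token @ hyper
    W1, b1 = w[:H].reshape(H, 1), w[H:2 * H]
    W2, b2 = w[2 * H:2 * H + 3 * H].reshape(3, H), w[2 * H + 3 * H:]
    h = np.tanh(ts[:, None] @ W1.T + b1)             # (T, H) hidden activations
    return h @ W2.T + b2                             # (T, 3) 3D points on the curve

token = rng.normal(size=D)
coarse = curve_points(token, np.linspace(0, 1, 8))    # sparse sampling
dense = curve_points(token, np.linspace(0, 1, 100))   # dense sampling, same mapping
print(coarse.shape, dense.shape)   # (8, 3) (100, 3)
```

Because both samplings evaluate the same continuous mapping, the density can be chosen freely at inference without changing the underlying curve.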
(3) Connectivity Prediction Head¶
The connection probability between patch \(i\) and curve \(j\) is predicted as a dot product of linear projections followed by a Sigmoid, \(\sigma_{pc}^{ij} = \mathrm{Sigmoid}\big(\langle W_p T_p^i, W_c T_c^j \rangle\big)\), yielding the adjacency matrix \(\sigma_{pc}\). After thresholding at \(\epsilon_{\text{adj}}\) and topology refinement, the final binary connectivity matrix \(\sigma_{pc}^{\star} \in \{0,1\}^{N_p \times N_c}\) is obtained.
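In sketch form (NumPy; random matrices stand in for the learned projections):

```python
import numpy as np

rng = np.random.default_rng(2)
Np, Nc, D = 4, 6, 16
Tp = rng.normal(size=(Np, D))              # refined patch tokens
Tc = rng.normal(size=(Nc, D))              # refined curve tokens
Wp = rng.normal(scale=0.1, size=(D, D))    # learned projections in practice
Wc = rng.normal(scale=0.1, size=(D, D))

logits = (Tp @ Wp) @ (Tc @ Wc).T           # pairwise dot products
sigma_pc = 1.0 / (1.0 + np.exp(-logits))   # (Np, Nc) connection probabilities

eps_adj = 0.5
adj = (sigma_pc > eps_adj).astype(int)     # binary patch-curve adjacency
print(adj.shape)                           # (4, 6)
```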
2D Pattern Prediction (Intra-Group Attention Unfolding)¶
Given valid patch/curve tokens and the refined topology, this module "unfolds" the 3D structure into 2D sewing patterns.
Core Idea: Based on \(\sigma_{pc}^{\star}\), each valid patch token and its connected curve tokens are grouped together, and attention is performed within each group.
Specifically:
- Curve tokens within a group first exchange information via self-attention.
- They then attend to the associated patch token via cross-attention.
- After LayerNorm + FFN + residual connection, edge tokens \(T_e\) are obtained.
For each connected curve \(j \in \partial_i\), another hypernetwork generates an MLP mapping the 1D curve parameter in \([0,1]\) to normalized 2D panel coordinates.
Scale Recovery: Since 2D panels are predicted in a normalized \([0,1]^2\) space, an additional MLP \(f_p^{\text{scale}}\) predicts a scaling factor \(s_i\) from the patch token, which is multiplied by the normalized coordinates to recover physical dimensions.
Geometry Refinement: Since the HyperNetwork does not guarantee perfect alignment of adjacent edge endpoints, a post-processing step enforces edge closure so that panels form closed loops suitable for triangulation and simulation.
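A simple form of such endpoint closure is snapping each pair of adjacent edge endpoints to their midpoint, sketched below (NumPy; this is an illustrative stand-in for the paper's unspecified post-processing, not its actual procedure):

```python
import numpy as np

def close_panel(edges):
    """Force a list of 2D polylines (edges of one panel, in boundary order)
    to form a closed loop by snapping each shared endpoint pair to its mean."""
    edges = [e.copy() for e in edges]
    n = len(edges)
    for k in range(n):
        nxt = (k + 1) % n
        joint = 0.5 * (edges[k][-1] + edges[nxt][0])   # midpoint of the gap
        edges[k][-1] = joint
        edges[nxt][0] = joint
    return edges

# Three edges that almost, but not exactly, meet at their endpoints.
e0 = np.array([[0.0, 0.0], [1.0, 0.01]])
e1 = np.array([[1.01, 0.0], [0.5, 1.0]])
e2 = np.array([[0.5, 0.99], [0.01, 0.0]])
closed = close_panel([e0, e1, e2])
assert np.allclose(closed[0][-1], closed[1][0])   # gap is now exactly closed
```

After closure, each panel boundary forms a watertight loop that can be triangulated and handed to a cloth simulator.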
Loss & Training¶
Hungarian matching establishes correspondences between predicted elements and ground truth. The total loss comprises three terms:
Geometry Loss (Chamfer distance): computed between point sets sampled from all parameterized mappings (patch, curve, edge) and the corresponding ground truth, \(\mathcal{L}_{\text{geo}} = \mathrm{CD}_p + \mathrm{CD}_c + \mathrm{CD}_e\).
Classification and Connectivity Loss (BCE): binary cross-entropy between the predicted validity probabilities \(\sigma_p, \sigma_c\) and adjacency matrix \(\sigma_{pc}\) and their matched ground-truth labels.
Scale Loss (\(\ell_2\)): an \(\ell_2\) penalty between each predicted scaling factor \(s_i\) and its ground-truth value.
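The two main loss ingredients, symmetric Chamfer distance and binary cross-entropy, can be sketched as follows (NumPy; a brute-force pairwise implementation for illustration only):

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets (N, 3) and (M, 3)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # pairwise squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

rng = np.random.default_rng(3)
pred = rng.normal(size=(50, 3))
gt = pred + rng.normal(scale=0.01, size=(50, 3))   # slightly perturbed targets
print(chamfer(pred, gt))                           # small but nonzero
print(bce(np.array([0.9, 0.1]), np.array([1.0, 0.0])))
```

In practice these terms are evaluated only on the element pairs matched by the Hungarian assignment, and unmatched queries are supervised toward invalidity.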
Key Experimental Results¶
Dataset: GCD-TS¶
Extended from GarmentCodeData (GCD), with the following key improvements:
- Default textures in GCD, which contain strong seam cues, are replaced with approximately 50 BEDLAM body textures and a large collection of tileable garment textures.
- Each garment–body pair is rendered from four viewpoints (front/back/left/right) with small-scale camera pose perturbations.
- A total of approximately 100,000 textured multi-view samples covering a wide range of complex geometry and topology.
Main Results¶
| Metric | AIpparel-MV | ReWeaver | Note |
|---|---|---|---|
| \(\text{Acc}_p\) ↑ Panel count accuracy | 0.4561 | 0.8923 | ReWeaver +43.6 pp |
| \(\text{Acc}_e\) ↑ Edge count accuracy | 0.6774 | 0.6570 | Comparable |
| \(\text{Acc}_o\) ↑ Overall topology accuracy | 0.3090 | 0.5863 | ReWeaver +27.7 pp |
| \(\text{CD}_e\) ↓ 2D edge Chamfer distance | 0.0648 | 0.0395 | More accurate geometry |
| IoU ↑ Panel intersection over union | 0.7084 | 0.8080 | +10.0 pp |
ReWeaver significantly outperforms multi-view-augmented AIpparel (AIpparel-MV) on 5 of 6 metrics. The panel count accuracy jumps from 45.6% to 89.2%, demonstrating substantially improved reliability in identifying garment topology.
Ablation Study¶
Effect of Topology and Geometry Refinement
| Configuration | \(\text{CD}_p^{\text{base}}\) ↓ | \(\text{CD}_p^{\text{adapt}}\) ↓ | \(\text{CD}_c\) ↓ | \(\text{Acc}_p\) ↑ | \(\text{Acc}_e\) ↑ | \(\text{Acc}_o\) ↑ | \(\text{CD}_e\) ↓ | IoU ↑ |
|---|---|---|---|---|---|---|---|---|
| With refinement | 0.0225 | 0.0187 | 0.0264 | 0.8923 | 0.6570 | 0.5863 | 0.0395 | 0.8080 |
| Without refinement | 0.0225 | 0.0188 | 0.0255 | 0.9101 | 0.5361 | 0.4880 | 0.0416 | 0.7775 |
Key Findings:
- Topology refinement removes redundant/duplicate edges, improving edge count accuracy (\(\text{Acc}_e\)) from 53.6% to 65.7% (+12.1 pp) and overall accuracy from 48.8% to 58.6% (+9.8 pp).
- Geometry refinement closes small gaps between edges in 2D space, producing fully closed panel boundaries and improving IoU from 77.8% to 80.8%.
- Refinement has minimal impact on 3D geometric metrics (\(\text{CD}_p\) and \(\text{CD}_c\) are nearly unchanged), indicating it primarily improves topological consistency and 2D panel quality.
- \(\text{Acc}_p\) is marginally higher without refinement (0.91 vs. 0.89), as refinement occasionally removes valid elements deemed redundant; however, overall topological quality still improves substantially.
Adaptive Sampling¶
- Training uses a fixed \(20\times20\) patch sampling density.
- At inference, a \(20\times20\) grid is first pre-sampled; points are then adaptively retained based on spatial variance: small patches are pruned to a sparser subset, while large patches keep the full dense sampling.
- Adaptive sampling reduces \(\text{CD}_p^{\text{adapt}}\) from 0.0225 to 0.0187, confirming the effectiveness of the adaptive strategy.
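A minimal sketch of such variance-based pruning (NumPy; the threshold and the halve-the-grid rule are illustrative assumptions, not the paper's exact criterion):

```python
import numpy as np

def adaptive_sample(points, grid=20, var_thresh=0.01):
    """Pre-sample a (grid x grid) lattice of 3D patch points, then keep a
    sparser subset when the patch is small (low total spatial variance)."""
    pts = points.reshape(grid, grid, 3)
    if pts.reshape(-1, 3).var(axis=0).sum() < var_thresh:
        pts = pts[::2, ::2]          # small patch: halve the sampling density
    return pts.reshape(-1, 3)

rng = np.random.default_rng(4)
base = rng.uniform(size=(400, 3))            # large, spread-out patch
small = 0.01 * rng.uniform(size=(400, 3))    # tiny patch
print(adaptive_sample(base).shape)           # (400, 3)  kept dense
print(adaptive_sample(small).shape)          # (100, 3)  pruned
```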
Highlights & Insights¶
- ⭐ First joint reconstruction: Simultaneously outputs 3D garment geometry and 2D sewing patterns while maintaining explicit 2D–3D correspondences, making outputs directly usable in physical simulation.
- ⭐ HyperNetwork parameterization: HyperNetworks generate continuous parameterized mappings that support arbitrary and adaptive sampling at inference, achieving both flexibility and geometric smoothness.
- ⭐ Dual-path Transformer: The patch/curve dual-path design with self-attention and cross-attention effectively fuses multi-view image evidence with structural geometric constraints.
- ⭐ GCD-TS dataset: 100K-scale dataset that resolves texture leakage of seam information present in the original GCD, improving generalization.
- ⭐ Panel count accuracy of 89.2% far exceeds the baseline of 45.6%, demonstrating strong topology generalization.
Limitations & Future Work¶
- High-quality 3D garment data with complex topology and photorealistic textures remains scarce; experiments exhibit a notable sim-to-real gap.
- Edge count accuracy (\(\text{Acc}_e = 0.657\)) lags behind panel count accuracy, indicating substantial room for improvement in fine-grained topology prediction.
- Quantitative evaluation on real-world images is absent; validation is conducted exclusively on synthetic data.
- Geometry refinement relies on post-processing heuristics and does not guarantee 100% successful closure.
- The input is fixed at four standard viewpoints (front/back/left/right); performance under more freely distributed capture conditions has not been thoroughly validated.
Rating¶
- Novelty: ⭐⭐⭐⭐ — First to jointly model 3D geometry, 2D sewing patterns, and topological connectivity; the HyperNetwork parameterization is an elegant design.
- Experimental Thoroughness: ⭐⭐⭐ — Comprehensive validation on synthetic data, but real-image evaluation is absent and only one baseline is compared.
- Writing Quality: ⭐⭐⭐⭐ — Terminology is clearly defined; the dual 2D/3D space description is well-articulated; the framework diagram is intuitive.
- Value: ⭐⭐⭐⭐ — Directly outputs simulation-ready assets, with practical value for digital humans, virtual try-on, and robotic manipulation.