ReWeaver: Towards Simulation-Ready and Topology-Accurate Garment Reconstruction
Conference: CVPR2026
arXiv: 2601.16672
Authors: Ming Li, Hui Shan, Kai Zheng, Chentao Shen, Siyu Liu, Yanwei Fu, Zhen Chen, Xiangru Huang
Institute: Zhejiang University, Shanghai Institute for Advanced Study, Westlake University, Fudan University, Adobe, Xidian University
Code: To be confirmed
Area: 3D Vision
Keywords: Garment Reconstruction, Sewing Patterns, Topology Reconstruction, Multi-view Reconstruction, Physical Simulation
TL;DR
ReWeaver jointly reconstructs 3D garment geometry and 2D sewing patterns from as few as four multi-view RGB images. A dual-path Transformer predicts 3D surface patches, 3D curves, and the topological connections between them; in-group attention then flattens the 3D structures into 2D panel edges. According to the authors, this is the first method to recover topology-accurate garment assets that are ready for direct physical simulation.
Background & Motivation
High-quality 3D garment reconstruction is critical for applications like virtual try-on, digital humans, gaming, and robotic manipulation. However, existing methods face two primary challenges:
- Unstructured representations lack sewing structure: current methods (point clouds, SDFs, 3D Gaussian Splatting, etc.) can approximate garment geometry but carry no explicit seams or panels. This makes them hard to use directly for physical simulation, garment editing, or retargeting, and inherently incompatible with industry-standard design workflows built around 2D sewing patterns.
- Existing sewing-pattern methods fall short:
  - Methods relying on predefined topologies (e.g., DiffAvatar) are limited to simple garments and cannot handle unseen styles.
  - Vision-language models (e.g., ChatGarment, AIpparel) generate 2D patterns as tokenized, JSON-like descriptions; they generalize better across topologies but lack geometric precision.
  - Most methods predict only 2D patterns and neglect accurate 3D geometry.
Goal: Simultaneously reconstruct accurate garment topology (which panels/seams are connected) and geometry (precise 3D shapes of each element), ensuring the output is suitable for both 3D perception and high-fidelity physical simulation.
Method
Overall Architecture
The core question ReWeaver addresses is: given only four photos (front, back, left, right), can we recover accurate 3D geometry and, at the same time, output the 2D sewing patterns used in industrial workflows, with an explicit correspondence between 3D patches/seams and 2D panels/edges so that the result can be simulated directly? ReWeaver combines these tasks in an encoder-decoder framework: a multi-view encoder fuses the sparse images into unified features; a dual-path Transformer decodes 3D surface patches, seams (curves), and their topological relationships from these features; and a flattening module unfolds the 3D structures into 2D panel edges according to the predicted topology, with post-processing ensuring that each panel boundary is closed.
The rest of the method relies on the following correspondence between 3D and 2D entities:
| Space | Surface Region | Boundary Line |
|---|---|---|
| 3D | Patch | Curve (Seam) |
| 2D | Panel | Edge |
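In short, each 3D Patch is flattened into a 2D Panel, and each 3D Curve on a patch's boundary becomes an Edge of that patch's panel; a seam curve shared by two patches therefore corresponds to one edge on each of the two panels, which are sewn together. The entire pipeline maintains this 2D↔3D mapping.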
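To make the correspondence concrete, here is a minimal Python sketch, assuming the topology is stored as a binary patch-curve adjacency matrix (the form produced by the connectivity head described below). The `Panel` container and the helpers `panels_from_adjacency` and `seams` are hypothetical names, not ReWeaver's API. The sketch shows that every (patch, boundary curve) pair yields one panel edge, so a curve shared by two patches produces one edge on each of the two panels.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Panel:
    """2D counterpart of a 3D patch (hypothetical container)."""
    patch_id: int
    edge_curve_ids: list = field(default_factory=list)  # one 2D edge per boundary curve


def panels_from_adjacency(adj: np.ndarray) -> list:
    """One Panel per patch; each nonzero adj[i, j] contributes an edge for curve j."""
    panels = []
    for i in range(adj.shape[0]):
        curves = np.flatnonzero(adj[i]).tolist()
        if curves:  # skip patches without any boundary curve
            panels.append(Panel(patch_id=i, edge_curve_ids=curves))
    return panels


def seams(adj: np.ndarray) -> dict:
    """Map each curve bordering exactly two patches to that patch pair."""
    out = {}
    for j in range(adj.shape[1]):
        patches = np.flatnonzero(adj[:, j]).tolist()
        if len(patches) == 2:
            out[j] = tuple(patches)
    return out


# Toy garment: front (patch 0) and back (patch 1) share side curves 0 and 1;
# curves 2 and 3 are open boundaries (e.g., hems) belonging to one patch each.
adj = np.array([[1, 1, 1, 0],
                [1, 1, 0, 1]])
for p in panels_from_adjacency(adj):
    print(p)
print(seams(adj))
```

Curves 0 and 1 each appear as an edge on both panels, which is exactly the pairing a simulator needs to stitch the panels back together.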
```mermaid
%%{init: {'flowchart': {'rankSpacing': 24, 'nodeSpacing': 28, 'padding': 6, 'wrappingWidth': 400}}}%%
flowchart TD
A["4 Multi-view RGB Images<br/>(Front / Back / Left / Right)"] --> B["Multi-view Visual Encoder<br/>Alternating Intra/Inter-frame Attention"]
B --> C["Dual-path Transformer<br/>Patch + Curve Queries<br/>In-group Self-attention + Cross-attention<br/>Probability Head filters elements"]
C --> D["HyperNetwork Geo-Heads<br/>Continuous Mappings per element<br/>→ 3D Patches / Curves"]
C --> E["Connectivity Head<br/>Patch-Curve Adjacency Matrix<br/>→ Topology Connections"]
D --> F["2D Pattern Flattening<br/>In-group Attention via Topology<br/>Maps 3D structures to 2D edges"]
E --> F
F --> G["Geometry Refinement<br/>Enforcing Edge Closure for Panels"]
G --> H["Sewable & Simulation-Ready Garment Assets"]
```
Key Designs
1. Multi-view Visual Encoder: Fusing Sparse Images via Alternating Attention
The first challenge is the input itself: a few sparse images taken from non-fixed viewpoints. The encoder must exploit both local texture within each view and global geometry from cross-view consistency. Following VGGT, ReWeaver divides each image into \(16\times16\) non-overlapping patches and embeds them as tokens with a DINOv2 backbone. It then alternates between intra-frame self-attention (refining per-view texture features) and inter-frame self-attention (aligning and aggregating geometry across views); this repeated alternation is what makes the encoder robust to sparse inputs. Finally, the tokens of all frames are concatenated and flattened into a single sequence \(T_i \in \mathbb{R}^{N_i \times D}\) (\(D=768\)) for decoding.
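The alternating scheme can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: single-head attention with no learned projections, no LayerNorm or MLP, toy dimensions, and random tokens standing in for DINOv2 patch embeddings. It only shows how the same token tensor is attended within each frame and then across all frames; `alternating_block` is a hypothetical name.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Weight-free single-head self-attention over the second-to-last axis."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores) @ x


def alternating_block(tokens):
    """tokens: (F, N, D) = frames x tokens per frame x channels."""
    F, N, D = tokens.shape
    # Intra-frame: each frame attends only to its own N tokens.
    tokens = tokens + self_attention(tokens)
    # Inter-frame: all F*N tokens attend to one another across views.
    flat = tokens.reshape(1, F * N, D)
    tokens = tokens + self_attention(flat).reshape(F, N, D)
    return tokens


rng = np.random.default_rng(0)
F, N, D = 4, 16, 32  # 4 views; toy sizes (the summary reports D = 768)
x = rng.standard_normal((F, N, D))
for _ in range(2):  # stack a few alternating blocks
    x = alternating_block(x)
T = x.reshape(F * N, D)  # concatenated token sequence handed to the decoder
print(T.shape)
```

Reshaping rather than copying keeps the per-frame and global views of the tokens in sync, which is the point of alternating the two attention scopes.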
2. Dual-path Transformer: Separating and Interacting Element Decoders
3D Patches and Curves have different properties, and decoding them with a single set of queries causes mutual interference. Inspired by ComplexGen, ReWeaver gives each element type its own path: learnable patch queries \(Q_p \in \mathbb{R}^{N_p \times D}\) (\(N_p=200\)) and curve queries \(Q_c \in \mathbb{R}^{N_c \times D}\) (\(N_c=70\)). In each layer, each path first performs in-group self-attention (patches attend to patches, curves to curves), then cross-attention to the image tokens and to the other element type, followed by LayerNorm, an FFN, and residual connections. This yields refined tokens \(T_p\) and \(T_c\). A probability head then maps each refined token to the probability (\(\boldsymbol{\sigma}_p\) for patches, \(\boldsymbol{\sigma}_c\) for curves) that its query corresponds to a valid element.
Queries whose probability falls below the thresholds \(\epsilon_p, \epsilon_c\) are discarded, and a topological refinement step produces binary masks \(\boldsymbol{\sigma}_p^{\star}, \boldsymbol{\sigma}_c^{\star}\) that select the actual patches and curves from the candidates.
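A compact NumPy sketch of one dual-path decoder layer and the probability-based filtering follows. It assumes weight-free single-head attention, omits the FFN, and assumes the probability head is a single linear map followed by a sigmoid. The 0.5 thresholds are placeholders, not values from the paper, and the topological refinement that produces the final masks is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attend(q, kv):
    """Weight-free single-head attention: queries q (M, D) read from kv (K, D)."""
    return softmax(q @ kv.T / np.sqrt(q.shape[1])) @ kv


def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)


def dual_path_layer(Qp, Qc, T_img):
    """One decoder layer: in-group self-attention, then cross-attention (FFN omitted)."""
    Qp = layer_norm(Qp + attend(Qp, Qp))  # patches attend to patches
    Qc = layer_norm(Qc + attend(Qc, Qc))  # curves attend to curves
    ctx_p = np.concatenate([T_img, Qc])   # patch path reads image tokens + curves
    ctx_c = np.concatenate([T_img, Qp])   # curve path reads image tokens + patches
    return layer_norm(Qp + attend(Qp, ctx_p)), layer_norm(Qc + attend(Qc, ctx_c))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


D, Np, Nc = 32, 200, 70  # N_p, N_c as reported; D shrunk from 768 for the toy
T_img = rng.standard_normal((64, D))  # stand-in for the encoder's output tokens
Tp, Tc = dual_path_layer(rng.standard_normal((Np, D)), rng.standard_normal((Nc, D)), T_img)

# Assumed probability head: one linear map + sigmoid per refined token.
w_p, w_c = rng.standard_normal(D), rng.standard_normal(D)
sigma_p, sigma_c = sigmoid(Tp @ w_p), sigmoid(Tc @ w_c)
eps_p, eps_c = 0.5, 0.5  # placeholder thresholds
keep_p = np.flatnonzero(sigma_p >= eps_p)  # surviving patch candidates
keep_c = np.flatnonzero(sigma_c >= eps_c)  # surviving curve candidates
print(Tp.shape, Tc.shape, len(keep_p), len(keep_c))
```

Both cross-attention steps read the other path's self-attended state, so patches and curves exchange information symmetrically within a layer.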
3. HyperNetwork Geometry Prediction Head: Continuous Mappings via HyperNetworks
Directly regressing a fixed number of point coordinates ties the output sampling density to a choice made at training time. ReWeaver instead uses HyperNetworks: each element token is mapped to the weights of a small MLP that continuously maps a parameter domain into 3D space. The curve head \(f_c^{\text{geo}}\) generates a 3-layer MLP mapping \([0,1] \to \mathbb{R}^3\), and the patch head \(f_p^{\text{geo}}\) generates an MLP mapping \([0,1]^2 \to \mathbb{R}^3\).
Since geometry is represented as a continuous function, sampling can occur at any density. During inference, sampling is demand-driven (sparse for small patches, dense for large ones), ensuring uniform 3D point distributions.
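The following NumPy sketch illustrates the HyperNetwork idea: a hypernetwork (here a single linear layer) turns an element token into the weights of a 3-layer MLP, which can then be evaluated at any number of parameter values. The hidden width, the `tanh` activation, and the names `HyperHead` and `run_mlp` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 32, 16  # token width and hidden width of the generated MLP (toy values)


def n_params(sizes):
    """Weights + biases of an MLP with layer sizes such as [1, H, H, 3]."""
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


class HyperHead:
    """Turns one element token into the weights of a small MLP (hypothetical class)."""

    def __init__(self, in_dim, out_dim):
        self.sizes = [in_dim, H, H, out_dim]  # 3 linear layers
        # The hypernetwork itself: one linear map token -> flat MLP parameters.
        self.W = rng.standard_normal((D, n_params(self.sizes))) * 0.1

    def __call__(self, token):
        flat = token @ self.W
        params, i = [], 0
        for a, b in zip(self.sizes[:-1], self.sizes[1:]):
            W = flat[i:i + a * b].reshape(a, b)
            bias = flat[i + a * b:i + a * b + b]
            params.append((W, bias))
            i += a * b + b
        return params


def run_mlp(params, u):
    """Evaluate the generated MLP at parameter values u of shape (S, in_dim)."""
    h = u
    for k, (Wk, bk) in enumerate(params):
        h = h @ Wk + bk
        if k < len(params) - 1:  # no activation on the output layer
            h = np.tanh(h)
    return h


curve_head, patch_head = HyperHead(1, 3), HyperHead(2, 3)
tok = rng.standard_normal(D)

# Curve: [0,1] -> R^3, sampled at two densities from the same generated weights.
c_params = curve_head(tok)
coarse = run_mlp(c_params, np.linspace(0, 1, 8)[:, None])
fine = run_mlp(c_params, np.linspace(0, 1, 200)[:, None])

# Patch: [0,1]^2 -> R^3 on a uv grid; the grid size could grow with patch area.
g = np.linspace(0, 1, 10)
uv = np.stack(np.meshgrid(g, g), -1).reshape(-1, 2)
surface = run_mlp(patch_head(tok), uv)
print(coarse.shape, fine.shape, surface.shape)
```

Because `coarse` and `fine` come from the same function, they agree wherever their parameter values coincide (e.g., at both curve endpoints); only the number of samples changes, which is what makes demand-driven sampling possible.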
4. Connectivity Prediction Head: Explicitly Predicting "Who is Sewn to Whom"
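For sewing patterns, knowing which curves bound which patches is essential. ReWeaver linearly projects the patch and curve tokens, takes their pairwise dot products, and applies a Sigmoid to obtain connectivity probabilities:
\[
\sigma_{pc} = \operatorname{Sigmoid}\big((T_p W_p)(T_c W_c)^{\top}\big) \in [0,1]^{N_p \times N_c},
\]
where \(W_p\) and \(W_c\) denote the two linear projections.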
The adjacency matrix \(\sigma_{pc}\) is thresholded and refined into a binary matrix \(\sigma_{pc}^{\star} \in \{0,1\}^{N_p \times N_c}\), explicitly defining which curves form each patch's boundary.
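Below is a minimal NumPy sketch of the connectivity head as described: two linear projections, pairwise dot products, and a sigmoid, followed by thresholding at a placeholder value of 0.5. The final clean-up step (each curve keeps at most its two most likely patches) is a hypothetical stand-in for ReWeaver's refinement, whose exact rules are not given here; all sizes are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Dk = 32, 16
Np, Nc = 5, 8  # toy counts of surviving patch / curve tokens

Tp = rng.standard_normal((Np, D))
Tc = rng.standard_normal((Nc, D))
Wp = rng.standard_normal((D, Dk)) / np.sqrt(D)  # linear projection for patch tokens
Wc = rng.standard_normal((D, Dk)) / np.sqrt(D)  # linear projection for curve tokens

logits = (Tp @ Wp) @ (Tc @ Wc).T           # (Np, Nc) pairwise dot products
sigma_pc = 1.0 / (1.0 + np.exp(-logits))   # connectivity probabilities
adj = (sigma_pc > 0.5).astype(np.int8)     # thresholded adjacency (placeholder 0.5)

# Hypothetical clean-up, not ReWeaver's actual refinement: a curve can border
# at most two patches, so keep only its two most likely patches.
top2 = np.argsort(-sigma_pc, axis=0)[:2]   # (2, Nc) indices of the best patches
mask = np.zeros_like(adj)
mask[top2, np.arange(Nc)] = 1
adj_refined = adj * mask

for i in range(Np):
    print(f"patch {i}: boundary curves {np.flatnonzero(adj_refined[i]).tolist()}")
```

Row \(i\) of the refined matrix lists the boundary curves of patch \(i\), which is the grouping the flattening stage consumes.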
5. 2D Pattern Flattening: Structure-Aware Group Attention
Using \(\sigma_{pc}^{\star}\), each valid patch token is grouped with the tokens of its boundary curves. Within each group, the curve tokens undergo self-attention and cross-attention with the patch token, producing edge tokens \(T_e\). For each connected curve \(j \in \partial_i\) (the set of curves bounding patch \(i\)), a HyperNetwork generates an MLP that maps a 1D parameter to normalized 2D coordinates.
Scale factors \(s_i\) are regressed from patch tokens to restore physical dimensions. Finally, a geometry refinement step enforces edge closure to create closed panel cycles for triangulation and simulation.
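The summary does not specify the geometry-refinement heuristic, so the sketch below shows one plausible closure rule, which is purely an assumption: edges are already ordered and oriented around the panel, each edge is scaled by the panel's scale factor \(s_i\), and every junction between consecutive edges is snapped to the midpoint of the two endpoints. `close_panel` and `max_gap` are hypothetical helpers.

```python
import numpy as np


def close_panel(edges, s):
    """Scale normalized edges by s and snap consecutive endpoints together.

    edges: list of (K, 2) arrays, assumed ordered and oriented around the panel.
    """
    edges = [e * s for e in edges]  # restore physical size (new arrays)
    n = len(edges)
    for k in range(n):
        a, b = edges[k], edges[(k + 1) % n]
        mid = 0.5 * (a[-1] + b[0])  # junction between edge k and edge k+1
        a[-1], b[0] = mid, mid
    return edges


def max_gap(edges):
    """Largest distance between consecutive edge endpoints (0 means closed)."""
    n = len(edges)
    return max(np.linalg.norm(edges[k][-1] - edges[(k + 1) % n][0]) for k in range(n))


# Toy rectangle-like panel whose predicted edges do not quite meet.
t = np.linspace(0, 1, 5)[:, None]
raw = [
    np.hstack([t, 0 * t]),             # bottom: (0, 0) -> (1, 0)
    np.hstack([1.02 + 0 * t, t]),      # right:  (1.02, 0) -> (1.02, 1)
    np.hstack([1 - t, 0.98 + 0 * t]),  # top:    (1, 0.98) -> (0, 0.98)
    np.hstack([0 * t, 1 - t]),         # left:   (0, 1) -> (0, 0)
]
closed = close_panel(raw, s=0.6)  # hypothetical scale factor s_i
print(max_gap([e * 0.6 for e in raw]), max_gap(closed))
```

Only the endpoints move; interior samples keep their predicted positions, so a closed loop suitable for triangulation is obtained with minimal distortion in this toy case.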
Loss & Training
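Predicted elements are matched to ground-truth elements with Hungarian matching. The total loss combines the following terms:
- Geometry loss: Chamfer distance between matched predicted and ground-truth geometry.
- Classification and connectivity loss: binary cross-entropy (BCE) on the element validity probabilities and the patch-curve connectivity probabilities.
- Scale loss: \(\ell_2\) loss on the regressed panel scale factors \(s_i\).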
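A NumPy/SciPy sketch of the matching and the three loss terms follows. It assumes the matching cost is the Chamfer distance alone (the actual cost may also include classification terms), equal weighting of the terms, and toy point sets and probabilities. `chamfer` and `bce` are illustrative helpers, while `scipy.optimize.linear_sum_assignment` is SciPy's real solver for the linear assignment problem that Hungarian matching addresses.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (M, d) and b (K, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (M, K) squared distances
    return d2.min(1).mean() + d2.min(0).mean()


def bce(p, y, eps=1e-7):
    """Binary cross-entropy between probabilities p and 0/1 targets y."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())


rng = np.random.default_rng(0)
gt = [rng.random((50, 3)) for _ in range(3)]  # 3 ground-truth curves as point samples
pred = [g + 0.01 * rng.standard_normal(g.shape) for g in gt[::-1]]  # reversed, noisy
pred.append(rng.random((50, 3)))  # one extra prediction with no ground-truth partner

# Hungarian-style matching on a (prediction x ground truth) Chamfer cost matrix.
cost = np.array([[chamfer(p, g) for g in gt] for p in pred])
rows, cols = linear_sum_assignment(cost)

geo_loss = float(np.mean(cost[rows, cols]))  # geometry term on matched pairs
valid_target = np.zeros(len(pred))
valid_target[rows] = 1.0  # matched predictions are "valid"
prob = np.array([0.9, 0.8, 0.95, 0.2])  # hypothetical probability-head outputs
cls_loss = bce(prob, valid_target)

s_pred, s_gt = np.array([0.55, 0.72]), np.array([0.6, 0.7])  # hypothetical scales
scale_loss = float(((s_pred - s_gt) ** 2).mean())

print(dict(zip(rows.tolist(), cols.tolist())), geo_loss, cls_loss, scale_loss)
```

The matching recovers the reversed order of the predictions, and the unmatched fourth query is pushed toward probability 0 by the BCE term.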
Key Experimental Results
Main Results
| Metric | AIpparel-MV | ReWeaver | Note |
|---|---|---|---|
| \(\text{Acc}_p\) ↑ (panel count accuracy) | 0.4561 | 0.8923 | ReWeaver +43.6 pp |
| \(\text{Acc}_e\) ↑ (edge count accuracy) | 0.6774 | 0.6570 | Comparable (AIpparel-MV 2.0 pp higher) |
| \(\text{Acc}_o\) ↑ (overall topology accuracy) | 0.3090 | 0.5863 | ReWeaver +27.7 pp |
| \(\text{CD}_e\) ↓ (2D edge Chamfer distance) | 0.0648 | 0.0395 | More precise edge geometry |
| IoU ↑ (panel IoU) | 0.7084 | 0.8080 | ReWeaver +10.0 pp |
ReWeaver outperforms the multi-view-enhanced AIpparel (AIpparel-MV) on 5 of 6 reported metrics; of those listed above, only edge count accuracy is slightly lower. The largest gain is in panel count accuracy, which rises from 45.6% to 89.2%.
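Because the scores are fractions in [0, 1], the gains in the table are absolute differences (percentage points, pp), not relative improvements. A short Python check makes the distinction explicit, using only the numbers from the table:

```python
# Scores copied from the main results table (higher is better for all four).
aipparel = {"Acc_p": 0.4561, "Acc_e": 0.6774, "Acc_o": 0.3090, "IoU": 0.7084}
reweaver = {"Acc_p": 0.8923, "Acc_e": 0.6570, "Acc_o": 0.5863, "IoU": 0.8080}

for k in aipparel:
    a, r = aipparel[k], reweaver[k]
    abs_pp = 100 * (r - a)       # absolute gain in percentage points
    rel_pct = 100 * (r / a - 1)  # relative gain in percent
    print(f"{k}: {abs_pp:+.1f} pp, {rel_pct:+.1f}% relative")
```

For \(\text{Acc}_p\), the absolute gain of 43.6 pp corresponds to roughly doubling the score (+95.6% relative).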
Ablation Study
| Configuration | \(\text{CD}_p^{\text{adapt}}\) ↓ | \(\text{Acc}_p\) ↑ | \(\text{Acc}_o\) ↑ | IoU ↑ |
|---|---|---|---|---|
| Full (with refinement) | 0.0187 | 0.8923 | 0.5863 | 0.8080 |
| Without refinement | 0.0188 | 0.9101 | 0.4880 | 0.7775 |
Key findings:
- Topology refinement removes redundant edges, raising overall topology accuracy (\(\text{Acc}_o\)) from 48.8% to 58.6%.
- Geometry refinement closes gaps between 2D edges, raising IoU from 77.8% to 80.8%.
- Refinement slightly lowers panel count accuracy (\(\text{Acc}_p\), 91.0% to 89.2%) and has a negligible effect on 3D geometry (Chamfer distance 0.0188 vs. 0.0187), so its value lies mainly in topological consistency.
- Adaptive sampling based on spatial variance reduces the patch Chamfer distance (\(\text{CD}_p\)) from 0.0225 to 0.0187.
Highlights & Insights
- ⭐ First Joint Reconstruction: Simultaneously outputs 3D garment geometry and 2D sewing patterns with explicit 2D-3D correspondence for direct simulation.
- ⭐ HyperNetwork Parameterization: Uses continuous mappings for flexible, demand-driven adaptive sampling and geometric smoothness.
- ⭐ Dual-path Transformer: Effectively fuses multi-view image evidence with structural geometric constraints.
- ⭐ GCD-TS Dataset: A 100k-scale dataset that removes the texture cues in the original GCD (GarmentCodeData) renders that "leak" seam information.
Limitations & Future Work
- High-quality complex topology data remains scarce, leading to a visible sim-to-real gap.
- Edge-level topology prediction (\(\text{Acc}_e = 0.657\)) lags well behind panel detection (\(\text{Acc}_p = 0.892\)) and leaves significant room for improvement.
- Lacks quantitative evaluation on real-world imagery.
- Geometry refinement relies on heuristic rules and does not guarantee 100% closure success.
Rating
- Novelty: ⭐⭐⭐⭐
- Experimental Thoroughness: ⭐⭐⭐
- Writing Quality: ⭐⭐⭐⭐
- Value: ⭐⭐⭐⭐