CAD-Recode: Reverse Engineering CAD Code from Point Clouds¶

Conference: ICCV 2025
arXiv: 2412.14042
Code: filaPro/cad-recode
Area: Interpretability
Keywords: CAD reverse engineering, point cloud reconstruction, large language models, CadQuery, Python code generation

TL;DR¶

This paper proposes CAD-Recode, which translates point clouds into executable Python CadQuery code to reconstruct CAD models. By pairing a pretrained LLM (Qwen2-1.5B) as the decoder with a lightweight point cloud encoder, the method outperforms prior work on three benchmarks (DeepCAD, Fusion360, and CC3D), with more than 10× lower mean Chamfer Distance than the best prior method on DeepCAD and Fusion360.

Background & Motivation¶

The core problem in CAD reverse engineering is: given a 3D representation (e.g., a point cloud), recover the parametric sketch and CAD operation sequence that generated the model. Existing methods (e.g., DeepCAD, PrismCAD, TransCAD) typically represent CAD sequences as closed, finite token vocabularies, which suffer from the following limitations:

Limited representational capacity: Custom token vocabularies struggle to cover the rich operations in real-world CAD modeling and are not directly executable.

Lack of interpretability: Output token sequences are not human-friendly and are difficult to edit or reuse.

Training data bottleneck: Most methods rely on DeepCAD's 160k training samples, which lack diversity.

Complex network design: Specialized encoder-decoder architectures are required, making it difficult to leverage pretrained models.

The motivation behind CAD-Recode is: since LLMs have been extensively exposed to Python code during pretraining, why not represent CAD sequences directly as Python code and let the LLM "translate" point clouds? This reformulates CAD reconstruction as a conditional code generation problem.

Method¶

1. CAD Representation: Python CadQuery Code¶

One of the core innovations of CAD-Recode is representing CAD models as Python code based on the CadQuery library. Unlike the closed token vocabulary in DeepCAD, CadQuery provides:

Low-level sketch primitives: segment, arc, circle
High-level geometric abstractions: rect, box, cylinder
Boolean operations: union, cut, intersect
Extrusion operations: extrude

The generated code is valid Python that can be directly executed to produce 3D models. For example, a table can be represented as:

import cadquery as cq
w0 = cq.Workplane('XY', origin=(0,0,88))
r = w0.sketch().segment((-98,-100),(-77,-99))...extrude(-177).union(...)
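
Because the output is ordinary Python, it can be inspected with standard tooling before it is ever executed. The sketch below parses a small CadQuery-style program with the standard `ast` module and lists the methods it calls. The program string is a hypothetical example written for illustration (it is not the elided table snippet above), and `called_methods` is a helper name introduced here.

```python
import ast

# Hypothetical CadQuery program in the style described above; the shape and
# all numbers are made up for illustration. It is parsed, never executed.
GENERATED_CODE = """
import cadquery as cq
w0 = cq.Workplane('XY', origin=(0, 0, 5))
r = w0.sketch().rect(40, 20).finalize().extrude(10).union(
    w0.sketch().circle(5).finalize().extrude(30))
"""


def called_methods(source: str) -> list[str]:
    """Return the sorted, de-duplicated names of all method calls in `source`."""
    tree = ast.parse(source)  # raises SyntaxError if the code is not valid Python
    names = {
        node.func.attr
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
    }
    return sorted(names)


methods = called_methods(GENERATED_CODE)
print(methods)
```

A syntax check like this only confirms that the text is valid Python; whether the geometry is valid still requires running it through CadQuery.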

2. Network Architecture¶

The CAD-Recode architecture is remarkably simple, adding only a lightweight point cloud encoder on top of a pretrained LLM:

Fourier Point Encoder (FourierPointEncoder):
- Input: \(N\) 3D point coordinates \((x, y, z)\)
- Each coordinate is positionally encoded using 8 Fourier frequencies \(f_k = 2^k\) (\(k = 0, 1, \dots, 7\))
- Produces \(3 + 3 \times 8 \times 2 = 51\)-dimensional features (raw coordinates + sin/cos encodings)
- Projected to the LLM's hidden_size via a single linear layer
- The resulting point embeddings are fed to the LLM in place of token embeddings at the point positions
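
The feature construction above can be sketched in a few lines of plain Python. This is a minimal illustration, not the released implementation: the ordering of the output (raw coordinates, then all sines, then all cosines) and the absence of any extra scaling factor are assumptions, and the hidden size of 1536 used for the parameter count is assumed to be that of Qwen2-1.5B.

```python
import math

NUM_FREQS = 8  # f_k = 2**k for k = 0..7


def fourier_features(point):
    """Map one (x, y, z) point to a 51-dimensional feature vector."""
    freqs = [2.0 ** k for k in range(NUM_FREQS)]
    # Every coordinate is multiplied by every frequency: 3 * 8 = 24 values.
    scaled = [c * f for c in point for f in freqs]
    # Assumed layout: raw coordinates, then sines, then cosines.
    return list(point) + [math.sin(v) for v in scaled] + [math.cos(v) for v in scaled]


def linear_param_count(in_dim, hidden_size):
    """Weights plus biases of a single dense layer in_dim -> hidden_size."""
    return in_dim * hidden_size + hidden_size


feat = fourier_features((0.5, -0.25, 1.0))
print(len(feat))                     # 51
print(linear_param_count(51, 1536))  # 79872 (hidden size 1536 is an assumption)
```

Under that assumed hidden size, the projection adds roughly 80k parameters, which is tiny next to a 1.5B-parameter decoder.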

Decoder (Qwen2-1.5B):
- Directly reuses the pretrained Qwen2-1.5B model with its original tokenizer
- Point cloud embeddings are prepended to the sequence, distinguished by a special attention_mask (−1 for points, 1 for text)
- Uses <|im_start|> as the generation start token and <|endoftext|> as the end token
- Maximum generation length of 768 tokens at inference
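
One way to picture the sequence layout described above: placeholder token ids occupy the point positions, the mask marks them with −1, and their embedding rows are overwritten with point embeddings. The sketch below uses plain lists and toy 2-dimensional embeddings. The helper names, the token ids, and the use of a padding id as the placeholder are assumptions for illustration.

```python
def build_inputs(n_points, start_token_id, pad_token_id):
    """Lay out placeholder ids for the points, followed by the start token."""
    input_ids = [pad_token_id] * n_points + [start_token_id]
    attention_mask = [-1] * n_points + [1]  # -1 marks point slots, 1 marks text
    return input_ids, attention_mask


def merge_embeddings(token_embeds, point_embeds, attention_mask):
    """Overwrite the rows marked -1 with point embeddings, in order."""
    if attention_mask.count(-1) != len(point_embeds):
        raise ValueError("number of point slots and point embeddings differ")
    points = iter(point_embeds)
    return [
        next(points) if m == -1 else tok
        for tok, m in zip(token_embeds, attention_mask)
    ]


# Toy example: 3 points, hypothetical ids (0 = placeholder, 7 = start token).
ids, mask = build_inputs(3, start_token_id=7, pad_token_id=0)
token_embeds = [[0.0, 0.0]] * 3 + [[9.0, 9.0]]
point_embeds = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
merged = merge_embeddings(token_embeds, point_embeds, mask)
print(ids, mask, merged)
```

In a real model the merged rows would be passed to the decoder as input embeddings, with the mask converted to an ordinary attention mask.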

The entire model introduces only one new linear layer (\(51 \rightarrow \text{hidden\_size}\)), adding negligible parameters.

3. Training Dataset¶

To fully exploit the model's capacity, the authors programmatically generated 1 million CadQuery Python code samples:

Low-level primitives: segment, arc, circle
High-level abstractions: rect, box, cylinder, and other common shapes
Parameter ranges: integer values in \([-50, +50]\) for v1; extended to \([-100, +100]\) for v1.5
Each sample is generated by randomly combining sketch primitives and extrusion operations, ensuring high diversity
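
To make the idea of programmatic generation concrete, here is a toy generator that emits CadQuery-style programs with integer parameters drawn from \([-100, +100]\). It is only a sketch of the general approach: the function name, the choice of primitives, and the sampling scheme are assumptions, and it does not reproduce the authors' generator or any validity checks they apply.

```python
import ast
import random


def random_cadquery_program(rng, lo=-100, hi=100, max_parts=3):
    """Return a CadQuery program string made of randomly parameterised extrusions."""
    lines = ["import cadquery as cq"]
    parts = []
    for i in range(rng.randint(1, max_parts)):
        z = rng.randint(lo, hi)                      # workplane height
        cx, cy = rng.randint(lo, hi), rng.randint(lo, hi)
        height = rng.randint(1, hi)                  # extrusion distance
        if rng.random() < 0.5:
            sketch = f".rect({rng.randint(1, hi)}, {rng.randint(1, hi)})"
        else:
            sketch = f".circle({rng.randint(1, hi)})"
        lines.append(f"w{i} = cq.Workplane('XY', origin=(0, 0, {z}))")
        parts.append(f"w{i}.center({cx}, {cy}){sketch}.extrude({height})")
    body = parts[0]
    for part in parts[1:]:
        body += f".union({part})"                    # combine solids with a Boolean union
    lines.append(f"r = {body}")
    return "\n".join(lines)


program = random_cadquery_program(random.Random(0))
ast.parse(program)  # syntax check only; the geometry is not validated here
print(program)
```

Seeding the random generator makes each sample reproducible, which is convenient when regenerating or deduplicating a large corpus.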

4. Training Details¶

v1:
- 4 × H100 GPUs, batch size 9, learning rate 1e-4
- Input includes normals: \((x, y, z, n_x, n_y, n_z)\)

v1.5 improvements:
- 1 × H100 GPU, batch size 18 + gradient accumulation ×2, learning rate 2e-4
- Normals removed; only \((x, y, z)\) used
- Random sampling replaced by Farthest Point Sampling (FPS, from PyTorch3D)
- Z-axis sorting removed; unordered points used
- Gaussian noise with standard deviation 0.01 added to all points with probability 0.5 (data augmentation)
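
The noise augmentation in the last item can be sketched as follows. This is a minimal version assuming the coin flip is made once per point cloud and the noise is independent per coordinate. The function name `augment` is introduced here for illustration.

```python
import random


def augment(points, rng, prob=0.5, std=0.01):
    """With probability `prob`, add Gaussian noise to every coordinate of every point."""
    if rng.random() >= prob:
        return [tuple(p) for p in points]  # left unchanged this time
    return [tuple(c + rng.gauss(0.0, std) for c in p) for p in points]


cloud = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.5)]
noisy = augment(cloud, random.Random(0), prob=1.0)   # force the noisy branch
clean = augment(cloud, random.Random(0), prob=0.0)   # force the clean branch
print(noisy, clean)
```

Since the points are normalized to an extent of 2, a standard deviation of 0.01 perturbs each coordinate by about half a percent of the shape's size.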

5. Inference Pipeline¶

Sample 256 points from the input mesh/point cloud (FPS)
Normalize the points to be centered at the origin with a maximum extent of 2, so that they fit inside the cube \([-1, 1]^3\)
Point cloud → Fourier encoding → linear projection → replace LLM token embeddings
Autoregressively generate Python CadQuery code
Execute the generated code to obtain the CAD model (exportable as STEP/STL)
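
The first two steps of the pipeline can be sketched without any dependencies. The code below is a plain-Python illustration, not the PyTorch3D routine mentioned earlier: it uses a greedy farthest point sampling loop and assumes that normalization centers the axis-aligned bounding box at the origin and scales its longest side to 2.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    min_d2 = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_d2.__getitem__)
        chosen.append(nxt)
        # Keep, for every point, its distance to the nearest chosen point.
        min_d2 = [min(d, sq_dist(p, points[nxt])) for d, p in zip(min_d2, points)]
    return [points[i] for i in chosen]


def normalize(points):
    """Center the bounding box at the origin and scale its longest side to 2."""
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    extent = max(h - l for l, h in zip(lo, hi))
    if extent == 0:
        raise ValueError("degenerate point cloud")
    center = [(l + h) / 2 for l, h in zip(lo, hi)]
    return [tuple((p[a] - center[a]) * 2.0 / extent for a in range(3)) for p in points]


raw = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (9, 0, 0), (5, 0, 0)]
sampled = farthest_point_sampling(raw, k=3)
print(sampled)             # [(0, 0, 0), (10, 0, 0), (5, 0, 0)]
print(normalize(sampled))
```

The loop costs O(N·k) distance evaluations, which is negligible when only 256 points are kept.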

Key Experimental Results¶

Main Comparison (v1.5, trained on the authors' 1M dataset)¶

Method	Training Set	Data Size	DeepCAD Mean CD↓	DeepCAD Med CD↓	DeepCAD IoU↑	Fusion360 Mean CD↓	Fusion360 Med CD↓	Fusion360 IoU↑
DeepCAD	DeepCAD	160k	42.5	9.64	46.7%	89.2	39.9	25.2%
CAD-SIGNet	DeepCAD	160k	3.43	0.28	77.6%	7.37	4.08	70.4%
CAD-Diffuser	DeepCAD	160k	—	3.02	74.3%	—	3.62	63.3%
CAD-Recode	DeepCAD	160k	0.89	0.20	86.2%	1.77	0.30	75.6%
CAD-Recode v1.5	Ours	1M	0.30	0.16	92.0%	0.35	0.15	87.8%
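
For readers unfamiliar with the CD columns, the sketch below computes one common form of the Chamfer Distance between two point sets: the mean squared nearest-neighbour distance in each direction, summed. Conventions differ between papers (squared or unsquared distances, sums or means, scaling factors), so this is an illustrative definition and not necessarily the exact one behind the numbers in the table.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance using mean squared nearest-neighbour distances."""
    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


square = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
print(chamfer_distance(square, square))             # 0.0
print(chamfer_distance([(0, 0, 0)], [(1, 0, 0)]))   # 2.0
```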

Key Findings¶

Even when trained on the same DeepCAD data (160k), CAD-Recode achieves approximately 4× lower mean CD than the best prior method, CAD-SIGNet (3.43 → 0.89 on DeepCAD, 7.37 → 1.77 on Fusion360).
With the authors' 1M dataset, mean CD drops further, from 0.89 to 0.30 on DeepCAD and from 1.77 to 0.35 on Fusion360, while IoU improves from 86.2% to 92.0% and from 75.6% to 87.8%, respectively.
Only 256 input points are required, far fewer than the thousands required by many competing methods.
Invalid Rate (IR) is extremely low: 0.4% on DeepCAD and 0.5% on Fusion360, indicating that generated code is nearly always executable.
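
The improvement factors quoted above follow directly from the mean CD columns of the table. The short check below recomputes them. The dictionary keys are labels chosen here for readability.

```python
# Mean CD values copied from the comparison table above.
MEAN_CD = {
    "CAD-SIGNet": {"DeepCAD": 3.43, "Fusion360": 7.37},
    "CAD-Recode (160k)": {"DeepCAD": 0.89, "Fusion360": 1.77},
    "CAD-Recode v1.5 (1M)": {"DeepCAD": 0.30, "Fusion360": 0.35},
}


def improvement(baseline, method, benchmark):
    """How many times lower the method's mean CD is than the baseline's."""
    return MEAN_CD[baseline][benchmark] / MEAN_CD[method][benchmark]


for method in ("CAD-Recode (160k)", "CAD-Recode v1.5 (1M)"):
    for benchmark in ("DeepCAD", "Fusion360"):
        factor = improvement("CAD-SIGNet", method, benchmark)
        print(f"{method} on {benchmark}: {factor:.1f}x lower mean CD")
```

This gives about 3.9× and 4.2× for the 160k model, and about 11.4× and 21.1× for v1.5 trained on 1M samples.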

LLM Interpretability and Editing¶

Since the output is standard Python code, GPT-4o can read it directly and perform:
- CAD editing: modifying dimensions, adding/removing features
- CAD question answering: answering questions about 3D shapes on SGP-Bench

Highlights & Insights¶

Minimalist architecture: Achieving SOTA by adding only a single linear layer on top of Qwen2-1.5B demonstrates that pretrained LLMs' code understanding capability transfers directly to CAD code generation.
Python code as CAD representation: This breaks the paradigm of using custom token vocabularies in prior work, making outputs naturally interpretable, editable, and executable.
Programmatic data generation: All 1 million training samples are programmatically generated without manual annotation, and the model trained on them outperforms the one trained on DeepCAD's 160k samples (mean CD 0.30 vs. 0.89 on DeepCAD).
Successful cross-domain transfer: The model trained solely on synthetic data achieves state-of-the-art results on the real-world CC3D dataset.
Effectiveness of Fourier positional encoding: Simple Fourier features combined with a linear projection suffice to effectively inject point cloud information into the LLM.

Limitations & Future Work¶

Sketch-extrude operations only: The current method only handles sketch + extrude operations and does not support more complex operations such as revolve, sweep, or loft.
Code execution safety: CadQuery has known memory leak issues; generated code may be invalid or cause memory leaks, necessitating execution in isolated processes with timeouts.
Fixed input point count of 256: While this reduces computation, it may be insufficient for capturing fine details of complex shapes.
Gap between synthetic and real CAD data: Despite strong performance on real data, the distribution of programmatically generated samples still differs from actual engineering CAD models.
Sequence length constraint: The 768-token limit restricts the complexity of generatable CAD models.
Only a 1.5B-parameter LLM is used: Scaling to larger LLMs may yield further improvements.
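
The isolation mentioned under code execution safety can be approximated with the standard library alone. The sketch below runs a code string in a fresh Python interpreter with a timeout and reports a coarse status. The function name and the status strings are choices made here, and the demo snippets are trivial stand-ins for generated CadQuery programs.

```python
import subprocess
import sys


def run_isolated(code, timeout_s=5.0):
    """Run `code` in a fresh Python process; return 'ok', 'error', or 'timeout'."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "error"


print(run_isolated("x = 1 + 1"))                        # ok
print(run_isolated("raise ValueError('bad sketch')"))   # error
print(run_isolated("while True: pass", timeout_s=1.0))  # timeout
```

A separate process keeps leaked memory from accumulating in the caller, but it is not a security sandbox: untrusted code still runs with the caller's permissions.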

Related Work & Insights¶

DeepCAD (2021): The first work to model CAD sequences as tokens, but with limited representational capacity.
CAD-SIGNet: Previous SOTA, which infers CAD sequences from point clouds using layer-wise sketch instance guided attention.
CAD-Diffuser: A diffusion-model-based CAD reconstruction method.
PrismCAD / Point2Cyl: Other point-cloud-based CAD reconstruction methods.
CAD-MLLM: Unified multimodal conditional CAD generation, also leveraging LLMs.
The paper's paradigm of "reformulating a domain problem as code generation" is worth extending to other structured output tasks.

Rating¶

Novelty: TBD
Experimental Thoroughness: TBD
Writing Quality: TBD
Value: TBD