Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models¶

Conference: ACL 2025
arXiv: 2505.19490
Code: https://jianxliao.github.io/cadllm-page/
Area: LLM/NLP
Keywords: CAD modeling, text-to-CAD, Transformer, LLM, computer-automated design

TL;DR¶

This paper proposes a framework for automatically generating CAD modeling sequences from text descriptions, comprising a semi-automated annotation pipeline, a dual-channel Transformer generator (TCADGen), and an LLM enhancement module (CADLLM). The full system reaches a command accuracy of 0.966, up from 0.840 for Text2CAD and 0.890 for TCADGen alone, and reduces Chamfer Distance from 120.99 (TCADGen alone) to 3.12.

Background & Motivation¶

Background: CAD (Computer-Aided Design) is a core tool in industrial design and additive manufacturing. As large models are increasingly combined with CAD in pursuit of Computer-Automated Design (CAutoD), several methods have emerged, including DeepCAD (Transformer-based CAD sequence generation), Text2CAD (text-to-CAD), and BlenderLLM and Query2CAD (direct CAD command generation using LLMs).

Limitations of Prior Work: 1) CAD data generated directly by LLMs lacks precision and parameter validation, which makes it hard to produce high-quality, editable CAD models in a field where precision is paramount; 2) LLMs incur high computational overhead and run inefficiently; 3) Designers struggle to guide LLMs toward reasonable CAD models using simple language; 4) Existing CAD datasets lack natural language description annotations.

Key Challenge: LLMs have strong generation and reasoning capabilities but are imprecise when applied directly to CAD tasks; dedicated Transformer models are more accurate on specific tasks but lack reasoning and error correction capabilities. The complementary strengths of these two paradigms had not previously been combined effectively.

Goal: 1) To efficiently generate high-quality text annotations for CAD datasets; 2) To accurately generate CAD modeling command sequences from text descriptions; 3) To leverage the reasoning capabilities of LLMs to correct and improve the quality of generated sequences.

Key Insight: Combine a fine-tuned small Transformer model (TCADGen) with an LLM (CADLLM). The small model handles initial sequence generation and confidence evaluation, and the LLM uses the confidence information to correct and optimize low-confidence commands rather than generating from scratch.

Core Idea: A dual-channel Transformer generates the initial CAD sequence along with command-wise confidence scores, and a fine-tuned LLM then performs targeted error correction based on those scores, yielding automated generation from text to high-precision CAD models.

Method¶

Overall Architecture¶

The input consists of the appearance description \(T_{\text{appear}}\) and parameter modeling description \(T_{\text{param}}\) of the CAD model. The workflow consists of two stages: 1) TCADGen generates an initial CAD command sequence \(\mathbf{M}\) and confidence scores for each command \(\mathbf{S} = \{(s_i^{\text{cmd}}, s_i^{\text{args}})\}\); 2) CADLLM receives the sequence, confidence scores, and raw descriptions to output the final corrected sequence \(\mathbf{M}^*\).
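
The two-stage data flow above can be sketched as a pair of callables. This is only an interface sketch: the names `Draft`, `run_pipeline`, `stub_generator`, and `stub_corrector` are hypothetical, and the stubs stand in for TCADGen and CADLLM without doing any real modelling.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Draft:
    """Stage-1 output: sequence M plus per-command confidences S."""
    commands: List[str]                 # M, one entry per CAD command
    scores: List[Tuple[float, float]]   # S, one (s_cmd, s_args) pair per command


def run_pipeline(
    t_appear: str,
    t_param: str,
    generator: Callable[[str, str], Draft],
    corrector: Callable[[str, str, Draft], List[str]],
) -> List[str]:
    """Run stage 1 (generator), then stage 2 (corrector), and return M*."""
    draft = generator(t_appear, t_param)
    if len(draft.commands) != len(draft.scores):
        raise ValueError("each command needs one (s_cmd, s_args) pair")
    return corrector(t_appear, t_param, draft)


# Stand-in callables so the sketch runs; the values are made up.
def stub_generator(t_appear: str, t_param: str) -> Draft:
    return Draft(["Line", "Arc", "Extrude"],
                 [(0.98, 0.95), (0.41, 0.60), (0.97, 0.90)])


def stub_corrector(t_appear: str, t_param: str, draft: Draft) -> List[str]:
    return list(draft.commands)  # identity: returns the draft unchanged


final_sequence = run_pipeline("a flat plate", "sketch, then extrude",
                              stub_generator, stub_corrector)
```

The point of the shape is that the corrector sees the raw descriptions, the draft sequence, and the scores together, so it can concentrate on low-confidence positions.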

Key Designs¶

LLM-Based Semi-Automated Annotation Pipeline:
- Function: To generate high-quality appearance and parameter description annotations for large-scale CAD datasets.
- Mechanism: For appearance descriptions, multi-view images of the CAD model are sampled and described by a vision LLM (Llama-3.2-11B-Vision), while PointLLM produces a second description from the point cloud. An LLM then checks the two descriptions for consistency (98.4% pass automatically), and the few inconsistent samples are annotated manually. For parameter descriptions, the labeled parameters of the CAD command sequence (CCS) are fed to the LLM, which writes step-by-step descriptions following a template. These are validated by backward verification: the LLM reconstructs a CCS \(r\) from the description, and \(\text{LCS}_{\text{ratio}} = \frac{\text{len}(\text{LCS}(g,r))}{\text{len}(g)}\) is computed against the original CCS \(g\). Samples scoring below the 0.9 threshold enter a reflection-optimization loop (up to two rounds), after which the \(\text{LCS}_{\text{ratio}}\) values concentrate near 1.0.
- Design Motivation: Fully manual annotation is highly expensive. A semi-automated approach substantially reduces annotation costs while maintaining high quality, and the backward verification strategy provides an elegant, labor-free quality control mechanism.
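
The backward-verification score can be illustrated with a short sketch. It assumes the CCS is compared as a list of command tokens, which this summary does not specify; the function names and the toy sequences are hypothetical, while the 0.9 threshold comes from the description above.

```python
from typing import Sequence


def lcs_length(g: Sequence[str], r: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(r) + 1)
    for g_tok in g:
        curr = [0]
        for j, r_tok in enumerate(r, start=1):
            if g_tok == r_tok:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(g: Sequence[str], r: Sequence[str]) -> float:
    """LCS_ratio = len(LCS(g, r)) / len(g), with g the original sequence."""
    if not g:
        raise ValueError("ground-truth sequence must be non-empty")
    return lcs_length(g, r) / len(g)


def needs_reflection(g: Sequence[str], r: Sequence[str], threshold: float = 0.9) -> bool:
    """True when the description should go through another reflection round."""
    return lcs_ratio(g, r) < threshold


ground_truth = ["Line", "Line", "Arc", "Extrude"]
reconstructed = ["Line", "Arc", "Extrude"]   # one Line was lost in the description
ratio = lcs_ratio(ground_truth, reconstructed)
```

Here three of the four original tokens survive the round trip, so the ratio is 0.75 and the sample would be sent back for reflection.
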
TCADGen (Dual-Channel Transformer CAD Generator):
- Function: To transform text descriptions into CAD command sequences (CCS) and output command-wise confidence scores.
- Mechanism: The two channels encode parameter descriptions and appearance descriptions respectively using DeBERTa-Large-v3, and map them to a shared \(d\)-dimensional semantic space via linear projection; an adaptive feature fusion based on a dynamic routing mechanism inspired by capsule networks is performed: \(\mathbf{s}_j = \sum_i \text{softmax}(\mathbf{W}_r[\hat{\mathbf{h}}_p^i; \hat{\mathbf{h}}_a^j])\hat{\mathbf{h}}_p^i\mathbf{W}_{ij}\). After fusion, a multi-head attention + bidirectional LSTM decoder is used to predict the complete CAD sequence, command types, and parameter confidence scores in parallel.
- Design Motivation: The dual-channel architecture preserves the independence of parameter and appearance features, preventing information loss from early fusion. The confidence output serves as the basis for precise error correction in CADLLM.
CADLLM (LLM-Enhanced CCS Generation):
- Function: To leverage the reasoning capabilities of LLMs to correct low-confidence commands from TCADGen, producing the final high-precision CCS.
- Mechanism: CADLLM is obtained by supervised fine-tuning (SFT) of Llama-3.2-3B-Instruct. During training, the CCS predicted by TCADGen together with its confidence scores is the input and the ground-truth CCS is the label, so the model learns the mapping from confidence patterns to likely errors. During inference, CADLLM receives the user description, TCADGen's initial output, and the confidence scores, and focuses on fixing low-confidence segments. The best performance-cost trade-off is reported with just 1000 training samples.
- Design Motivation: Direct text-to-CCS generation by LLMs has low precision (only 32.8% for Llama 3B), but given the initial sequence and confidence scores the LLM can concentrate on error correction instead of generating from scratch, which lifts accuracy to 86.4%.
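
One way to picture the CADLLM training pairs is sketched below. The prompt layout, the `[LOW]` marker, the 0.5 cut-off, and the command strings are all hypothetical illustrations; the summary only states that the draft sequence and confidence scores form the query and the ground-truth CCS forms the response.

```python
from typing import Dict, List, Tuple

LOW_CONFIDENCE = 0.5  # hypothetical cut-off, not taken from the paper


def build_query(description: str,
                commands: List[str],
                scores: List[Tuple[float, float]],
                threshold: float = LOW_CONFIDENCE) -> str:
    """Render description, draft commands, and scores as one text query."""
    if len(commands) != len(scores):
        raise ValueError("each command needs one (s_cmd, s_args) pair")
    lines = ["Description: " + description,
             "Draft sequence (index: command | s_cmd | s_args):"]
    for idx, (cmd, (s_cmd, s_args)) in enumerate(zip(commands, scores)):
        # Mark a command when either of its two confidences is below the cut-off.
        flag = " [LOW]" if min(s_cmd, s_args) < threshold else ""
        lines.append(f"{idx}: {cmd} | {s_cmd:.2f} | {s_args:.2f}{flag}")
    lines.append("Return the corrected sequence.")
    return "\n".join(lines)


def build_sft_example(description: str,
                      commands: List[str],
                      scores: List[Tuple[float, float]],
                      ground_truth: List[str]) -> Dict[str, str]:
    """Pair the query with the ground-truth sequence as the SFT response."""
    return {"query": build_query(description, commands, scores),
            "response": "\n".join(ground_truth)}


example = build_sft_example(
    "a plate with a rounded corner",
    ["Line 0 0 10 0", "Line 10 0 10 5", "Extrude 2"],       # made-up draft
    [(0.98, 0.95), (0.41, 0.60), (0.97, 0.90)],             # made-up scores
    ["Line 0 0 10 0", "Arc 10 0 10 5 90", "Extrude 2"],     # made-up ground truth
)
```

In this toy pair only the second command is flagged, and the response differs from the draft at exactly that position, which is the kind of confidence-to-error pattern the fine-tuning is meant to expose.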

Loss & Training¶

TCADGen Training: Standard sequence prediction loss (cross-entropy for command type classification + regression loss for parameters).
CADLLM Training: SFT loss, taking the output of TCADGen as query and the ground-truth CCS as response.
Training Data Size: Increasing the CADLLM training set from 0 to 1000 samples raises accuracy from 16.0% to 86.4%, with gains plateauing after 500 samples.
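
The TCADGen objective described above (cross-entropy on command types plus a regression term on parameters) can be sketched in NumPy. The choice of mean squared error for the regression term and the `arg_weight` balancing factor are assumptions; the summary does not say which regression loss or weighting the paper uses.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy for logits of shape (n, k) and integer targets of shape (n,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def tcadgen_loss(cmd_logits: np.ndarray,
                 cmd_targets: np.ndarray,
                 arg_pred: np.ndarray,
                 arg_targets: np.ndarray,
                 arg_weight: float = 1.0) -> float:
    """Command-type cross-entropy plus a weighted MSE on parameters (MSE is assumed)."""
    ce = cross_entropy(cmd_logits, cmd_targets)
    mse = float(np.mean((arg_pred - arg_targets) ** 2))
    return ce + arg_weight * mse


# Toy batch: 2 commands, 4 command types, 3 parameters each.
uniform_logits = np.zeros((2, 4))
targets = np.array([0, 3])
params = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
loss = tcadgen_loss(uniform_logits, targets, params, params)
```

With uniform logits over four classes and perfect parameter predictions, the loss reduces to the cross-entropy of a uniform guess, `log(4)`.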

Key Experimental Results¶

Main Results¶

Model	Command Accuracy	Avg F1	Avg AUC
DeepCAD	0.571	0.606	0.747
Text2CAD	0.840	0.722	0.819
TCADGen	0.890	0.771	0.854
TCADGen+CADLLM	0.966	0.947	0.962

Method	CD ↓	MMD ↓	JSD ↓
DeepCAD	169.93	31.91	45.03
Text2CAD	142.83	28.98	40.23
TCADGen	120.99	21.36	35.25
CADFusion (LLaMA-8B)	45.67	3.49	17.11
TCADGen+CADLLM (LLaMA-3B)	3.12	2.78	8.38
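
For reference, Chamfer Distance (CD) compares point clouds sampled from the generated and ground-truth shapes. The sketch below shows one common symmetric, squared-distance variant; the exact normalization, point count, and scaling used in the paper are not stated in this summary, so values from this function are not directly comparable to the table.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds a (n, 3) and b (m, 3).

    Uses squared Euclidean distances and averages each direction
    (an assumed convention; other variants sum or take square roots).
    """
    diff = a[:, None, :] - b[None, :, :]      # (n, m, 3) pairwise differences
    sq_dist = np.sum(diff ** 2, axis=-1)      # (n, m) squared distances
    a_to_b = sq_dist.min(axis=1).mean()       # nearest neighbour in b for each point of a
    b_to_a = sq_dist.min(axis=0).mean()       # nearest neighbour in a for each point of b
    return float(a_to_b + b_to_a)


cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
shifted = cloud + np.array([0.0, 0.0, 1.0])   # same shape moved one unit along z
cd_same = chamfer_distance(cloud, cloud)
cd_shifted = chamfer_distance(cloud, shifted)
```

Identical clouds give 0, and the unit shift gives 2.0 under this convention (a squared distance of 1 in each direction).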

Ablation Study¶

Configuration	Command Accuracy	Description
TCADGen (full)	0.890	Full dual-channel model
BERT fine-tuned w/o dual-channel	0.807	Drops by 8.3 percentage points without the dual-channel architecture
Dual-channel w/o BERT fine-tuning	0.847	Drops by 4.3 percentage points without BERT fine-tuning
TCADGen (Text2CAD dataset)	0.804	Drops by 8.6 percentage points using the legacy dataset
GPT-4o prompt + TCADGen	0.670 accuracy	Direct LLM prompting is far inferior to SFT
CADLLM + TCADGen (SFT)	0.864 accuracy	SFT outperforms prompting by 19.4 percentage points

Key Findings¶

TCADGen improves command accuracy by 31.9 percentage points over DeepCAD (0.571 to 0.890) and by 5.0 percentage points over Text2CAD (0.840 to 0.890).
Adding CADLLM raises accuracy from 0.890 to 0.966 and reduces CD from 120.99 to 3.12.
Direct text-to-CCS generation by the LLM performs poorly (Llama 3B: 32.8%), yet error correction based on TCADGen outputs yields excellent results (86.4%).
The smaller Llama-3.2-3B model outperforms the LLaMA-8B-based CADFusion method (CD: 3.12 vs. 45.67), which suggests that confidence-guided refinement is more effective than rewriting a sequence without such guidance.
The quality of the dataset from the semi-automated annotation has a significant impact on TCADGen's performance (0.804 on legacy Text2CAD data vs. 0.890 on the newly annotated data).

Highlights & Insights¶

Complementary Small Model + LLM Architecture: TCADGen provides precise initial predictions and confidence estimation, while CADLLM uses its reasoning ability for targeted error correction, giving a clear division of labor. This paradigm of "specialized model generation + LLM correction" may generalize to other fields that require high-precision structured outputs.
Confidence-Guided Error Correction Mechanism: Instead of having the LLM rewrite the whole sequence blindly, the confidence scores tell it which segments are likely erroneous, which makes the correction more targeted and more accurate.
Backward Verification Annotation Strategy: Verifying description quality by asking an LLM to reconstruct the original data from its own generated description is a practical, low-cost, automated quality control method that can be adapted to other annotation scenarios.

Limitations & Future Work¶

The semi-automated annotation still requires a large number of LLM calls, and its scalability to larger-scale datasets remains to be verified.
Imbalanced distribution of commands in the training data (e.g., significantly more Line than Arc commands) affects the robustness of minority command categories.
The framework does not explicitly introduce geometric constraints or structural reasoning, which may result in syntactically correct but geometrically implausible sequences.
It is only applicable to the detailed CAD design stage and does not support the conceptual design stage where parameter descriptions are incomplete.
The evaluation dataset scale is limited, and validation against real-world industrial demands is still lacking.

Comparison with Related Work¶

vs DeepCAD: DeepCAD generates sketches followed by extrusion parameters, making sequences prone to breakage and only supporting basic operations. TCADGen's parallel prediction and dual-channel fusion avoid this bottleneck.
vs Text2CAD: Text2CAD generates from textual and visual features but does not use LLM reasoning for error correction. CADLLM fills this gap and improves on Text2CAD across all reported metrics.
vs BlenderLLM / Query2CAD: Pure LLM-based approaches tend to omit steps or violate geometric constraints in complex tasks; the hybrid architecture in this work secures baseline precision through a dedicated model.
vs CADFusion: As a peer LLM-enhanced method, CADFusion achieves a CD of 45.67 using an 8B model, whereas this work achieves a CD of 3.12 using a 3B model, which suggests that confidence-guided correction is more effective than generation without such guidance.

Rating¶

Novelty: ⭐⭐⭐⭐ The two-stage architecture featuring small-model generation followed by LLM-based error correction is highly novel, with confidence guidance serving as a key innovation.
Experimental Thoroughness: ⭐⭐⭐⭐ Multi-dimensional evaluations (command-level and model-level geometric metrics), exhaustive ablations, and comparisons against multiple baselines are provided.
Writing Quality: ⭐⭐⭐ The structure is clear, though some mathematical formulations and descriptions are mildly redundant.
Value: ⭐⭐⭐⭐ Demonstrates direct practical value for industrial CAD automation, and the small model + LLM framework is highly transferable.