NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling¶
Conference: AAAI 2026 arXiv: 2511.06194 Code: Coming soon Area: 3D Vision Keywords: Text-to-CAD, NURBS, LLM, BRep, 3D Generation
TL;DR¶
NURBGen is the first text-to-CAD generation framework based on NURBS surface representation. By fine-tuning an LLM, it maps natural language descriptions to structured NURBS parameter JSONs. A hybrid representation (untrimmed NURBS + analytic primitives) and a large-scale partABC dataset are introduced, achieving significant improvements over existing methods in geometric fidelity and dimensional accuracy.
Background & Motivation¶
Problem Definition¶
CAD modeling is essential in modern engineering and product design, yet creating detailed CAD models typically requires expertise in professional software (e.g., Onshape, AutoCAD) and is highly time-consuming. Text-to-CAD technology aims to enable designers to describe 3D objects in natural language without requiring professional modeling skills.
Limitations of Prior Work¶
Design-history dependency: Nearly all existing methods (e.g., DeepCAD, Text2CAD, CAD-LLaMA) rely on design-history-based representations, where shapes are constructed via sequences of parametric operations (extrusion, 2D sketches). Although intuitive and editable, the training datasets (e.g., DeepCAD-170k) are small in scale and low in complexity (mostly cuboids and cylinders), limiting generalization capability.
Underutilization of the ABC dataset: The ABC dataset contains over one million 3D CAD models, but has two key limitations: (a) models are stored in BRep (Boundary Representation) format without design history; (b) high-quality text descriptions are absent.
Difficulty of NURBS modeling: Analytic surfaces in BRep are most commonly represented as NURBS, yet NURBS have rarely been explored in deep generative research due to challenges including efficient representation, non-differentiability of knot vectors, high parametric variability, and trimming complexity.
Paper Goals¶
- Treat NURBS surfaces as language-aligned objects, encoding each surface as a JSON token sequence containing control points, degrees, weights, and knot vectors.
- Reformulate text-to-CAD as a language modeling task.
- Leverage the large scale and geometric diversity of the ABC dataset.
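To make the "NURBS as JSON tokens" idea concrete, a serialized face record might look like the following sketch. The field names here are illustrative assumptions — the paper's exact schema is not reproduced in this summary — but the content (poles, degrees, knots, multiplicities, weights, periodicity flags) matches the parameters listed later in the method section.

```python
import json

# Hypothetical JSON record for one untrimmed NURBS face (a flat bilinear patch).
# Field names are illustrative; the paper's exact schema may differ.
face = {
    "degree_u": 1,
    "degree_v": 1,
    "knots_u": [0.0, 1.0],
    "mults_u": [2, 2],          # knot multiplicities in the u direction
    "knots_v": [0.0, 1.0],
    "mults_v": [2, 2],
    # 2x2 grid of control points (poles), coordinates at 6-decimal precision
    "poles": [[[-1.0, -1.0, 0.0], [-1.0, 1.0, 0.0]],
              [[1.0, -1.0, 0.0], [1.0, 1.0, 0.0]]],
    "weights": [[1.0, 1.0], [1.0, 1.0]],
    "periodic_u": False,
    "periodic_v": False,
}

# Compact separators keep the token sequence short for the LLM.
token_sequence = json.dumps(face, separators=(",", ":"))
decoded = json.loads(token_sequence)
```

Because every field is plain numeric or boolean data, the record round-trips losslessly through JSON, which is what allows the LLM's text output to be converted directly back into a BRep surface.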
Method¶
Overall Architecture¶
The NURBGen pipeline proceeds as follows: (1) extract part-level CAD models from the ABC dataset → (2) encode each part in a hybrid format (NURBS + analytic primitives) and generate high-quality descriptions using a VLM → (3) fine-tune Qwen3-4B to map text descriptions to structured hybrid CAD representations → (4) output JSON is directly convertible to BRep models.
Key Designs¶
1. CAD Representation (NURBS Parameter Extraction)¶
- Normalization: Geometry is normalized into a \(2 \times 2 \times 2\) bounding box centered at the origin.
- NURBS conversion: Each face is converted to an untrimmed NURBS representation using PythonOCC's `BRepBuilderAPI_NurbsConvert`, unifying all underlying surface types.
- Parameter extraction: For each face, the following are extracted: control points (poles), knot vectors in both parametric directions, knot multiplicities, degrees in the u and v directions, rational weights, and periodicity flags.
- Exact reconstruction: Given these parameters, the original surface can be exactly reconstructed via the `Geom_BSplineSurface` constructor.
The mathematical definition of a NURBS surface of degrees \(p\) (in \(u\)) and \(q\) (in \(v\)) is:

\[
S(u, v) = \frac{\sum_{i=0}^{n} \sum_{j=0}^{m} N_{i,p}(u)\, N_{j,q}(v)\, w_{i,j}\, \mathbf{P}_{i,j}}{\sum_{i=0}^{n} \sum_{j=0}^{m} N_{i,p}(u)\, N_{j,q}(v)\, w_{i,j}}
\]

where \(\mathbf{P}_{i,j}\) are the control points, \(w_{i,j}\) the rational weights, and \(N_{i,p}\), \(N_{j,q}\) the B-spline basis functions defined by the knot vectors in the \(u\) and \(v\) directions.
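As a minimal sketch of this definition, the surface point \(S(u,v)\) can be evaluated directly with the Cox–de Boor recursion for the basis functions. This is a didactic pure-Python implementation, not the paper's pipeline (which relies on PythonOCC for geometry kernels):

```python
def bspline_basis(i, p, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        # Half-open support [knots[i], knots[i+1])
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + p] != knots[i]:
        left = ((u - knots[i]) / (knots[i + p] - knots[i])
                * bspline_basis(i, p - 1, u, knots))
    right = 0.0
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, u, knots))
    return left + right

def nurbs_surface_point(u, v, poles, weights, knots_u, knots_v, p, q):
    """Evaluate S(u,v) as the weighted rational combination of control points."""
    num, den = [0.0, 0.0, 0.0], 0.0
    for i in range(len(poles)):
        Ni = bspline_basis(i, p, u, knots_u)
        if Ni == 0.0:
            continue
        for j in range(len(poles[0])):
            w = Ni * bspline_basis(j, q, v, knots_v) * weights[i][j]
            den += w
            for k in range(3):
                num[k] += w * poles[i][j][k]
    return [c / den for c in num]

# Bilinear patch over the unit square: S(0.5, 0.5) is its midpoint.
ku = kv = [0.0, 0.0, 1.0, 1.0]
poles = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
         [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]]
weights = [[1.0, 1.0], [1.0, 1.0]]
pt = nurbs_surface_point(0.5, 0.5, poles, weights, ku, kv, 1, 1)  # -> [0.5, 0.5, 0.0]
```

With all weights equal to 1 the denominator sums to 1 and the surface reduces to an ordinary B-spline patch; unequal weights are what let NURBS represent conics (circles, ellipses) exactly.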
2. Hybrid Representation¶
This is one of the core innovations. Not all surfaces can be robustly represented as untrimmed NURBS — thin regions near holes or fillets in particular frequently introduce geometric artifacts.
- Degeneracy detection: The Chamfer Distance between each reconstructed surface \(f_n\) and the ground-truth surface \(f_{gt}\) is compared: \(CD(f_n, f_{gt}) \leq \epsilon\), with threshold \(\epsilon = 6 \times 10^{-4}\).
- Fallback strategy: When NURBS approximation is unacceptable, the original analytic primitive is retained (lines, circles, B-splines, ellipses, parabolas, hyperbolas).
- Statistics: In practice, approximately 70% of faces are modeled with NURBS, while 30% fall back to analytic primitives.
- Advantages: More expressive and compact than pure NURBS format, reducing parameter count and producing shorter, more token-efficient inputs.
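The degeneracy test above can be sketched as follows: sample points from the reconstructed NURBS face and from the ground-truth face, compute a symmetric Chamfer distance, and fall back to the analytic primitive when it exceeds \(\epsilon\). The exact sampling scheme and CD variant (e.g., squared vs. unsquared distances) are assumptions here; only the threshold \(6 \times 10^{-4}\) comes from the paper.

```python
import numpy as np

EPSILON = 6e-4  # degeneracy threshold reported in the paper

def chamfer_distance(A, B):
    """Symmetric Chamfer distance between point sets A (N,3) and B (M,3):
    mean nearest-neighbor distance in both directions."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def keep_nurbs(samples_nurbs, samples_gt):
    """Keep the untrimmed NURBS face only if it reproduces the ground-truth
    face within the threshold; otherwise fall back to the analytic primitive."""
    return chamfer_distance(samples_nurbs, samples_gt) <= EPSILON
```

The brute-force pairwise matrix is fine for a few thousand samples per face; a KD-tree would be the natural replacement at larger scales.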
3. Automatic Annotation Pipeline¶
This addresses the lack of text descriptions in the ABC dataset:
- Multi-view rendering: Each BRep is first converted to a triangular mesh, then rendered in Blender from 6 viewpoints at 512×512 resolution with the Freestyle renderer enabled to overlay contour edges.
- Metadata-guided annotation: Geometric metadata inaccessible to VLMs is extracted — bounding box dimensions, surface area, volume, and topological hole count (genus computed via the Euler–Poincaré formula: \(g = 0.5 \times (2 - \chi)\)).
- Description generation: InternVL3-13B, a multi-view VLM, takes 6 rendered views and metadata-augmented prompts as input to generate shape-centric descriptions.
- Quality validation: GPT-4o validates a random sample of 1,000 instances, achieving approximately 85% accuracy.
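The topological-hole count in the metadata step follows directly from the Euler–Poincaré formula. For a closed triangle mesh this is a few lines; the function name is illustrative:

```python
def genus(num_vertices, faces):
    """Genus of a closed triangular mesh via g = (2 - chi) / 2,
    where chi = V - E + F. Each undirected edge is counted once."""
    edges = set()
    for a, b, c in faces:
        for e in ((a, b), (b, c), (c, a)):
            edges.add(tuple(sorted(e)))
    chi = num_vertices - len(edges) + len(faces)
    return (2 - chi) // 2

# A tetrahedron is a closed genus-0 surface: V=4, E=6, F=4, chi=2.
g = genus(4, [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])  # -> 0
```

A torus-like part with one through-hole would give \(\chi = 0\) and hence \(g = 1\), which is exactly the "hole count" fed to the VLM prompt.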
Dataset Construction (partABC)¶
- Part-level substructures are extracted from 200k models in ABC, yielding 3 million part-level CAD instances.
- Complexity filtering: A weighted scoring function is applied: \(w(B) = l_1 \times \text{token\_count} + l_2 \times \text{through\_holes} + l_3 \times \frac{\text{surface\_area}}{\text{volume}} + l_4 \times \text{bbox\_diag}\)
- Parts are categorized into three tiers: simple (≤0.12), moderate (0.12–0.23), and complex (>0.23).
- A subset of 10% simple + 50% moderate + 40% complex instances is retained, yielding approximately 300k high-quality samples.
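The filtering step above can be sketched as a scoring function plus tier buckets. The weights \(l_1\)–\(l_4\) below are illustrative placeholders (this summary does not report the paper's actual values); only the tier thresholds 0.12 and 0.23 come from the text.

```python
def complexity_score(token_count, through_holes, surface_area, volume, bbox_diag,
                     weights=(1e-4, 0.05, 0.01, 0.02)):
    """Weighted complexity score w(B). The default weights are illustrative
    placeholders, not the paper's trained/tuned values."""
    l1, l2, l3, l4 = weights
    return (l1 * token_count
            + l2 * through_holes
            + l3 * surface_area / volume
            + l4 * bbox_diag)

def tier(score):
    """Bucket a part into the paper's three complexity tiers."""
    if score <= 0.12:
        return "simple"
    if score <= 0.23:
        return "moderate"
    return "complex"
```

Once every part is scored, the final dataset is drawn as a 10% / 50% / 40% mixture of the three tiers, biasing partABC toward moderate and complex geometry.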
Loss & Training¶
- Base model: Qwen3-4B
- Optimizer: AdamW, learning rate \(5 \times 10^{-5}\), linear warmup
- LoRA: rank=64, \(\alpha=128\)
- Training: 180k steps, batch size=1, 4×H200 GPUs, 3 days
- Context window: 8192 for training, 14k for inference
- Temperature: 0.3
- Generation speed: ~800 tokens/s on RTX 3090
- Data processing: Control point coordinates are retained to 6 decimal places; weights are compressed using (value, frequency) encoding.
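The (value, frequency) compression of weights is a simple run-length encoding, which pays off because most NURBS faces have all-unit weights. A minimal sketch, including the 6-decimal rounding of coordinates mentioned above:

```python
from itertools import groupby

def compress_weights(weights):
    """Run-length encode a flat weight list as (value, frequency) pairs.
    Rational weights are frequently constant (all 1.0), so runs are long."""
    return [(v, len(list(run))) for v, run in groupby(weights)]

def decompress_weights(pairs):
    """Invert the (value, frequency) encoding back to a flat list."""
    return [v for v, n in pairs for _ in range(n)]

def quantize(coords, places=6):
    """Round coordinates to the fixed decimal precision used for control points."""
    return [round(c, places) for c in coords]

pairs = compress_weights([1.0] * 5 + [0.7] * 2)  # -> [(1.0, 5), (0.7, 2)]
```

Seven weight tokens collapse to two pairs here; on real faces with hundreds of unit weights the savings in context length are substantial.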
Key Experimental Results¶
Main Results¶
| Model | User Pref. (1k)↑ | GPT Pref.↑ | IR↓ | CD↓ | HD↓ | JSD↓ | MMD↓ |
|---|---|---|---|---|---|---|---|
| GPT-4o | 1.5 | 1.9 | 0.17 | 7.2 | 0.36 | 72.87 | 4.17 |
| DeepCAD | 5.6 | 6.1 | 0.32 | 10.28 | 0.45 | 89.77 | 4.43 |
| Text2CAD | 26.1 | 27.2 | 0.05 | 9.66 | 0.42 | 85.27 | 4.54 |
| NURBGen | 64.1 | 61.6 | 0.018 | 4.43 | 0.25 | 57.94 | 2.14 |
Note: CD, JSD, and MMD are multiplied by \(10^2\). NURBGen substantially outperforms all baselines across every metric.
Ablation Study¶
| Configuration | Human Pref.↑ | GPT-4o Pref.↑ | Notes |
|---|---|---|---|
| NURBS-only | 28% | 21% | Untrimmed NURBS only, no analytic primitive fallback |
| Hybrid (full) | 72% | 79% | NURBS + analytic primitives |
The NURBS-only model exhibits pronounced geometric artifacts and reconstruction errors near holes, sharp transitions, and regions where NURBS fitting is imprecise.
Key Findings¶
- NURBGen achieves a CD of 4.43 (×\(10^2\)) on 7,500 test samples, 54% lower than the second-best method Text2CAD (9.66).
- A top-1 human preference rate of 64.1% is achieved, far exceeding Text2CAD's 26.1%.
- An invalidity rate of only 0.018 demonstrates strong geometric correctness of the generated BRep models.
- The hybrid representation improves human preference by 44 percentage points over pure NURBS.
Highlights & Insights¶
- NURBS as language: Serializing NURBS surface parameters as JSON tokens elegantly reformulates CAD generation as a language modeling task — a significant paradigm shift.
- Practical utility of hybrid representation: The 70% NURBS + 30% analytic primitive strategy achieves a favorable balance between robustness and token efficiency.
- Bottom-up data engineering: The complete pipeline — from part extraction to complexity filtering to automatic annotation — enables effective utilization of the large-scale, unannotated ABC dataset.
- Extremely low invalidity rate: An IR of 0.018 indicates strong geometric consistency in the structured parameters generated by the LLM.
Limitations & Future Work¶
- Complex prompts: For prompts involving complex descriptions (e.g., "a two-story house with a gable roof"), NURBGen struggles to capture fine-grained structural details.
- Geometric artifacts: Self-intersections or topological inconsistencies arise in a minority of cases.
- Engraved text: Prompts involving engraved text cannot be reconstructed.
- Context window limitation: Current training is limited to 8,192 tokens; future work may explore long-context training to handle more complex assemblies.
- Dataset scale: Only 200k of the one million models in the ABC dataset have been processed; scaling to the full dataset remains future work.
Related Work & Insights¶
- Distinction from NeuroNURBS: NeuroNURBS employs a non-autoregressive transformer VAE to learn latent encodings of untrimmed NURBS, but does not support language-conditioned generation and cannot handle trimming.
- Comparison with LLaMA-Mesh: LLaMA-Mesh fine-tunes LLaMA to generate mesh vertices and faces as plain text, whereas NURBGen generates structured, editable NURBS parameters.
- Insight: Structured symbolic representations (vs. latent encodings) may be a more promising direction for LLM-driven 3D generation.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ — First NURBS-based text-to-CAD framework with an elegantly designed hybrid representation.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Multi-metric evaluation combined with human assessment, though ablation studies are limited in scope.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure with complete technical details.
- Value: ⭐⭐⭐⭐⭐ — Establishes NURBS as a viable alternative to design-history-based methods; the partABC dataset represents a substantial contribution.