Graph Diffusion that can Insert and Delete

Conference: NeurIPS 2025 arXiv: 2506.15725 Code: None Area: Graph Generation / Molecular Generation Keywords: graph diffusion, node insertion and deletion, molecular generation, property targeting, molecular optimization

TL;DR

This paper proposes GrIDDD, the first model to extend discrete denoising diffusion probabilistic models (DDPM) to support dynamic insertion and deletion of graph nodes during generation, allowing molecular graph size to adapt throughout the diffusion process. GrIDDD matches or surpasses existing methods on property targeting and molecular optimization tasks.

Background & Motivation

Core Problem

Graph-based molecular generation must handle discrete and combinatorially structured connectivity, where chemical constraints render the vast majority of atom–bond combinations invalid. Mainstream approaches include:

| Method Type | Representative Work | Strengths | Weaknesses |
|---|---|---|---|
| Autoregressive | You et al., JT-VAE | Naturally handles variable length | Sequential dependency, slow sampling |
| One-shot | GraphVAE | Parallel sampling | Difficulty learning high-order interactions |
| Graph Diffusion | DiGress, FreeGress | Permutation-invariant, iterative refinement | Fixed graph size |

Limitations of Fixed Size

The fundamental constraint of existing graph diffusion models is that the number of nodes in a sample is fixed for the entire generation process, which leads to two serious problems:

Property Targeting: When the target property correlates with molecular size (e.g., molecular weight), fixed-size models cannot respond accordingly.

Property Optimization: Optimizing target properties by adding or removing atoms is impossible.

Existing workarounds:

  • DiGress, E(3) Equivariant Diffusion: sample the graph size from the empirical training distribution
  • FreeGress: use an auxiliary classifier to predict the graph size
  • Lift: sample additional nodes during reverse diffusion, but without incorporating this into training

These are all stopgaps: none of them enables dynamic size adjustment within the diffusion process itself.

Method

Overall Architecture

The core innovation of GrIDDD lies in redesigning the forward and reverse diffusion processes to support monotonic node insertion and deletion:

  • Forward process: progressively adds noise to a clean graph \(G_0\) (with \(n_0\) nodes) until reaching \(G_T\) (with \(n_T\) nodes)
  • Reverse process: progressively denoises \(G_T\) while adjusting size back to \(G_0\)
  • When \(n_T > n_0\), deletion is required; when \(n_T < n_0\), insertion is required
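The size bookkeeping above can be sketched in a few lines. This is an illustrative helper (not the paper's code): the signed gap \(n_T - n_0\) determines which operation the reverse process needs.

```python
# Illustrative sketch: the signed size gap delta = nT - n0 determines
# which operation the reverse process needs, mirroring the bullets above.
def size_adjustment(n0: int, nT: int) -> tuple[str, int]:
    """Return the reverse-process operation and how many nodes it affects."""
    delta = nT - n0
    if delta > 0:
        return ("delete", delta)    # nT > n0: surplus nodes must be deleted
    if delta < 0:
        return ("insert", -delta)   # nT < n0: missing nodes must be inserted
    return ("none", 0)
```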

Key Designs

Node Deletion Mechanism

Two special atom types are introduced: DEL and DEL*

  • DEL: represents a fully deleted node (absorbing state)
  • DEL*: an intermediate state in which a node has just been marked for deletion and will transition to DEL at the next step

Why are two states necessary?

  • With only DEL, once a node enters the absorbing state it cannot be restored to a normal atom type during the reverse process.
  • DEL* ensures that during the reverse process, a re-inserted node can transition back to a normal atom type at the next step.
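A toy sketch of the two-state semantics may make this concrete. The transition rules below are an assumed, simplified reading of the mechanism (real transitions are stochastic and handled by the transition matrices), showing why DEL* acts as a recoverable buffer between DEL and the normal atom types.

```python
# Assumed, simplified semantics of the DEL / DEL* states (illustrative only;
# the paper realizes these transitions via stochastic transition matrices).
ATOMS = ["C", "N", "O"]

def forward_step(state: str) -> str:
    if state == "DEL*":
        return "DEL"           # a node marked for deletion is fully deleted next step
    if state == "DEL":
        return "DEL"           # DEL is absorbing in the forward direction
    return state               # normal atoms are handled by the standard kernel

def reverse_step(state: str, predicted_atom: str) -> str:
    if state == "DEL":
        return "DEL*"          # a re-inserted node first enters the buffer state
    if state == "DEL*":
        return predicted_atom  # the buffer state resolves to a normal atom type
    return state
```

With a single DEL state, `reverse_step` would have no intermediate stop: a node would have to jump straight from the absorbing state to an atom type, which the absorbing forward kernel never models.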

Mixed Diffusion Process

For cases requiring deletion of \(|\Delta_T|\) nodes, a mixed forward process is adopted:

  • \(n_0 - |\Delta_T|\) nodes use the standard diffusion transition matrix \(Q_t\)
  • \(|\Delta_T|\) nodes use a special transition matrix \(Q_t^*\)

The special transition matrix takes the form:

\[Q_t^* = \zeta(t)(\alpha_t A^* + (1-\alpha_t) B^*) + (1-\zeta(t)) C^*\]

where:

  • \(A^*, B^*\) are extended versions of the standard matrices (incorporating the DEL/DEL* types)
  • \(C^*\) describes transitions from normal types to DEL
  • \(\zeta(t)\) is the logistic CDF, controlling the timing of deletion
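A minimal NumPy sketch of assembling such a kernel follows. The concrete \(A^*, B^*, C^*\) here are stand-ins chosen to be row-stochastic and consistent with their described roles, not the paper's exact matrices; the point is that any convex combination of row-stochastic matrices, as in the formula above, is again a valid transition kernel.

```python
import numpy as np

# Illustrative stand-ins for A*, B*, C* over K normal atom types
# plus the two deletion states (not the paper's exact matrices).
K = 3                        # number of normal atom types
S = K + 2                    # state space: K atoms + DEL* + DEL
DEL_STAR, DEL = K, K + 1

A = np.eye(S)                              # A*: keep the current type
B = np.zeros((S, S)); B[:, :K] = 1.0 / K   # B*: uniform over normal atom types
B[DEL_STAR] = 0.0; B[DEL_STAR, DEL] = 1.0  # DEL* moves on to DEL
B[DEL] = 0.0; B[DEL, DEL] = 1.0            # DEL is absorbing
C = np.zeros((S, S)); C[:, DEL] = 1.0      # C*: push every state toward DEL

def zeta(t: float, D: float = 0.5, w: float = 0.1) -> float:
    """Decreasing logistic CDF scheduling when deletion kicks in."""
    return 1.0 / (1.0 + np.exp((t - D) / w))

def Q_star(t: float, alpha_t: float) -> np.ndarray:
    """Q_t^* = zeta(t) (alpha_t A* + (1 - alpha_t) B*) + (1 - zeta(t)) C*."""
    return zeta(t) * (alpha_t * A + (1 - alpha_t) * B) + (1 - zeta(t)) * C

# Every row of Q_t^* remains a valid probability distribution.
assert np.allclose(Q_star(0.3, 0.9).sum(axis=1), 1.0)
```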

Insertion/Deletion Schedule

The timing of insertions/deletions is governed by the derivative of the logistic schedule \(\zeta(t)\), whose magnitude acts as a density over timesteps:

\[\zeta'(t) = -\frac{e^{\frac{t-D}{w}}}{w(e^{\frac{t-D}{w}} + 1)^2}\]

Parameter \(D\) controls the peak timestep and \(w\) controls the steepness of the curve, ensuring that insertions/deletions are smoothly distributed across the diffusion process.
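A quick numerical check (with assumed values for \(D\) and \(w\)) confirms that the stated \(\zeta'(t)\) is the derivative of the decreasing logistic CDF \(\zeta(t) = 1/(1 + e^{(t-D)/w})\), and that its magnitude peaks at \(t = D\):

```python
import math

# Sanity-check the schedule density against a finite-difference derivative
# of zeta(t) = 1 / (1 + exp((t - D) / w)); D, w are assumed example values.
D, w = 0.5, 0.1

def zeta(t: float) -> float:
    return 1.0 / (1.0 + math.exp((t - D) / w))

def zeta_prime(t: float) -> float:
    e = math.exp((t - D) / w)
    return -e / (w * (e + 1.0) ** 2)

t, h = 0.4, 1e-6
finite_diff = (zeta(t + h) - zeta(t - h)) / (2 * h)
assert abs(finite_diff - zeta_prime(t)) < 1e-4
```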

Target Size Sampling

During training, the target size \(n_T\) is sampled from a discrete distribution \(h_{n_0}(n)\):

\[h_{n_0}(n) = p_{max} + \frac{p_{min} - p_{max}}{n_{max}} \, |n - n_0|\]

The highest probability is assigned to \(n = n_0\) (unchanged size), with probability decreasing linearly as the size difference grows.
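This sampling step can be sketched directly. The hyperparameter values \(p_{max}\), \(p_{min}\), \(n_{max}\) below are assumed for illustration, and normalization is delegated to the weighted sampler; this is not the paper's implementation.

```python
import random

# Sketch of target-size sampling: the weight peaks at n = n0 and decays
# linearly with |n - n0|. Hyperparameter values are illustrative assumptions.
def size_weights(n0: int, n_max: int, p_max: float = 1.0, p_min: float = 0.1) -> list[float]:
    w = [p_max + (p_min - p_max) / n_max * abs(n - n0) for n in range(1, n_max + 1)]
    return [max(x, 0.0) for x in w]  # clip in case the linear ramp goes negative

def sample_target_size(n0: int, n_max: int, seed: int = 0) -> int:
    rng = random.Random(seed)
    return rng.choices(range(1, n_max + 1), weights=size_weights(n0, n_max), k=1)[0]
```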

Loss & Training

GrIDDD employs classifier-free guidance for conditional generation:

  • Condition information is injected directly into the denoising network during training
  • A cross-entropy loss trains the denoiser to predict \(G_0\)
  • Node types and edge types are optimized separately
  • The reverse process must handle posterior probabilities for three scenarios: insertion, deletion, and unchanged size
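At sampling time, classifier-free guidance follows the standard recipe: run the denoiser with and without the condition and mix the logits. The sketch below is that generic recipe, not GrIDDD's exact code; `gamma` is the guidance weight, with \(\gamma > 1\) sharpening the conditioning.

```python
import numpy as np

# Generic classifier-free guidance (standard recipe, not GrIDDD's exact code):
# interpolate/extrapolate between unconditional and conditional logits.
def guided_logits(logits_cond, logits_uncond, gamma: float = 2.0) -> np.ndarray:
    cond = np.asarray(logits_cond, dtype=float)
    uncond = np.asarray(logits_uncond, dtype=float)
    return uncond + gamma * (cond - uncond)  # gamma = 1 recovers the conditional model
```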

Key Experimental Results

Main Results: Property Targeting

Results on QM9 and ZINC-250k datasets:

| Method | QM9 MAE ↓ | QM9 Validity ↑ | ZINC MAE ↓ | ZINC Validity ↑ |
|---|---|---|---|---|
| DiGress | High | ~95% | High | ~85% |
| FreeGress | Medium | ~97% | Medium | ~90% |
| GCPN | Medium | ~90% | Medium | ~80% |
| JT-VAE | Medium | ~100% | Medium | ~100% |
| GrIDDD | Lowest or on par | ~96–98% | Lowest or on par | ~88–92% |

GrIDDD achieves property targeting error (MAE) on par with or better than existing best methods while maintaining high chemical validity. Notably, GrIDDD solves a harder problem (simultaneously adjusting size and structure) yet still yields competitive results.

Molecular Optimization

| Method | Avg. Improvement ↑ | Optimization Success Rate ↑ | Type |
|---|---|---|---|
| VJTNN | Medium | Medium | Translation model |
| Seq2Seq | Low | Low | Translation model |
| HierG2G | Medium–High | Medium | Translation model |
| GCPN | Medium | Medium | RL optimization |
| JT-VAE | Medium | Medium | VAE optimization |
| GrIDDD | Highest | Highest | Diffusion model |

GrIDDD significantly outperforms dedicated optimization models on molecular optimization tasks, precisely because it can optimize target properties by adding or removing atoms rather than only modifying existing atom types.

Ablation Study

Effect of Size Adaptation

| Configuration | QM9 MAE | Molecular Size Variance |
|---|---|---|
| Fixed size (empirical distribution sampling) | Baseline | Low (constrained by pre-sampling) |
| Fixed size (classifier-predicted) | Slightly better | Medium |
| GrIDDD (dynamic adjustment) | Best | High (adaptive) |

Dynamic size adjustment enables the model to generate molecules with more diverse sizes, with the advantage being most pronounced when target properties strongly correlate with molecular size.

Key Findings

  1. Size adaptability is critical for conditional generation: GrIDDD's advantage is most evident when target properties (e.g., molecular weight, polarizability) correlate with molecular size.

  2. Harder training does not imply worse results: Although GrIDDD must simultaneously learn denoising and size adjustment, this added complexity does not degrade property targeting performance.

  3. Structural advantage in molecular optimization: Conventional methods cannot optimize properties by changing the number of atoms; GrIDDD removes this limitation.

  4. Necessity of the DEL/DEL* dual-state design: A single deletion state prevents nodes from being recovered during the reverse process; the dual-state design is a key technical contribution.

Highlights & Insights

  • Theoretical contribution: This work is the first to formally incorporate node-level insertion and deletion operations within the graph diffusion framework, deriving new forward/reverse transition matrices.
  • Practical value: It opens the door to size-adaptive molecular generation with direct relevance to drug design.
  • Elegant design: The seemingly complex functionality is realized by introducing only two special atom types (DEL, DEL*) and a mixed diffusion process.
  • Extensibility: The method can be applied to any graph generation framework based on discrete diffusion.

Limitations & Future Work

  • Insertion/deletion is monotonic (the reverse process can only continuously insert or continuously delete), and alternating insertion and deletion within a single generation trajectory is not supported.
  • The hyperparameters of the logistic schedule (\(D\), \(w\)) require tuning.
  • Computational overhead: the transition matrix dimensionality increases due to the additional deletion types.
  • Validation is limited to QM9 and ZINC-250k; larger-scale datasets and more complex molecular structures (e.g., proteins) remain untested.
  • The determination of threshold \(p^*\) may be task-dependent.

Related Work

  • DiGress (Vignac et al., 2023): categorical diffusion + Graph Transformer; the primary baseline of this work.
  • FreeGress (Ngo et al., 2024): classifier-free conditional generation; the training paradigm from which this work draws inspiration.
  • Johnson et al. (2021): first proposed insertion/deletion concepts in text diffusion, though with a more complex formulation.
  • Ketata et al. (2025): the Lift method samples additional nodes during the reverse process but does not incorporate this into training.
  • GCPN (You et al., 2018): RL-based molecular generation; a classic approach for property targeting.
  • The innovations presented in this paper are generalizable to domains requiring structure-adaptive generation, such as protein design and materials discovery.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — First realization of size-adaptive generation in graph diffusion
  • Technical Depth: ⭐⭐⭐⭐ — Transition matrix design and posterior derivation are substantive
  • Practicality: ⭐⭐⭐⭐⭐ — Direct application value for molecular design
  • Clarity: ⭐⭐⭐⭐ — Method description is clear and figures are intuitive
  • Overall Score: 8.5/10