RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion¶
Conference: ICLR 2026 arXiv: 2602.16548 Code: — Area: Biomolecular Design / Diffusion Models / Reinforcement Learning Keywords: RNA inverse design, 3D structural similarity, diffusion model, RL fine-tuning, DDPO
TL;DR¶
This paper proposes RIDER, the first framework to incorporate reinforcement learning into 3D RNA inverse design. It first pretrains a conditional diffusion model (RIDE) to learn sequence–structure relationships, then applies RL fine-tuning to directly optimize 3D structural similarity rather than native sequence recovery rate, achieving over 100% improvement across all 3D self-consistency metrics.
Background & Motivation¶
RNA inverse design—given a target 3D structure, finding nucleotide sequences that fold into that structure—is a critical problem in therapeutic drug development and synthetic biology.
Root cause of existing methods' limitations: Nearly all SOTA methods (gRNAde, RiboDiffusion, RDesign, etc.) optimize native sequence recovery rate (NSR) as a proxy objective. However, RNA exhibits high degeneracy—multiple distinct sequences can fold into similar structures, and similar sequences do not necessarily yield similar structures. Consequently:
- NSR shows no clear correlation with structural similarity (at NSR ≈ 50%, GDT_TS can range from 0 to 0.9).
- Over-optimizing NSR limits exploration of non-native sequences.
Method¶
Overall Architecture¶
RIDER = RIDE (pretrained diffusion model) + RL fine-tuning
Stage 1: Conditional Diffusion Model RIDE¶
Structural representation: The RNA 3D backbone is represented as a geometric graph, where nodes correspond to nucleotides and edges encode spatial proximity. A GVP-GNN encoder processes this graph to produce equivariant node embeddings \(\mathbf{h}_c\).
Diffusion model: Learns the conditional distribution \(p(\mathbf{x}_0 | \mathbf{h}_c)\), where \(\mathbf{x}_0 \in \{0,1\}^{N \times 4}\) is the one-hot encoded sequence.
Forward process: \(\mathbf{x}_t = \alpha_t \mathbf{x}_0 + \sigma_t \varepsilon\)
Training objective: the standard \(\varepsilon\)-prediction loss \(\mathcal{L} = \mathbb{E}_{\mathbf{x}_0, t, \varepsilon}\big[\|\varepsilon_\theta(\mathbf{x}_t, t, \mathbf{h}_c) - \varepsilon\|_2^2\big]\)
The noise prediction network consists of 5 GVP-GNN layers; inference uses a DDIM sampler (50 steps).
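The forward corruption step \(\mathbf{x}_t = \alpha_t \mathbf{x}_0 + \sigma_t \varepsilon\) can be sketched in a few lines. This is a minimal illustration assuming a variance-preserving cosine schedule (the paper's exact schedule is not specified in this summary); `cosine_alpha_sigma` and `forward_noise` are hypothetical helper names:

```python
import numpy as np

def cosine_alpha_sigma(t, T):
    """Variance-preserving cosine schedule (assumed): alpha_t^2 + sigma_t^2 = 1."""
    alpha = np.cos(0.5 * np.pi * t / T)
    sigma = np.sin(0.5 * np.pi * t / T)
    return alpha, sigma

def forward_noise(x0, t, T, rng):
    """x_t = alpha_t * x0 + sigma_t * eps for a one-hot sequence x0 of shape (N, 4)."""
    alpha, sigma = cosine_alpha_sigma(t, T)
    eps = rng.standard_normal(x0.shape)
    return alpha * x0 + sigma * eps, eps

rng = np.random.default_rng(0)
x0 = np.eye(4)[rng.integers(0, 4, size=10)]  # toy RNA sequence, one-hot (N=10, 4)
xt, eps = forward_noise(x0, t=500, T=1000, rng=rng)
```

At inference, RIDE inverts this corruption with a 50-step DDIM sampler conditioned on \(\mathbf{h}_c\).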
Stage 2: RL Fine-Tuning¶
The denoising sampling process is formulated as an MDP:
- State: \(s_t = (\mathbf{x}_t, t, \mathbf{h}_c)\)
- Action \(a_t\): the transition from \(\mathbf{x}_t\) to \(\mathbf{x}_{t-\Delta t}\)
- Policy \(\pi_\theta(a_t \mid s_t)\): parameterized by the diffusion model
- Reward: received only at the end of each trajectory (sparse, terminal reward)
Advantage estimation improvements:
1. Batch-mean baseline: \(b = \mathbb{E}_\tau[R_{\text{traj}}]\)
2. Exponential moving average baseline for training stability: \(b^{(i)} = \beta_{\text{baseline}} \cdot b^{(i-1)} + (1-\beta_{\text{baseline}}) \cdot \bar{R}^{(i)}_{\text{batch}}\)
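The two baseline variants above can be sketched as follows; `update_baseline` and `advantages` are illustrative names, and the standard-deviation normalization of the advantage is an assumption, not a detail from the paper:

```python
import numpy as np

def update_baseline(prev_b, batch_rewards, beta=0.9):
    """EMA baseline: b_i = beta * b_{i-1} + (1 - beta) * mean batch reward."""
    return beta * prev_b + (1.0 - beta) * np.mean(batch_rewards)

def advantages(batch_rewards, baseline):
    """Terminal-reward advantage A = R_traj - b, normalized for stability (assumed)."""
    a = np.asarray(batch_rewards, dtype=float) - baseline
    return a / (a.std() + 1e-8)
```

With beta close to 1, the baseline changes slowly across epochs, which damps the variance of the sparse terminal reward.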
Policy gradient objective (with PPO clipping): \(\mathcal{L}^{\text{clip}}(\theta) = \mathbb{E}_t\big[\min\big(r_t(\theta)\hat{A}_t,\ \text{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\big)\big]\), where \(r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)\) is the per-step probability ratio.
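A minimal sketch of the clipped surrogate on per-step log-probabilities (the function name and the clip value 0.2 are illustrative assumptions, not values taken from the paper):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    """Clipped surrogate, written as a loss to minimize:
    -E[min(r * A, clip(r, 1-eps, 1+eps) * A)], r = exp(logp_new - logp_old)."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))
```

The clip keeps the fine-tuned policy from drifting too far from the pretrained RIDE sampler in a single update.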
Reward Functions¶
Four reward functions are designed based on three 3D structural similarity metrics:
- \(R^{\text{gdt}} = (\text{GDT\_TS} \times w)^2\)
- \(R^{\text{tm}} = (\text{TM-score} \times w)^2\)
- \(R^{\text{rmsd}} = -(\text{RMSD} \times w)^2\)
- \(R^{\text{gdt\_rmsd}}\): combined GDT_TS + RMSD reward (best overall performance)
An additional bonus reward \(R_{\text{bonus}}\) is applied when GDT_TS > 0.5 or RMSD < 2.0 Å.
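The reward shaping can be sketched as below. The weight `w`, the bonus magnitude, and the weighting inside the combined reward are illustrative assumptions, since the summary does not give exact values:

```python
def rider_reward(gdt_ts, tm, rmsd, w=1.0, bonus=0.5, mode="gdt_rmsd"):
    """Squared, weighted structural-similarity rewards plus a bonus when
    GDT_TS > 0.5 or RMSD < 2.0 A. (w, bonus, and the combined-reward
    weighting below are illustrative assumptions.)"""
    r_gdt = (gdt_ts * w) ** 2
    r_tm = (tm * w) ** 2
    r_rmsd = -((rmsd * w) ** 2)
    if mode == "gdt":
        r = r_gdt
    elif mode == "tm":
        r = r_tm
    elif mode == "rmsd":
        r = r_rmsd
    else:  # combined GDT_TS + RMSD; exact weighting not given in the summary
        r = r_gdt + 0.01 * r_rmsd
    if gdt_ts > 0.5 or rmsd < 2.0:
        r += bonus
    return r
```

Squaring the metrics concentrates the reward gradient on near-success designs, and the bonus sharpens the incentive around the commonly used GDT_TS > 0.5 / RMSD < 2.0 Å success thresholds.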
Key Experimental Results¶
Main Results (Pretraining)¶
| Method | NSR ↑ |
|---|---|
| gRNAde | 50% |
| RiboDiffusion | 52% |
| RIDE (Ours) | 61% |
Main Results (RL Fine-Tuning)¶
| Method | GDT_TS ↑ | RMSD ↓ | TM-score ↑ |
|---|---|---|---|
| gRNAde | 0.28 (27%) | 10.89 (3%) | 0.30 (28%) |
| RIDE (pretrained) | 0.33 (31%) | 10.36 (8%) | 0.33 (36%) |
| RIDER (\(R^{\text{tm}}\)) | 0.62 (72%) | 4.31 (31%) | 0.61 (72%) |
| RIDER (\(R^{\text{gdt\_rmsd}}\)) | 0.62 (72%) | 3.35 (33%) | 0.56 (68%) |
Percentages indicate the proportion of designs meeting the designated threshold (above it for GDT_TS and TM-score, below it for RMSD). RIDER achieves over 100% improvement across all metrics.
Cross-Predictor Validation¶
Replacing RhoFold with AlphaFold3 as the folding oracle to assess generalizability: RIDER achieves GDT_TS = 0.57, a 119% improvement over gRNAde (0.26), demonstrating that the framework captures generalizable RNA design principles.
Key Findings¶
- NSR shows no clear correlation with 3D structural similarity.
- After RL fine-tuning, NSR typically decreases while GDT_TS improves, indicating that the model discovers novel sequences that fold correctly but differ from native sequences.
- GDT_TS and TM-score are highly correlated (Pearson 0.885) but each captures distinct aspects.
- The combined reward \(R^{\text{gdt\_rmsd}}\) yields the most balanced performance.
Highlights & Insights¶
- First RL framework for 3D RNA inverse design, directly optimizing structural similarity.
- Demonstrates the inadequacy of NSR as a proxy objective from both empirical and theoretical perspectives.
- RL fine-tuning strategy (exponential moving average baseline + PPO clipping) is stable and effective.
- A lightweight model (only 10.2M parameters) achieves substantial gains.
Limitations & Future Work¶
- Relies on structure prediction models such as RhoFold as folding oracles; prediction errors propagate into the reward signal.
- RL training requires extensive sampling (60 trajectories per epoch × 80 epochs).
- Training and evaluation are conducted on only 12,011 RNA structures, limiting data scale.
- No experimental validation has been performed (wet-lab verification of designed sequences).
Related Work & Insights¶
- RNA inverse design: gRNAde, RiboDiffusion, RDesign, and others based on supervised learning.
- RNA structure prediction: RhoFold, AlphaFold3, and related tools.
- RL fine-tuning of generative models: DDPO, RLHF, Constitutional AI, and related approaches.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ — First RL-driven 3D RNA inverse design framework.
- Motivation: ⭐⭐⭐⭐⭐ — Clear and compelling analysis of NSR's deficiencies.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Multiple reward functions and cross-oracle validation.
- Value: ⭐⭐⭐⭐ — Significant implications for RNA-based drug design.