
CORDS: Continuous Representations of Discrete Structures

Conference: ICLR 2026 · arXiv: 2601.21583 · Code: To be confirmed · Area: Object Detection / Molecule Generation

Keywords: Set prediction, continuous field representation, bijective mapping, variable-cardinality inference, density field

TL;DR

CORDS is a framework that bijectively maps variable-size discrete sets (detection boxes, molecular atoms) to continuous density and feature fields, enabling models to learn in field space and decode back to discrete sets exactly — without the constraints of fixed slots or padding.

Background & Motivation

Background: Many tasks require predicting object sets of unknown size — the number of detection boxes, molecular atoms, or astrophysical source events is not known a priori.

Limitations of Prior Work: (a) DETR requires pre-allocated fixed slots and fails when the object count exceeds them; (b) padding wastes capacity and introduces spurious signals; (c) continuous methods (VoxMol, CenterNet) can only infer cardinality indirectly, recovering features via auxiliary classifiers.

Key Challenge: How can object count, position, and attributes be jointly modeled without specifying set size in advance?

Goal: Establish a bijective mapping between discrete sets and continuous fields, where cardinality is obtained by integrating the density field, positions are recovered from density peaks, and attributes are projected from the feature field.

Key Insight: Kernel superposition is naturally invertible — each kernel contributes a fixed integral \(\alpha\), so the total integral equals cardinality \(N\); kernel centers correspond to positions; aligning the feature field with the density field enables exact attribute recovery.

Core Idea: Encode discrete objects into a density field plus a feature field using Gaussian kernels, establishing a bijective mapping. The model learns in continuous field space while guaranteeing exact decoding back to discrete sets.

Method

Overall Architecture

The input is a variable-size set \(S = \{(\mathbf{r}_i, \mathbf{x}_i)\}_{i=1}^N\) (positions + features). CORDS encodes it into a density field \(\rho(\mathbf{r})\) and a feature field \(\mathbf{h}(\mathbf{r})\). The model operates in field space and decodes predicted fields in three steps: integrate to obtain cardinality → fit kernel centers to obtain positions → project via Gram matrix to obtain features.

Key Designs

  1. Encoding: Discrete Set → Continuous Fields

     • Function: Map \(N\) objects to density and feature fields.
     • Mechanism: \(\rho(\mathbf{r}) = \frac{1}{\alpha} \sum_{i=1}^N K(\mathbf{r}; \mathbf{r}_i)\) and \(\mathbf{h}(\mathbf{r}) = \frac{1}{\alpha} \sum_{i=1}^N \mathbf{x}_i K(\mathbf{r}; \mathbf{r}_i)\), using a Gaussian kernel \(K\) with \(\alpha = \int K \,d\mathbf{r}\).
     • Design Motivation: Each kernel contributes a fixed integral \(\alpha\), so cardinality can be read directly from the total mass of the density field; the feature field shares support with the density field, ensuring position–attribute alignment.

  2. Decoding: Continuous Fields → Discrete Set

     • Function: Exactly recover the object set from the predicted fields.
     • Mechanism: (1) Cardinality: \(N = \int \rho \,d\mathbf{r}\); (2) Positions: \(\min_{\mathbf{r}_1,\ldots,\mathbf{r}_N} \int \!\left(\rho - \frac{1}{\alpha}\sum_i K(\mathbf{r};\mathbf{r}_i)\right)^2 d\mathbf{r}\); (3) Features: \(\mathbf{X} = \alpha G^{-1} B\), where \(G_{ij} = \int K(\mathbf{r};\mathbf{r}_i)\,K(\mathbf{r};\mathbf{r}_j)\,d\mathbf{r}\) is the Gram matrix and \(B_i = \int K(\mathbf{r};\mathbf{r}_i)\,\mathbf{h}(\mathbf{r})\,d\mathbf{r}\).
     • Design Motivation: Each decoding step has a theoretical guarantee: when kernel centers are sufficiently separated, \(G\) is positive definite and the linear system has a unique solution, making the full encoding–decoding pipeline bijective.

  3. Sampling Strategy

     • Function: Discretize the continuous fields for neural-network processing.
     • Mechanism: 3D molecules use importance sampling (concentrating samples near signal regions according to the density); images and time series use uniform grid sampling.
     • Design Motivation: Uniform grids are inefficient in 3D space; importance sampling avoids bounding-box constraints.
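
The full encode → decode round trip above can be sketched in one dimension with scalar features. This is an illustrative toy, not the paper's implementation: the kernel bandwidth, grid resolution, and the peak-based initialization for the L-BFGS fit are my assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.signal import find_peaks

sigma = 0.05                               # assumed kernel bandwidth
grid = np.linspace(0.0, 1.0, 2001)         # dense 1D sampling grid
dr = grid[1] - grid[0]

def kernel(r, center):
    """Unnormalized Gaussian kernel K(r; r_i)."""
    return np.exp(-0.5 * ((r - center) / sigma) ** 2)

alpha = kernel(grid, 0.5).sum() * dr       # alpha = integral of K

def encode(positions, features):
    """rho = (1/alpha) sum_i K_i;  h = (1/alpha) sum_i x_i K_i (scalar x_i)."""
    K = np.stack([kernel(grid, p) for p in positions])
    rho = K.sum(axis=0) / alpha
    h = (features[:, None] * K).sum(axis=0) / alpha
    return rho, h

def decode(rho, h):
    # (1) cardinality: integrate the density field
    N = int(round(rho.sum() * dr))
    # (2) positions: L-BFGS fit of N kernel centers to rho,
    #     initialized at density peaks (a simple heuristic, not from the paper)
    def resid(centers):
        fit = sum(kernel(grid, c) for c in centers) / alpha
        return ((rho - fit) ** 2).sum() * dr
    init = grid[find_peaks(rho, height=0.5 * rho.max())[0][:N]]
    centers = minimize(resid, init, method="L-BFGS-B").x
    # (3) features: X = alpha * G^{-1} B, with G_ij = ∫ K_i K_j and B_i = ∫ K_i h
    K = np.stack([kernel(grid, c) for c in centers])
    G = (K[:, None, :] * K[None, :, :]).sum(axis=-1) * dr
    B = (K * h[None, :]).sum(axis=-1) * dr
    X = alpha * np.linalg.solve(G, B)
    order = np.argsort(centers)
    return N, centers[order], X[order]

positions = np.array([0.2, 0.5, 0.8])
features = np.array([1.0, -2.0, 3.0])
rho, h = encode(positions, features)
N_hat, pos_hat, feat_hat = decode(rho, h)
```

With well-separated kernels the round trip recovers cardinality, positions, and features essentially exactly, which is the bijectivity claim in miniature.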

Loss & Training

  • Object detection: \(\mathcal{L} = \mathcal{L}_{\text{MSE}} + \lambda(\hat{N} - N)^2\), where the MSE term constrains field reconstruction and the counting term constrains density integration.
  • Molecule generation: A diffusion model generates in field space; decoding is applied only at evaluation time.
  • Astrophysical simulation-based inference (SBI): flow matching learns the conditional posterior over fields.
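
The detection objective can be written down in a few lines. This is a minimal sketch assuming a discretized 1D density field; the weight λ and field shapes are placeholders, not values from the paper.

```python
import numpy as np

def detection_loss(rho_pred, rho_true, dr, N_true, lam=0.1):
    """L = L_MSE + lam * (N_hat - N)^2, where N_hat integrates rho_pred."""
    mse = np.mean((rho_pred - rho_true) ** 2)   # field reconstruction term
    N_hat = rho_pred.sum() * dr                 # cardinality from density mass
    return mse + lam * (N_hat - N_true) ** 2    # counting penalty

# toy check: two well-separated Gaussians integrate to N = 2,
# so a perfect field prediction incurs (near-)zero loss
grid = np.linspace(0.0, 1.0, 501)
dr = grid[1] - grid[0]
sigma = 0.05
alpha = sigma * np.sqrt(2.0 * np.pi)
rho = sum(np.exp(-0.5 * ((grid - c) / sigma) ** 2) for c in (0.3, 0.7)) / alpha
loss = detection_loss(rho, rho, dr, N_true=2)
```

Both terms act on the same predicted field, so the counting penalty regularizes the total density mass while the MSE term shapes it locally.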

Key Experimental Results

Main Results — Object Detection (MultiMNIST, In-dist vs. OOD)

Model   AP (In)   AP (OOD)   Drop%   AP50 (In)   AP50 (OOD)   Drop%
DETR    81.2      65.4       19.5%   84.0        71.7         14.6%
YOLO    71.9      54.3       24.5%   78.8        64.2         18.5%
CORDS   76.8      64.2       16.4%   81.5        71.8         11.9%

Molecule Generation (QM9, evaluated with OpenBabel)

Model     Atom%   Mol%    Valid%   Unique%
VoxMol    99.2    89.3    98.7     92.1
FuncMol   99.0    89.2    100.0    92.8
CORDS     99.2    93.8    98.7     97.1

Key Findings

  • OOD cardinality generalization is CORDS's primary advantage: DETR AP drops 19.5% while CORDS drops only 16.4%.
  • Conditional molecule generation generalizes to property ranges unseen during training.
  • In astrophysical SBI, the cardinality posterior \(p(N|\ell)\) emerges naturally from the field distribution.

Highlights & Insights

  • Theoretical elegance of the bijective mapping: The encoding–decoding is an exact bijection without relying on auxiliary classifiers or peak detection, offering a more unified framework than heatmap-based methods such as CenterNet.
  • Domain agnosticism: The same encoding applies to 2D images, 3D molecules, and 1D time series.
  • Cardinality as a continuously differentiable quantity: \(N = \int \rho \,d\mathbf{r}\) allows cardinality to be optimized via gradient descent.
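
Because \(\hat{N}\) is just a weighted sum of field samples, the counting loss has a closed-form gradient. A quick numerical check (toy values, not from the paper):

```python
import numpy as np

# N_hat = sum(rho) * dr, so d/d(rho_j) of (N_hat - N)^2 = 2 * (N_hat - N) * dr
# at every grid point j -- cardinality is optimizable by gradient descent.
rng = np.random.default_rng(0)
rho = rng.random(100)        # arbitrary discretized density field
dr = 0.01
N_true = 3.0

def counting_loss(r):
    return (r.sum() * dr - N_true) ** 2

analytic_grad = 2.0 * (rho.sum() * dr - N_true) * dr   # same at every sample

# central finite-difference check on the first grid point
eps = 1e-6
bump = np.zeros_like(rho)
bump[0] = eps
fd_grad = (counting_loss(rho + bump) - counting_loss(rho - bump)) / (2 * eps)
```

This is what distinguishes the counting term from discrete cardinality heads: no argmax or thresholding sits between the field and \(\hat{N}\).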

Limitations & Future Work

  • Detection is validated only on MultiMNIST; evaluation on realistic datasets such as COCO remains absent.
  • Overlapping kernels from nearby objects in the density field degrade separation accuracy.
  • Molecular tasks require dense sampling (~\(10^3\) points per molecule), incurring high computational cost at scale.
  • Kernel center fitting during decoding relies on L-BFGS, introducing additional latency.
Comparison with Prior Methods

  • vs. DETR: DETR uses fixed query slots and is cardinality-limited; CORDS handles variable cardinality naturally via density integration.
  • vs. CenterNet: CenterNet localizes via heatmaps but does not encode attributes; CORDS unifies localization and attributes through the feature field.
  • vs. VoxMol/FuncMol: Cardinality and features are recovered heuristically in these methods; CORDS provides an exact bijection.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — The bijective mapping from discrete sets to continuous fields constitutes an entirely new unified framework.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Covers detection, molecule generation, and astrophysical SBI, though detection experiments are limited to synthetic data.
  • Writing Quality: ⭐⭐⭐⭐⭐ — Theoretical derivations are rigorous with complete proofs of the bijection property.
  • Value: ⭐⭐⭐⭐ — The unified framework is conceptually elegant but requires validation on real-world benchmarks.