# Learning Relative Gene Expression Trends from Pathology Images in Spatial Transcriptomics

**Conference:** NeurIPS 2025 · **arXiv:** 2512.06612 · **Code:** GitHub · **Area:** Medical Imaging / Spatial Transcriptomics · **Keywords:** spatial transcriptomics, gene expression estimation, learning to rank, pathology images, batch effects
## TL;DR
This paper proposes STRank, a loss function that reformulates gene expression estimation from pathology images as a ranking score estimation task. By modeling the stochastic noise inherent in expression counts via binomial/multinomial distributions, STRank enables models to learn robust relative expression relationships from spatial transcriptomics data subject to batch effects and random fluctuations.
## Background & Motivation
Spatial transcriptomics technologies (e.g., Visium, Xenium) enable high-resolution gene expression profiling on tissue sections, but sequencing costs remain prohibitive. Directly estimating gene expression from pathology images offers a low-cost alternative; however, two core challenges persist:
Batch Effects: Technical factors such as reagent batches and instrument variation introduce systematic biases (scaling and offset) in expression values across tissue samples. Models trained with MSE loss tend to learn these spurious associations rather than genuine biological signals.
Stochastic Noise: Due to cellular heterogeneity and temporal dynamics, observed gene expression values fluctuate randomly even for visually identical image patches. Low-expression genes exhibit particularly poor signal-to-noise ratios, and noise can alter the relative ordering of samples.
Existing methods predominantly employ a pointwise MSE loss that optimizes absolute expression values per sample, leaving them fully exposed to batch effects. Pairwise losses such as Ranking Loss partially mitigate batch effects, but they ignore the probabilistic nature of count data and cannot reliably distinguish signal from noise under low-signal conditions.
The central hypothesis of this work is: even when absolute expression values are confounded by batch effects and noise, the relative expression trends of genes across image patches remain consistent across independent experiments. For instance, cancer-specific gene expression is consistently higher in tumor regions than in non-tumor regions.
## Method

### Overall Architecture
The conventional task of "predicting absolute expression values" is reformulated as "predicting ranking scores." The model \(f: x^{n,i} \to r^{n,i}\) predicts a scale-invariant ranking score \(r\) from image patches, reflecting relative expression relationships within the same tissue. The feature extractor is frozen (using the CONCH pathology foundation model); only the prediction head is trained, isolating the effect of the loss function.
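A minimal PyTorch sketch of this setup, with a generic frozen `encoder` standing in for CONCH; class and argument names are illustrative, not from the paper:

```python
import torch
import torch.nn as nn

class RankScoreModel(nn.Module):
    """Frozen patch encoder plus a trainable linear head that outputs
    one ranking score per gene for each image patch."""

    def __init__(self, encoder: nn.Module, feat_dim: int, num_genes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # freeze the foundation model
            p.requires_grad = False
        self.head = nn.Linear(feat_dim, num_genes)  # only the head is trained

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.encoder(patches)  # (B, feat_dim) patch embeddings
        return self.head(feats)           # (B, num_genes) ranking scores
```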
### Key Designs
- Pairwise STRank Loss: Given a pair of image patches \((x^i, x^j)\) from the same tissue, the expression count \(e^i_g\) is assumed to follow a binomial distribution \(\text{Binomial}(t_g^{i,j}, p_g^i)\), where \(t_g^{i,j} = e_g^i + e_g^j\) is the total expression of gene \(g\) and \(p_g^i\) is the frequency parameter for spot \(i\). The model outputs scores \(\hat{r}^i, \hat{r}^j\), which are converted to probability estimates \(\hat{p}_g^i\) via softmax, and training minimizes the binomial negative log-likelihood (up to an additive constant from the binomial coefficient):

\[
\mathcal{L}_{\text{pair}} = -\sum_g \left[ e_g^i \log \hat{p}_g^i + e_g^j \log\left(1 - \hat{p}_g^i\right) \right],
\]

where \(\hat{p}_g^i = \frac{\exp(\hat{r}_g^i)}{\exp(\hat{r}_g^i) + \exp(\hat{r}_g^j)}\). A key advantage is that when \(\hat{r}^i \gg \hat{r}^j\), this loss reduces to the conventional Ranking Loss; when the ordering is uncertain, it adaptively weights by count magnitude, penalizing misranking of highly expressed genes more strongly while tolerating greater uncertainty for lowly expressed ones (see the code sketch after this list).
- Listwise STRank Loss: The pairwise formulation extends to lists: the expression counts of gene \(g\) across the \(N^k\) spots of a tissue are assumed to follow a multinomial distribution \(\text{Multinomial}\big(T_g, (p_g^1, \dots, p_g^{N^k})\big)\), where \(T_g = \sum_i e_g^i\). Probabilities across all spots are computed via softmax, and training minimizes the multinomial negative log-likelihood (again up to an additive constant):

\[
\hat{p}_g^i = \frac{\exp(\hat{r}_g^i)}{\sum_{j=1}^{N^k} \exp(\hat{r}_g^j)}, \qquad \mathcal{L}_{\text{list}} = -\sum_g \sum_{i=1}^{N^k} e_g^i \log \hat{p}_g^i.
\]

The listwise formulation captures global expression patterns and outperforms pairwise comparisons on the synthetic benchmarks.
- Library Size Correction: The total expression count per spot, \(l^i = \sum_g e_g^i\), is introduced as a correction factor, adjusting the probability estimates to \(\hat{p}_g^i = \frac{\exp(\hat{r}_g^i)\, l^i}{\sum_j \exp(\hat{r}_g^j)\, l^j}\). This accounts for differences in detection capacity across spots while preserving the discrete structure of the count data.
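A minimal PyTorch sketch of the two losses above, with the library-size correction folded into the listwise scores as a log-offset. Tensor shapes, the `eps` stabilizer, and the function names are assumptions of this sketch, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def pair_strank_loss(r_i, r_j, e_i, e_j, eps=1e-8):
    """Pairwise STRank: binomial NLL of gene counts for a pair of spots
    from the same tissue. r_*: (B, G) scores; e_*: (B, G) raw counts."""
    # A two-way softmax over (r_i, r_j) equals a sigmoid of the score gap.
    p_i = torch.sigmoid(r_i - r_j)
    nll = -(e_i * torch.log(p_i + eps) + e_j * torch.log(1.0 - p_i + eps))
    return nll.mean()

def list_strank_loss(r, e, lib_size=None, eps=1e-8):
    """Listwise STRank: multinomial NLL over the N spots of one tissue.
    r, e: (N, G); lib_size: optional (N,) vector of total counts per spot."""
    if lib_size is not None:
        # Library-size correction: p ∝ exp(r) * l == softmax(r + log l).
        r = r + torch.log(lib_size + eps).unsqueeze(-1)
    log_p = F.log_softmax(r, dim=0)        # softmax across spots, per gene
    return -(e * log_p).sum(dim=0).mean()  # sum over spots, mean over genes
```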
### Loss & Training
- Sample pairs are constructed via within-group random permutation; each reference sample is randomly paired with another sample from the same tissue (see the sketch after this list)
- Mini-batch losses aggregate relative signals from different tissues
- AdamW optimizer with learning rate \(5 \times 10^{-5}\) and batch size 256
- Early stopping with patience = 30 epochs
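A sketch of the within-group pairing step from the first bullet, assuming `tissue_ids` labels each spot's source tissue (all names here are hypothetical):

```python
import torch

def make_within_tissue_pairs(scores, counts, tissue_ids):
    """Pair every spot with a random other spot from the same tissue by
    permuting indices within each tissue group."""
    idx = torch.arange(len(tissue_ids))
    perm = idx.clone()
    for t in tissue_ids.unique():
        group = idx[tissue_ids == t]
        perm[tissue_ids == t] = group[torch.randperm(len(group))]
    return scores, scores[perm], counts, counts[perm]
```

The four returned tensors line up with the `r_i, r_j, e_i, e_j` arguments of the pairwise loss sketched earlier.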
## Key Experimental Results

### Synthetic Data Experiments
| Loss Type | Method | Uniform SCC ↑ | Imbalanced SCC ↑ |
|---|---|---|---|
| Pointwise | MSE | 0.748 | 0.583 |
| Pointwise | Poisson | 0.777 | 0.603 |
| Pointwise | Negative Binomial | 0.788 | 0.601 |
| Pairwise | Rank | 0.835 | 0.738 |
| Pairwise | PairSTRank | 0.907 | 0.818 |
| Listwise | PCC | 0.858 | 0.560 |
| Listwise | ListSTRank | 0.945 | 0.828 |
### Real Dataset Experiments (HEST-1k Benchmark, SCC ↑)
| Loss | IDC | PRAD | PAAD | COAD | READ | ccRCC | IDC-L | Mean |
|---|---|---|---|---|---|---|---|---|
| MSE | 0.393 | 0.484 | 0.307 | 0.556 | 0.140 | 0.093 | 0.168 | 0.306 |
| Rank | 0.317 | 0.317 | 0.181 | 0.566 | 0.047 | 0.059 | 0.110 | 0.228 |
| PCC | 0.472 | 0.459 | 0.307 | 0.640 | 0.105 | 0.102 | 0.198 | 0.326 |
| PairSTRank | 0.494 | 0.458 | 0.346 | 0.613 | 0.136 | 0.127 | 0.228 | 0.343 |
| ListSTRank | 0.510 | 0.459 | 0.343 | 0.597 | 0.140 | 0.125 | 0.238 | 0.345 |
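SCC here denotes the Spearman correlation coefficient between predicted scores and measured expression, computed per gene across spots and then averaged. A plausible sketch of such a metric (the exact HEST-1k evaluation protocol may differ):

```python
import numpy as np
from scipy.stats import spearmanr

def mean_gene_scc(pred: np.ndarray, truth: np.ndarray) -> float:
    """Average per-gene Spearman correlation across spots.
    pred, truth: arrays of shape (num_spots, num_genes)."""
    sccs = []
    for g in range(pred.shape[1]):
        rho, _ = spearmanr(pred[:, g], truth[:, g])
        sccs.append(rho)
    return float(np.nanmean(sccs))  # nan-safe for constant genes
```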
### Ablation Study

| Configuration | Δ SCC / Effect | Notes |
|---|---|---|
| PairSTRank vs. Rank | +0.072 / +0.080 (uniform / imbalanced) | Modeling count distributions outperforms simple ranking |
| ListSTRank vs. PCC | +0.087 / +0.268 (uniform / imbalanced) | Multinomial modeling yields substantial gains under imbalanced conditions |
| Listwise vs. pairwise | ListSTRank > PairSTRank | Global context helps in scenarios with stronger batch effects |
| With vs. without library size correction | Marginal improvement | Addresses inter-spot differences in detection capacity |
## Key Findings
- Relative expression learning (ranking-based methods) consistently outperforms absolute expression learning (pointwise methods), especially under batch effects
- The probabilistic modeling in STRank provides significant advantages under low-signal conditions — stochastic noise in count data is particularly impactful when gene expression is sparse
- ListSTRank performs best on synthetic data (capturing global patterns) but performs comparably to PairSTRank on real data
- Real-data evaluation is itself noisy, yet STRank achieves the best average performance across benchmarks
## Highlights & Insights
- Elegant problem reformulation: Recasting expression estimation as ranking score estimation directly circumvents the root cause of batch effect problems
- Principled probabilistic modeling: Modeling count data with binomial/multinomial distributions is a statistically natural choice, enabling the loss to adaptively weight samples by expression magnitude
- Unification with classical Ranking Loss: The paper proves that STRank reduces to the conventional ranking loss when score differences are sufficiently large, establishing an elegant theoretical connection (see the sketch after this list)
- Compatibility with downstream analyses: Relative expression relationships are precisely the information required by common downstream tasks such as differential expression analysis
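One way to see the connection claimed in the Ranking Loss bullet (a sketch of the intuition, not the paper's exact proof): with \(\Delta = \hat{r}_g^i - \hat{r}_g^j\), the two-way softmax gives \(\hat{p}_g^i = \sigma(\Delta)\), so the per-gene pairwise loss is

\[
-e_g^i \log \sigma(\Delta) - e_g^j \log \sigma(-\Delta).
\]

Since \(-\log \sigma(\Delta) = \log(1 + e^{-\Delta})\) is the standard logistic (RankNet-style) ranking loss, a large score gap drives one of the two terms to zero, leaving a count-weighted conventional ranking loss on the opposite direction.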
## Limitations & Future Work
- Improvements on real data are modest (average SCC increases from 0.326 to 0.345)
- Evaluation is restricted to 50 highly variable genes; practical applications may involve a far larger gene set
- The feature extractor (CONCH) is frozen; end-to-end training remains unexplored
- Cross-platform generalization (e.g., Visium → Xenium) has not been validated
- The evaluation metric itself is subject to noise, making it difficult to determine the true performance ceiling
## Related Work & Insights
- Learning to Rank: A classical information retrieval paradigm introduced here into the spatial transcriptomics domain
- HEST-1k: A benchmark dataset for spatial transcriptomics
- CONCH: A pathology vision-language foundation model providing feature representations
- Insight: Loss function design is an underappreciated research direction; principled probabilistic modeling can yield substantial gains without modifying model architecture
## Rating
- Novelty: ⭐⭐⭐⭐ The combination of learning-to-rank with probabilistic modeling of count data is a novel loss design
- Experimental Thoroughness: ⭐⭐⭐⭐ Synthetic data validate hypotheses; 7 real datasets are evaluated, though gains are moderate
- Writing Quality: ⭐⭐⭐⭐ Mathematical derivations are clear and motivation is well-articulated
- Value: ⭐⭐⭐⭐ The loss function approach is broadly applicable and readily extensible to other count data scenarios