Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging

Conference: ECCV 2024
arXiv: 2311.16914
Code: https://github.com/peirong26/Brain-ID
Area: Medical Imaging / Brain Imaging
Keywords: Anatomical Representation Learning, Contrast-agnostic, Brain MRI, Synthetic Data Training, Multi-task Adaptation

TL;DR

This paper proposes Brain-ID, a contrast-agnostic brain anatomical representation learning model. Through a "mild-to-severe" intra-subject image synthesis strategy, it is trained on fully synthetic data to obtain anatomical features robust to MRI contrast, resolution, orientation, and artifacts. With only a single-layer adaptation, it achieves SOTA performance on four downstream tasks and six public datasets.

Background & Motivation

Background: Deep learning has made significant progress in calibrated medical imaging (e.g., CT), but it generalizes poorly in uncalibrated modalities, especially MRI. MRI acquisition parameters are highly variable: different contrasts (T1, T2, FLAIR, etc.), resolutions, acquisition orientations, and various artifacts make it very difficult for a model trained on one protocol to generalize to another.

Limitations of Prior Work: (1) Existing methods are highly sensitive to MRI contrast; for instance, a segmentation model trained on T1-weighted images shows a sharp drop in performance on T2. (2) Training models separately for each MRI protocol is extremely costly, as there are hundreds of different acquisition protocols in clinical practice. (3) Existing methods suffer severe performance degradation in low-resolution and small-dataset scenarios. (4) There is a lack of a universal brain anatomical representation that can abstract away imaging appearance variations and capture the essence of anatomy.

Key Challenge: The greatest strength of MRI (its flexible contrast mechanism) is also the biggest obstacle for AI applications—the same anatomical structure presents vastly different image appearances under different protocols. There is a need for a representation that can learn the "anatomical identity" while ignoring the "imaging appearance."

Key Insight: The authors propose to learn a contrast-agnostic anatomical representation. The key idea is that the same anatomical location of the same subject should share the same "anatomical identity," regardless of the MRI protocol used. By synthesizing a large number of images with different appearances for the same subject, the model is trained to learn features that are consistent across appearances.

Core Idea: Through "mild-to-severe" intra-subject synthetic data augmentation, the model is trained to learn representations that are invariant to imaging appearance (contrast, resolution, artifacts) but sensitive to anatomical structures.

Method

Overall Architecture

Brain-ID consists of two main phases: (1) Representation learning: the encoder is trained on a large amount of synthetic data so that its output features are consistent across different appearance transformations of the same subject (intra-subject consistency) while remaining distinguishable across different subjects (inter-subject distinctiveness). (2) Downstream adaptation: the pre-trained encoder is frozen and only a single linear layer is trained for each of the four downstream tasks, which cover contrast-agnostic tasks (anatomical reconstruction/contrast synthesis, brain segmentation) and contrast-dependent tasks (super-resolution, bias field estimation).

Key Designs

"Mild-to-Severe" Intra-subject Generation:
- Function: Generate a large number of training samples for the same subject under different imaging conditions.
- Mechanism: Starting from annotated brain MRI data, a series of transformations of increasing severity, from mild to severe, is applied to the same subject: random contrast transformations (simulating different MRI sequences), random resolution downsampling (simulating low-resolution acquisition), random spatial deformation (simulating registration errors and anatomical variations), and random artifact addition (simulating motion artifacts, etc.). By controlling the severity of the transformations, the model gradually learns to maintain feature consistency under increasingly large appearance differences.
- Design Motivation: Facing the maximum appearance differences directly can make training unstable. Starting with mild transformations and gradually increasing the difficulty follows the principle of curriculum learning and lets the model acquire its generalization ability stably. Because training uses fully synthetic data, no real multi-contrast acquisitions need to be collected.
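One possible reading of the mild-to-severe idea is sketched below: corruption parameters for several views of the same subject are drawn from ranges that widen as a severity value grows from 0 to 1. Everything here is an illustrative assumption rather than the authors' generator: the `CorruptionParams` fields, the numeric ranges, and the linear ramp in `severity_at` are hypothetical, and the sketch only samples parameters instead of transforming images.

```python
import random
from dataclasses import dataclass


@dataclass
class CorruptionParams:
    """Hypothetical knobs for one synthetic view of a subject."""
    contrast_gamma: float     # intensity remapping exponent (1.0 = unchanged)
    downsample_factor: float  # 1.0 = native resolution
    deformation_mm: float     # maximum displacement of the random warp
    noise_std: float          # additive noise as a fraction of the intensity range


def sample_params(severity: float, rng: random.Random) -> CorruptionParams:
    """Draw parameters whose admissible range widens with severity in [0, 1]."""
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must lie in [0, 1]")
    return CorruptionParams(
        contrast_gamma=2.0 ** rng.uniform(-2.0 * severity, 2.0 * severity),
        downsample_factor=1.0 + rng.uniform(0.0, 4.0 * severity),
        deformation_mm=rng.uniform(0.0, 5.0 * severity),
        noise_std=rng.uniform(0.0, 0.2 * severity),
    )


def severity_at(step: int, total_steps: int) -> float:
    """Linear mild-to-severe ramp over training (an assumed schedule)."""
    return min(1.0, max(0.0, step / max(1, total_steps)))


def intra_subject_views(step, total_steps, n_views=4, seed=0):
    """Parameters for several views of the SAME subject at the current severity."""
    rng = random.Random(seed + step)
    severity = severity_at(step, total_steps)
    return [sample_params(severity, rng) for _ in range(n_views)]
```

At severity 0 every view is the identity transformation, and at severity 1 the views span the full assumed ranges, so the positive pairs seen by the model become harder as training proceeds.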
Intra-subject Consistency and Inter-subject Distinctiveness Learning:
- Function: Ensure that the learned features encode anatomical identity rather than imaging appearance.
- Mechanism: A contrastive learning framework is used: features from differently appearing images of the same subject (positive pairs) should be close, while features from different subjects (negative pairs) should be far apart. The training objective keeps voxel-level anatomical features consistent across transformations of the same subject, and dedicated metrics are designed to quantify intra-subject and inter-subject robustness.
- Design Motivation: Traditional image similarity metrics (such as cross-correlation) rely on imaging appearance, which is unsuitable for cross-contrast scenarios. The learned features should directly encode anatomical semantics rather than pixel intensities.
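The pull-together/push-apart behaviour described above can be illustrated with a generic InfoNCE-style loss on a single voxel feature. This is a minimal sketch of a standard contrastive objective, not the paper's exact loss: the function names, the cosine similarity, and the `temperature` value are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor voxel feature.

    anchor, positive: features of the same voxel in two views of one subject.
    negatives: features taken from other subjects.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]  # equals -log softmax(logits)[0]
```

The loss is small when the anchor matches its positive and differs from the negatives, and large in the opposite case, which is the behaviour the text asks of intra-subject and inter-subject pairs.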
Single-layer Downstream Adaptation:
- Function: Rapidly adapt to multiple downstream tasks with minimal extra parameters.
- Mechanism: Freeze all parameters of the Brain-ID encoder and train only a linear layer on top of its output features. The adaptation approach is the same for all four downstream tasks: whether for segmentation (outputting discrete labels), synthesis (outputting continuous images), super-resolution, or bias field estimation, only a single linear head needs to be trained.
- Design Motivation: If single-layer adaptation suffices, the learned representations must already contain the anatomical information the tasks need. Training so few parameters also reduces the risk of overfitting on small datasets.
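The frozen-encoder, linear-head setup can be sketched on toy data as follows. The `frozen_encoder` below is a hypothetical stand-in with no trainable state (the real encoder is a 3D network), and the head is fit with plain gradient descent on a mean squared error, chosen here only for simplicity, whereas the summary mentions \(L_1\) regression for synthesis.

```python
def frozen_encoder(x):
    """Stand-in for the pre-trained encoder; it has no trainable state here."""
    return [x, abs(x), 1.0]  # toy 3-dim feature; the constant acts as a bias


def predict(w, feature):
    """Output of the linear head for one voxel feature."""
    return sum(wi * fi for wi, fi in zip(w, feature))


def train_linear_head(features, targets, lr=0.1, epochs=2000):
    """Fit only the head weights by gradient descent on mean squared error."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for f, t in zip(features, targets):
            err = predict(w, f) - t
            for i in range(dim):
                grad[i] += 2.0 * err * f[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Toy "voxels": intensities on a grid in [-1, 1], with a target the head can represent.
voxels = [-1.0 + 2.0 * i / 40 for i in range(41)]
features = [frozen_encoder(x) for x in voxels]  # computed once, encoder untouched
targets = [3.0 * abs(x) - x + 0.5 for x in voxels]
head = train_linear_head(features, targets)
```

Only the three head weights are updated; the features are computed once and reused, which is what keeps the adaptation cheap and the parameter count small.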

Loss & Training

Training employs a voxel-level contrastive learning loss, combining an intra-subject consistency loss and an inter-subject distinctiveness loss. All training data are generated entirely through synthesis. Downstream tasks use their respective task-related losses (cross-entropy for segmentation, \(L_1\) regression for synthesis, etc.) but only optimize the linear head parameters.
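For concreteness, the two task losses named above can be written in a generic form. The notation is an assumption introduced here, not taken from the paper: \(f_\theta(x)\) is the frozen feature at voxel \(x\), \(\Omega\) the set of voxels, \(W, b\) the linear head, \(y_c(x)\) the one-hot segmentation label for class \(c\), and \(I(x)\) the target image intensity.

\[
\hat{y}(x) = W\, f_\theta(x) + b, \qquad
\mathcal{L}_{\text{seg}} = -\frac{1}{|\Omega|} \sum_{x \in \Omega} \sum_{c=1}^{C} y_c(x)\, \log \operatorname{softmax}\big(\hat{y}(x)\big)_c, \qquad
\mathcal{L}_{\text{syn}} = \frac{1}{|\Omega|} \sum_{x \in \Omega} \big|\hat{y}(x) - I(x)\big|
\]

In both cases the minimization runs over \(W\) and \(b\) only, with \(\theta\) held fixed.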

Key Experimental Results

Main Results

Brain-ID is evaluated on 4 downstream tasks across 6 public datasets:

Task	Dataset	Metric	Brain-ID	Prev. SOTA	Gain
Brain Segmentation (Cross-contrast)	Multi-modal MRI	Dice Coefficient	Best	SynthSeg, etc.	Significant Gain
Anatomical Reconstruction / Contrast Synthesis	Multi-modal MRI	SSIM/PSNR	Best	Traditional Methods	First-time Implementation
Super-resolution	Low-resolution MRI	PSNR	Best	Supervised Methods	Robustness Maintained
Bias Field Estimation	Bias field-corrupted MRI	Estimation Error	Best	N4, etc.	More Robust

Brain-ID achieves SOTA performance on all four tasks across different MRI contrasts and CT.
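Two of the metrics named in the table, the Dice coefficient and PSNR, have standard definitions that can be sketched briefly. The version below works on flattened lists of voxel values and is only an illustration of the formulas; the function names and the `data_range` default are assumptions, and the paper's evaluation code may differ in details such as label handling or intensity normalization.

```python
import math


def dice(pred, target, label):
    """Dice coefficient for one label over flattened voxel label lists."""
    p = [v == label for v in pred]
    t = [v == label for v in target]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    denom = sum(p) + sum(t)
    # Convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2.0 * intersection / denom


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB over flattened intensity lists."""
    mse = sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)
    return math.inf if mse == 0 else 10.0 * math.log10(data_range ** 2 / mse)
```

Higher is better for both: Dice ranges from 0 (no overlap) to 1 (identical masks), while PSNR grows without bound as the reconstruction error shrinks.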

Ablation Study

Configuration	Key Metric	Description
Complete Method	Best	Mild-to-severe synthesis + contrastive learning
Mild-only Synthesis	Drop	Insufficient ability to generalize to large appearance differences
Severe-only Synthesis	Drop	Unstable training, poor feature quality
Multi-layer Adaptation	Slightly better	Single layer is sufficient, proving high representation quality
Low-resolution Input	Maintained	Key advantage: Robust to low-resolution data
Small Datasets	Maintained	Key advantage: Single-layer adaptation is less prone to overfitting

Key Findings

The "mild-to-severe" strategy performs significantly better than using only mild or only severe synthesis.
Brain-ID features are highly consistent under different MRI protocols of the same subject (intra-subject consistency) while remaining distinguishable across different subjects (inter-subject distinctiveness).
A single-layer adaptation is sufficient to achieve SOTA on all 4 tasks across 6 datasets.
In low-resolution and small-dataset scenarios, the advantage of Brain-ID over other methods is even more pronounced.

Highlights & Insights

Fully Synthetic Data Training: Requires no real annotated paired multi-contrast data, greatly reducing data collection costs.
One Representation, Multiple Tasks: The learned anatomical features are highly versatile, covering both contrast-agnostic and contrast-dependent tasks.
Suited to Low-resource Scenarios: The single-layer adaptation + frozen pre-training strategy remains effective on small datasets, which matters for clinical applications where labeled data are scarce.
New Evaluation Metrics: Proposes new metrics specifically designed to evaluate intra-subject and inter-subject representation robustness.

Limitations & Future Work

Currently validated only for brain imaging; generalizing to other organs requires additional work.
The realism of the synthetic data may affect performance in extreme real-world scenarios.
The resolution and receptive field of the 3D convolutional encoder may limit the representation of fine-grained structures.
Alternative self-supervised learning methods (such as masked autoencoders, e.g., MAE) have not been explored.

Related Work & Insights

SynthSeg: Pioneering work in cross-domain brain segmentation based on synthetic data.
SynthSR: Synthetic-data-driven super-resolution method.
Contrastive Learning (SimCLR/MoCo): The representation learning framework of Brain-ID borrows ideas from contrastive learning.
Insights: Synthetic data + contrastive learning is a powerful combination for addressing cross-domain generalization in medical imaging; the "one-time pre-training, multi-task adaptation" paradigm can be generalized to other medical imaging scenarios.

Rating

Novelty: ⭐⭐⭐⭐ (The concept of contrast-agnostic anatomical representation is innovative)
Experimental Thoroughness: ⭐⭐⭐⭐⭐ (6 datasets, 4 downstream tasks, multiple MRI modalities and CT)
Writing Quality: ⭐⭐⭐⭐
Value: ⭐⭐⭐⭐⭐ (Highly significant for clinical applications of brain imaging)