
Transformer-Based Multi-Region Segmentation and Radiomic Analysis of HR-pQCT Imaging for Osteoporosis Classification

Conference: CVPR2026
arXiv: 2603.09137
Code: Not available
Area: Medical Imaging
Keywords: HR-pQCT, osteoporosis classification, SegFormer, semantic segmentation, radiomics, machine learning

TL;DR

This paper proposes a fully automatic multi-region HR-pQCT segmentation framework based on SegFormer, combined with radiomic features and machine learning for binary osteoporosis classification. The key finding is that soft tissue (tendon/fat) features demonstrate greater diagnostic value than traditional bone features.

Background & Motivation

Limitations of osteoporosis diagnosis: The clinical gold standard DXA measures only areal bone mineral density (aBMD), which cannot assess bone microarchitecture, three-dimensional morphology, or surrounding soft tissue quality, leading to a high false-negative rate.
Advantages of HR-pQCT: HR-pQCT provides three-dimensional peripheral bone microstructure imaging at 60.7 µm isotropic voxel resolution with extremely low radiation (<5 µSv); however, existing analysis pipelines focus solely on mineralized bone regions, leaving large amounts of acquired data unexploited.
Association between soft tissue and osteoporosis: Sarcopenia and osteoporosis are highly comorbid conditions, and muscle mass indices (e.g., psoas muscle index) correlate significantly with bone mineral density; yet existing studies largely neglect the diagnostic contribution of soft tissue.
Need for automated segmentation: The current gold standard for HR-pQCT segmentation relies on semi-automatic methods requiring manual correction, which is time-consuming and subject to inter-operator variability. The only fully automatic method (U-Net) segments only two regions—cortical and trabecular bone.
Limitations of CNNs in modeling long-range dependencies: CNNs such as U-Net struggle to model global spatial relationships, resulting in insufficient segmentation accuracy for small targets (e.g., the fibula) in HR-pQCT images.
Multi-region potential of radiomics: Existing studies have demonstrated the effectiveness of radiomics for osteoporosis detection, but all focus on a single anatomical region or tissue type; the comparative diagnostic value of different regions (cortical bone, trabecular bone, soft tissue) remains unclear.

Method

Overall Architecture

The end-to-end pipeline consists of three stages: (1) five-class semantic segmentation via SegFormer → (2) post-processing and soft tissue subdivision, yielding seven anatomical regions → (3) radiomic feature extraction from each region, followed by a machine learning classifier for binary osteoporosis classification.

Segmentation Network Design

Architecture: SegFormer-B3, with transfer learning from ImageNet/Cityscapes pre-trained weights.
Input adaptation: The RGB three-channel weights are averaged to initialize the patch embedding layer for single-channel grayscale input.
Encoder: A four-stage hierarchical Transformer encoder producing feature maps of sizes 200×200×64, 100×100×128, 50×50×320, and 25×25×512; early stages capture high-resolution local details while later stages capture global semantics.
Decoder: A lightweight MLP decoder that upsamples all four feature maps to 200×200×768, concatenates them, and produces a final output of 200×200×6 (five classes + background).
Preprocessing: Images are cropped to 1600×1600, HU values are clipped to [−4000, 6000] and normalized to [0, 1], then bicubically downsampled to 800×800.
Five annotation classes: tibial cortical bone, tibial trabecular bone, fibular cortical bone, fibular trabecular bone, and soft tissue.
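The input handling described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the strides (4, 8, 16, 32) are the standard SegFormer downsampling factors and are assumed here because they reproduce the listed feature-map sizes for an 800×800 input, and nested lists stand in for tensors.

```python
def normalize_hu(value, lo=-4000.0, hi=6000.0):
    """Clip a raw intensity to [lo, hi], then rescale linearly to [0, 1]."""
    clipped = max(lo, min(hi, value))
    return (clipped - lo) / (hi - lo)


def average_rgb_kernel(rgb_kernel):
    """Collapse a 3 x H x W kernel (nested lists) into 1 x H x W by
    averaging the three channel slices element-wise, so that weights
    pre-trained on RGB images can accept a single grayscale channel."""
    r, g, b = rgb_kernel
    return [[[(r[i][j] + g[i][j] + b[i][j]) / 3.0
              for j in range(len(r[0]))]
             for i in range(len(r))]]


def stage_shapes(input_size=800, strides=(4, 8, 16, 32),
                 channels=(64, 128, 320, 512)):
    """Spatial size and channel width of each encoder stage
    (strides are an assumption; channels are taken from the text)."""
    return [(input_size // s, input_size // s, c)
            for s, c in zip(strides, channels)]


if __name__ == "__main__":
    print(normalize_hu(1000.0))   # midpoint of the clipping window
    print(stage_shapes())         # four (H, W, C) tuples
```

Averaging the three channel slices, rather than keeping only one of them, preserves the overall response magnitude of the pre-trained filters when the same intensity is presented in a single channel.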

Loss & Training

\[L_{Total} = L_{CE} + L_{Dice}\]

An equal-weight combination of cross-entropy loss (pixel-wise classification accuracy) and Dice loss (overlap-based segmentation performance) is employed to balance pixel-level precision with overall region coverage.
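As a rough illustration of the equal-weight objective, the sketch below computes cross-entropy plus a soft Dice loss on a handful of pixels using only the standard library. The function names, the smoothing constant `eps`, and the choice to average Dice over classes are assumptions; the paper's exact Dice formulation is not given in this summary.

```python
import math


def cross_entropy(probs, targets, eps=1e-7):
    """Mean pixel-wise cross-entropy.
    probs[i] is the predicted class distribution for pixel i,
    targets[i] is the ground-truth class index."""
    total = sum(math.log(max(p[t], eps)) for p, t in zip(probs, targets))
    return -total / len(targets)


def dice_loss(probs, targets, num_classes, eps=1e-7):
    """1 minus the soft Dice coefficient averaged over classes."""
    dice_sum = 0.0
    for c in range(num_classes):
        inter = sum(p[c] for p, t in zip(probs, targets) if t == c)
        pred = sum(p[c] for p in probs)
        true = sum(1 for t in targets if t == c)
        dice_sum += (2.0 * inter + eps) / (pred + true + eps)
    return 1.0 - dice_sum / num_classes


def total_loss(probs, targets, num_classes):
    """Equal-weight sum L_CE + L_Dice."""
    return cross_entropy(probs, targets) + dice_loss(probs, targets, num_classes)


if __name__ == "__main__":
    targets = [0, 1]
    confident = [[1.0, 0.0], [0.0, 1.0]]
    uncertain = [[0.5, 0.5], [0.5, 0.5]]
    print(total_loss(confident, targets, 2))
    print(total_loss(uncertain, targets, 2))
```

The cross-entropy term penalizes each pixel independently, while the Dice term is normalized per class, which keeps small structures such as the fibula from being swamped by the much larger soft tissue class.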

Post-Processing and Soft Tissue Subdivision

Morphological constraints: The largest connected component is retained for each class; cortical continuity is restored using convex hull detection combined with morphological closing operations.
Soft tissue subdivision: The outer 2 mm boundary is designated as skin; seed-growing is applied based on HU thresholds (tendons: 100–600 HU; fat: −600 to −200 HU); unassigned pixels are classified using a −50 HU threshold.
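Two of the steps above lend themselves to a small sketch: keeping the largest connected component of a class mask, and picking seed labels from the stated HU ranges. The code is a simplified illustration under stated assumptions, not the authors' implementation: 4-connectivity and inclusive range bounds are assumed, the function names are hypothetical, and the convex hull, morphological closing, region growing, skin band, and the −50 HU fallback rule are left out because the summary does not specify them in enough detail.

```python
from collections import deque


def largest_component(mask):
    """Return a copy of a binary mask (list of lists of 0/1) that keeps
    only its largest 4-connected foreground component."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[sy][sx] = True
            queue, comp = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out


def seed_label(hu):
    """Seed class for a soft tissue pixel from the HU ranges in the text
    (bounds treated as inclusive, which is an assumption)."""
    if 100 <= hu <= 600:
        return "tendon"
    if -600 <= hu <= -200:
        return "fat"
    return "unassigned"


if __name__ == "__main__":
    mask = [[1, 1, 0, 0],
            [0, 0, 0, 1],
            [0, 0, 0, 0]]
    print(largest_component(mask))
    print(seed_label(300), seed_label(-400), seed_label(0))
```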

Radiomic Feature Extraction and Selection

939 features are extracted per region: 7 feature families (first-order statistics, 2D shape, GLCM, GLSZM, GLRLM, NGTDM, GLDM) × 9 filter types (original + LoG + Wavelet + square + square root + logarithm + exponential + gradient + LBP).
Three-stage dimensionality reduction: variance thresholding (threshold 0.02) → correlation filtering (dropping one feature from each pair with Pearson |r| > 0.9) → LASSO regression, retaining 3–14 features per region.
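The first two reduction stages can be sketched with the standard library alone. This is a minimal sketch under assumptions: features are taken to be already scaled so that a variance threshold of 0.02 is meaningful, the correlation filter greedily keeps the first feature of each highly correlated pair, and the names `pearson` and `prefilter` are hypothetical. The final LASSO stage is omitted here; in practice it would be run on the surviving features.

```python
import math
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def prefilter(features, var_min=0.02, r_max=0.9):
    """features maps a feature name to its values across samples.
    Stage 1 drops near-constant features; stage 2 drops any feature whose
    |r| with an already selected feature exceeds r_max."""
    kept = [name for name, vals in features.items()
            if pvariance(vals) > var_min]
    selected = []
    for name in kept:
        if all(abs(pearson(features[name], features[s])) <= r_max
               for s in selected):
            selected.append(name)
    return selected


if __name__ == "__main__":
    toy = {
        "a": [0.0, 1.0, 2.0, 3.0],
        "b": [0.0, 2.0, 4.0, 6.1],     # almost a multiple of "a"
        "c": [1.0, 1.0, 1.0, 1.01],    # nearly constant
        "d": [1.0, -1.0, 1.0, -1.0],
    }
    print(prefilter(toy))
```

Running the variance filter first guarantees that every surviving feature has a non-zero standard deviation, so the correlation in the second stage is always defined.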

Key Experimental Results

Dataset

Dataset	Purpose	Scale	Source
Segmentation set	SegFormer training/evaluation	6,720 images / 40 scans / 22 subjects	CUIMC + ICMH (dual-center)
Classification set	Osteoporosis prediction	20,496 images / 122 scans / 122 subjects (61 osteoporosis + 61 controls)	ICMH (single-center)

Segmentation Performance (Test Set, Mean ± SD)

Model	Soft Tissue IoU (%)	Tibial Cortical IoU (%)	Fibular Trabecular IoU (%)	Mean F1
U-Net	98.0±7.5	86.1±6.4	74.4±23.4	—
Attention U-Net	98.2±6.2	87.0±4.6	72.1±24.7	—
SegFormer	99.2±0.2	86.5±3.6	89.6±5.6	95.36%

SegFormer achieves a 20.43% relative IoU improvement over U-Net on fibular trabecular bone, a small target (74.4 → 89.6, i.e., +15.2 percentage points), with the lowest overall variance (IoU SD 3.74% vs. 11.5% for U-Net).
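The improvement figure follows from the table values when read as a gain relative to the U-Net baseline. The short sketch below shows that arithmetic together with a per-class IoU on flat label sequences; the function names are illustrative and not taken from the paper.

```python
def iou(pred, truth, cls):
    """Intersection over union for one class on flat label sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    return inter / union if union else float("nan")


def relative_gain(new, baseline):
    """Improvement of `new` over `baseline`, in percent of the baseline."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Fibular trabecular IoU: SegFormer 89.6 vs. U-Net 74.4 (from the table).
    print(round(relative_gain(89.6, 74.4), 2))  # 20.43
    print(round(89.6 - 74.4, 1))                # 15.2 percentage points
```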

Image-Level Osteoporosis Classification (Logistic Regression, Test Set)

Anatomical Region	Accuracy	F1	AUROC
Tibial cortical	76.69%	0.734	0.777
Tibial trabecular	78.55%	0.759	0.799
Fibular cortical	78.99%	0.762	0.847
Tendon tissue	80.08%	0.787	0.850
Adipose tissue	77.73%	0.760	0.857

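Key finding: Soft tissue features (tendon/fat) match or exceed bone features. Tendon achieves the highest accuracy and F1, and both soft tissue regions exceed every bone region in AUROC; adipose accuracy (77.73%), however, trails the tibial trabecular and fibular cortical values, and its F1 (0.760) is marginally below that of fibular cortical bone (0.762).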

Patient-Level Classification (Logistic Regression, Test Set: 24 subjects)

Model	Accuracy	Sensitivity	AUROC
Non-radiomics (clinical + DXA + HR-pQCT)	0.792	0.667	0.792
Radiomics – tibial	0.792	0.833	0.826
Radiomics – soft tissue	0.875	0.917	0.875
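With only 24 test subjects, each reported figure corresponds to a small integer count, which the sketch below makes explicit. The confusion counts are hypothetical: they assume a balanced test set of 12 osteoporosis and 12 control subjects (not stated in this summary) and were chosen only because they reproduce the rounded accuracy and sensitivity values in the table.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all subjects classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of osteoporosis subjects that are detected."""
    return tp / (tp + fn)


# Hypothetical counts under an assumed 12/12 class split.
soft_tissue = dict(tp=11, tn=10, fp=2, fn=1)
tibial = dict(tp=10, tn=9, fp=3, fn=2)

if __name__ == "__main__":
    for name, c in (("soft tissue", soft_tissue), ("tibial", tibial)):
        acc = round(accuracy(**c), 3)
        sens = round(sensitivity(c["tp"], c["fn"]), 3)
        print(name, acc, sens)
```

Under this assumption, the gap between the soft tissue and tibial models amounts to two subjects out of 24, which is worth keeping in mind when comparing the rows.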

Ablation Study: Effect of Soft Tissue Distance

When soft tissue is restricted to regions within varying distances of the tibial outer surface, the 10 mm region achieves the best XGBoost AUROC of 0.875, suggesting that soft tissue close to the bone carries stronger osteoporosis-associated signals.

Highlights & Insights

First application of Transformers to multi-region HR-pQCT segmentation: SegFormer simultaneously segments four bone classes and soft tissue in a fully automatic end-to-end pipeline, achieving a roughly 20% relative IoU improvement over U-Net on a small target (fibular trabecular bone).
Counter-intuitive finding that soft tissue outperforms bone: Tendon/fat radiomic features surpass traditional bone features for osteoporosis classification; at the patient level, AUROC rises from 0.792 for the non-radiomics baseline to 0.875 for soft tissue radiomics.
Complete seven-class segmentation: Deep learning produces five classes, and post-processing further subdivides the soft tissue class into skin/tendon/fat, representing the most fine-grained fully automatic segmentation scheme for HR-pQCT to date.
Systematic multi-region comparison: The first systematic comparison of radiomic diagnostic value across cortical bone, trabecular bone, tendon, and adipose tissue regions.
First annotated HR-pQCT segmentation dataset: 6,720 images with five-class pixel-level annotations, which the authors state will be released upon publication.

Limitations & Future Work

Small patient-level sample size: Only 122 subjects (24 in the test set), limiting statistical power and generalizability.
Single-center classification data: The classification set originates from ICMH only; cross-center generalizability has not been validated.
2D slice-based processing: HR-pQCT is inherently a 3D volumetric modality; slice-by-slice 2D processing discards inter-slice continuity information.
Limited clinical accessibility of HR-pQCT: The technology is far less available than DXA, constraining near-term clinical translation.
Binary classification only: The framework does not distinguish between osteopenia, osteoporosis, and normal bone density in a multi-class setting.
No 3D segmentation comparison: No comparison is made against volumetric segmentation methods such as 3D U-Net or nnU-Net.

Method	Imaging Modality	Segmentation	Analysis Region	Advantage of This Work
Neeteson et al. (U-Net)	HR-pQCT	Automatic / 2 classes	Cortical + trabecular	Extended to 5+2=7 classes including soft tissue
Wang et al. (Radiomics)	Dual-energy CT	Manual ROI	Vertebral body	Multi-region automatic segmentation + systematic comparison
Huang et al.	Abdominal CT	Manual ROI	Psoas muscle	Fully automatic segmentation + multiple soft tissue types
Kim et al. (Deep Radiomics)	Hip X-ray	—	Femur	HR-pQCT high resolution + multi-region

Rating

Novelty: ⭐⭐⭐⭐ — First application of Transformers to multi-region HR-pQCT segmentation; the finding that soft tissue outperforms bone offers clinically inspiring insights.
Experimental Thoroughness: ⭐⭐⭐ — Segmentation evaluation is thorough, but the classification dataset is small (122 subjects / 24 test subjects) and multi-center validation is absent.
Writing Quality: ⭐⭐⭐⭐ — Well-structured with detailed methodological descriptions and rich figures and tables.
Value: ⭐⭐⭐⭐ — Challenges the "osteoporosis diagnosis relies solely on bone" paradigm and highlights the potential diagnostic role of soft tissue in metabolic bone disease.