
Deep Learning Based Estimation of Blood Glucose Levels from Multidirectional Scleral Blood Vessel Imaging

Conference: CVPR2025
arXiv: 2603.12715
Code: Not open-sourced
Area: Medical Imaging
Keywords: Non-invasive blood glucose estimation, scleral blood vessel imaging, multi-view learning, Transformer fusion, MRFO feature optimization

TL;DR

Proposes ScleraGluNet, which combines scleral blood vessel images from five gaze directions with a multi-branch CNN, MRFO feature selection, and Transformer cross-view fusion. It reports 93.8% accuracy on three-class metabolic state classification and an MAE of 6.42 mg/dL for continuous fasting plasma glucose (FPG) estimation, suggesting a possible route to non-invasive blood glucose monitoring.

Background & Motivation

An estimated 537 million adults were living with diabetes worldwide in 2021, a figure projected to reach 783 million by 2045; chronic hyperglycemia leads to microvascular and macrovascular complications.
Standard diagnostic tests (fasting plasma glucose, FPG; oral glucose tolerance test, OGTT; HbA1c) require blood draws, which is burdensome for routine monitoring. Continuous glucose monitoring (CGM) reduces finger-prick tests, but it still requires a subcutaneous sensor and remains costly.
Ocular surface microvessels (sclera/conjunctiva) are directly observable, and existing studies have confirmed that diabetes induces morphological changes in conjunctival vessels (e.g., tortuosity, density changes); however, systematic multi-view exploitation based on deep learning is lacking.
Core Motivation: Single-view acquisition loses heterogeneous vascular information across different scleral regions; multidirectional acquisition and cross-view fusion are necessary to comprehensively capture blood glucose-related microvascular features.

Method

Overall Architecture

ScleraGluNet is a multi-view multi-task deep learning architecture consisting of four core modules:

Image preprocessing and blood vessel enhancement
Five-pathway parallel CNN feature extraction
MRFO feature refinement + Transformer cross-view fusion
Dual classification/regression outputs

Data Collection and Preprocessing

Dataset: 445 subjects (150 normal / 140 controlled diabetes / 155 hyperglycemic diabetes), with five gaze directions per subject (primary, superior, inferior, nasal, temporal), totaling 2,225 anterior segment images.
Preprocessing Pipeline: Quality control \(\rightarrow\) ROI extraction (removing eyelid/eyelash background) \(\rightarrow\) Color/brightness normalization \(\rightarrow\) CLAHE contrast enhancement \(\rightarrow\) Frangi filter tubular structure enhancement \(\rightarrow\) Binary mask validation.
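
The Frangi step of the pipeline above can be illustrated with a minimal single-scale sketch in NumPy. It assumes a grayscale image scaled to [0, 1] in which vessels are darker than the surrounding sclera. The function names and the values of `sigma`, `beta`, and `c` are illustrative assumptions; the filter settings used in the paper are not given in this summary, and a real pipeline would more likely call a multi-scale library implementation after CLAHE.

```python
import numpy as np

def gaussian_smooth(img, sigma):
    """Separable Gaussian blur with edge padding (NumPy only)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def frangi_vesselness(img, sigma=2.0, beta=0.5, c=None):
    """Single-scale 2D Frangi-style vesselness for dark tubes on a bright background."""
    smooth = gaussian_smooth(np.asarray(img, dtype=float), sigma)
    # Second derivatives (Hessian) by repeated finite differences, scale-normalized.
    gy, gx = np.gradient(smooth)
    hyy, _ = np.gradient(gy)
    hxy, hxx = np.gradient(gx)
    hxx, hxy, hyy = sigma**2 * hxx, sigma**2 * hxy, sigma**2 * hyy
    # Eigenvalues of the symmetric 2x2 Hessian, ordered so that |lam1| <= |lam2|.
    root = np.sqrt((hxx - hyy) ** 2 + 4 * hxy**2)
    e1, e2 = 0.5 * (hxx + hyy + root), 0.5 * (hxx + hyy - root)
    swap = np.abs(e1) > np.abs(e2)
    lam1, lam2 = np.where(swap, e2, e1), np.where(swap, e1, e2)
    safe_lam2 = np.where(lam2 == 0, 1e-12, lam2)
    blobness = (lam1 / safe_lam2) ** 2      # low for line-like structures
    structure = lam1**2 + lam2**2           # low in flat background
    if c is None:
        c = 0.5 * np.sqrt(structure.max()) or 1.0
    response = np.exp(-blobness / (2 * beta**2)) * (1 - np.exp(-structure / (2 * c**2)))
    # A dark ridge has positive curvature across the vessel, i.e. lam2 > 0.
    return np.where(lam2 > 0, response, 0.0)

# Synthetic check: a dark horizontal "vessel" on a bright background.
image = np.ones((64, 64))
image[30:33, :] = 0.2
vesselness = frangi_vesselness(image, sigma=2.0)
print(vesselness[31, 32] > vesselness[10, 32])
```

The response is high along the dark band and zero in the flat background, which is the property the subsequent binary mask validation step would rely on.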

Network Design

Five-Pathway Parallel CNN Branches: Each gaze direction corresponds to a CNN branch with independent parameters, extracting direction-specific local vascular features (caliber changes, tortuosity, branching complexity).
MRFO Feature Refinement: The Manta Ray Foraging Optimization (MRFO) algorithm selects a feature subset, removing redundant or highly correlated features and retaining the most discriminative vascular representations.
Transformer Cross-View Fusion: A self-attention mechanism models long-range dependencies across different scleral regions, identifying cross-quadrant vascular patterns (e.g., temporal-nasal asymmetric remodeling).
Dual Output Heads: A classification head outputs probabilities for three metabolic states, while a regression head estimates continuous FPG (mg/dL).
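
To make the MRFO refinement step more concrete, here is a minimal sketch of binary feature selection with Manta Ray Foraging Optimization, using the three published foraging moves (chain, cyclone, somersault). Everything specific is an assumption for illustration: the objective (nearest-centroid error plus a feature-count penalty), the 0.5 threshold that turns a continuous position into a feature mask, the population size, and the synthetic data. The paper's actual objective and settings are not given in this summary.

```python
import numpy as np

def fitness(mask, X, y, penalty=0.01):
    """Assumed objective: nearest-centroid error rate plus a small feature-count penalty."""
    if not mask.any():
        return 1.0 + penalty
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred != y).mean() + penalty * mask.mean())

def greedy_update(pos, fit, cand, score):
    """Keep a candidate position only where it improves that agent's fitness."""
    cand = np.clip(cand, 0.0, 1.0)
    cand_fit = np.array([score(p) for p in cand])
    better = cand_fit < fit
    pos[better] = cand[better]
    fit[better] = cand_fit[better]

def mrfo_select(X, y, n_agents=12, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    score = lambda p: fitness(p > 0.5, X, y)     # threshold position -> feature mask
    pos = rng.random((n_agents, dim))
    fit = np.array([score(p) for p in pos])
    best, best_fit = pos[fit.argmin()].copy(), float(fit.min())
    history = [best_fit]
    for t in range(1, n_iter + 1):
        cand = np.empty_like(pos)
        for i in range(n_agents):
            r = 1.0 - rng.random(dim)            # in (0, 1], keeps log(r) finite
            if rng.random() < 0.5:               # cyclone foraging
                r1 = rng.random(dim)
                beta = 2 * np.exp(r1 * (n_iter - t + 1) / n_iter) * np.sin(2 * np.pi * r1)
                # Early iterations favor a random reference (exploration), later ones the best.
                ref = rng.random(dim) if t / n_iter < rng.random() else best
                lead = ref if i == 0 else pos[i - 1]
                cand[i] = ref + r * (lead - pos[i]) + beta * (ref - pos[i])
            else:                                # chain foraging
                alpha = 2 * r * np.sqrt(np.abs(np.log(r)))
                lead = best if i == 0 else pos[i - 1]
                cand[i] = pos[i] + r * (lead - pos[i]) + alpha * (best - pos[i])
        greedy_update(pos, fit, cand, score)
        best, best_fit = pos[fit.argmin()].copy(), float(fit.min())
        # Somersault foraging around the best position (somersault factor S = 2).
        cand = pos + 2.0 * (rng.random(pos.shape) * best - rng.random(pos.shape) * pos)
        greedy_update(pos, fit, cand, score)
        best, best_fit = pos[fit.argmin()].copy(), float(fit.min())
        history.append(best_fit)
    return best > 0.5, best_fit, history

# Synthetic stand-in for concatenated CNN features: 2 informative columns, 8 noise columns.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 40)
X = rng.normal(size=(120, 10))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y
mask, best_fit, history = mrfo_select(X, y)
print(mask.astype(int), round(best_fit, 3))
```

Because each agent only accepts improving moves, the best fitness in `history` never increases. In a real pipeline the objective would be evaluated inside the training folds only, so that feature selection does not see held-out subjects.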

Loss & Training

Composite loss = weighted sum of cross-entropy loss (classification) and MSE loss (regression), optimized jointly as a multi-task objective; the task loss weights are tuned as hyperparameters.
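
The cross-view fusion, the two output heads, and the composite loss can be sketched together in a few lines of NumPy. This is a shape-level illustration under stated assumptions, not the authors' implementation: a single attention head, mean pooling over the five view tokens, random weights, the feature width `d = 16`, and the loss weights `w_cls` and `w_reg` are all hypothetical choices.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over view tokens: (V, d) -> (V, d)."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (V, V): how much each view attends to the others
    return attn @ v, attn

def composite_loss(logits, fpg_pred, label, fpg_true, w_cls=1.0, w_reg=1.0):
    """Weighted cross-entropy plus squared error for one subject."""
    ce = -np.log(softmax(logits)[label])
    mse = (fpg_pred - fpg_true) ** 2
    return w_cls * ce + w_reg * mse

rng = np.random.default_rng(0)
d = 16
view_tokens = rng.normal(size=(5, d))            # one refined feature vector per gaze direction
Wq, Wk, Wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))
fused, attn = self_attention(view_tokens, Wq, Wk, Wv)

pooled = fused.mean(axis=0)                      # assumed pooling over the five views
logits = pooled @ rng.normal(size=(d, 3))        # classification head: 3 metabolic states
fpg_pred = float(pooled @ rng.normal(size=d)) + 100.0   # regression head: FPG in mg/dL
loss = composite_loss(logits, fpg_pred, label=2, fpg_true=105.0, w_cls=1.0, w_reg=0.01)
print(attn.shape, fused.shape, float(loss))
```

A squared error measured in mg/dL is typically much larger than a cross-entropy value, so either the regression weight or the target scaling has to compensate; the small `w_reg` here only illustrates that point.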

Training Strategy

Subject-level five-fold cross-validation (GroupKFold) is used, so that all images from a given subject fall in the same fold, which prevents data leakage between training and evaluation.
95% CIs are estimated via subject-level bootstrap resampling (1000 iterations).
The Adam optimizer is employed, with the learning rate, batch size, number of epochs, and task loss weights tuned on the validation set.
Evaluation is performed at the subject level (rather than the image level), so that the five images of one subject are not counted as independent test cases.
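
The subject-level split and the bootstrap interval described above can be sketched as follows. This is a hand-rolled NumPy illustration, not the authors' code: the helper names, the random fold assignment, the percentile method, and the simulated per-subject outcomes are assumptions. scikit-learn's `GroupKFold`, which the notes name, gives the same no-overlap guarantee with its own assignment rule.

```python
import numpy as np

def subject_folds(subject_ids, n_splits=5, seed=0):
    """Assign each unique subject to one fold; all of its images inherit that fold."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(subject_ids))
    fold_of = {s: i % n_splits for i, s in enumerate(shuffled)}
    return np.array([fold_of[s] for s in subject_ids])

def bootstrap_ci(correct_per_subject, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for accuracy, resampling subjects with replacement."""
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct_per_subject, dtype=float)
    n = len(correct)
    stats = np.array([correct[rng.integers(0, n, n)].mean() for _ in range(n_boot)])
    lo, hi = np.percentile(stats, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return float(lo), float(hi)

# 445 subjects x 5 gaze directions = 2,225 images, as in the dataset description.
subject_ids = np.repeat(np.arange(445), 5)
folds = subject_folds(subject_ids)

for k in range(5):
    test_subjects = set(subject_ids[folds == k])
    train_subjects = set(subject_ids[folds != k])
    assert test_subjects.isdisjoint(train_subjects)   # no subject appears on both sides

# Hypothetical per-subject correctness flags, only to exercise the bootstrap.
simulated_correct = np.random.default_rng(1).random(445) < 0.9
print(bootstrap_ci(simulated_correct))
```

Resampling whole subjects rather than individual images keeps the five correlated views of one person together, which is what makes the interval a subject-level one.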

Key Experimental Results

Metric	Value
Three-class overall accuracy	93.8% (five-fold mean 93.7% \(\pm\) 0.7%)
Normal group recall	94.0% (141/150)
Controlled diabetes recall	92.1% (129/140)
Hyperglycemic diabetes recall	93.5% (145/155)
AUC (Normal / Controlled / Hyperglycemic)	0.971 / 0.956 / 0.982
FPG estimation MAE	6.42 mg/dL
FPG estimation RMSE	7.91 mg/dL
Pearson r / \(R^2\)	0.983 / 0.966
Bland-Altman mean bias	+1.45 mg/dL
95% limits of agreement	-8.33 to +11.23 mg/dL
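
For readers less familiar with the agreement metrics in the table, the sketch below shows how MAE, RMSE, Pearson r, and the Bland-Altman bias and 95% limits of agreement are conventionally computed. The function name, the sign convention (prediction minus reference), and the four toy value pairs are assumptions for illustration; only the last two lines use numbers from the table.

```python
import numpy as np

def regression_agreement(y_true, y_pred):
    """MAE, RMSE, Pearson r, and Bland-Altman bias with 95% limits of agreement."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_pred - y_true                      # assumed convention: prediction minus reference
    bias = diff.mean()
    sd = diff.std(ddof=1)                       # sample standard deviation of the differences
    return {
        "mae": float(np.abs(diff).mean()),
        "rmse": float(np.sqrt((diff ** 2).mean())),
        "pearson_r": float(np.corrcoef(y_true, y_pred)[0, 1]),
        "bias": float(bias),
        "loa": (float(bias - 1.96 * sd), float(bias + 1.96 * sd)),
    }

# Toy FPG values in mg/dL (hypothetical, not study data).
metrics = regression_agreement([100, 110, 120, 130], [102, 108, 123, 131])
print(metrics)

# Two internal consistency checks on the reported table values.
print(round((-8.33 + 11.23) / 2, 2))   # midpoint of the limits of agreement
print(round(0.983 ** 2, 3))            # squared Pearson r
```

The midpoint of the reported limits of agreement equals the reported mean bias (+1.45 mg/dL), and the squared Pearson r rounds to the reported \(R^2\) of 0.966.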

Ablation Study (incremental classification accuracy):
Single-view CNN baseline \(<\) Multi-view CNN (without MRFO/Transformer) \(<\) Multi-view + MRFO \(<\) Full ScleraGluNet, so each added module improves classification accuracy.

Key Findings:
Misclassifications primarily occur between adjacent metabolic categories (controlled vs. hyperglycemic), aligning with the clinical characteristics of the blood glucose continuum.
Grad-CAM/Grad-CAM++ visualizations reveal that the model focus is concentrated on the scleral vessel regions, with the hyperglycemic group displaying consistent strong activation across different gaze directions.
Five-fold accuracy is highly stable: individual fold accuracies range from 92.8% to 94.6% with a standard deviation of only 0.7%, indicating that the results do not rely on favorable data partitioning.
Representative case analysis: vessels in the normal group are thin and uniform; the controlled group exhibits mild tortuosity; and the hyperglycemic group shows prominent vasodilation, spiral structures, and uneven caliber changes.

Highlights & Insights

Innovative Multidirectional Acquisition Design: Systematically utilizes scleral images from five gaze directions for the first time, capturing spatially heterogeneous microvascular information.
Complete Closed Loop: An end-to-end design spanning from the image acquisition protocol to preprocessing, feature extraction, fusion, and dual-task output.
Clinical Feasibility: Requires only an anterior segment camera (no dilation or fundus imaging needed), making it highly suitable for telemedicine and large-scale screening.
Rigorous Validation: Incorporates subject-level splitting, bootstrap CIs, and Bland-Altman analysis, avoiding common data leakage issues.
Dual-Task Joint Learning: Classification and regression tasks share feature representations, which may allow each task to benefit from the other.

Limitations & Future Work

Single-center study (Changsha Aier Eye Hospital), lacking multi-center external validation, leaving its generalizability to be verified.
Confounding factors that may affect scleral vessels, such as hypertension, smoking, and anemia, were not controlled.
Focuses solely on fasting plasma glucose without incorporating postprandial blood glucose or longitudinal monitoring data.
Grad-CAM provides only coarse localization and cannot serve as an exact indicator of vascular pathology.
The dataset scale is limited (445 subjects), posing a risk of overfitting for the deep learning models.

Related Work

Retinal Imaging: Existing deep learning systems predict cardiometabolic states and HbA1c from retinal images, but require expensive fundus cameras.
Conjunctival Microcirculation: Previous studies recorded diabetes-related conjunctival vascular alterations using OCTA and red-free imaging, but did not construct an end-to-end DL system.
PPG/Thermal Imaging: Consumer-grade devices based on photoplethysmography (PPG) or thermal imaging estimate blood glucose, but they are sensitive to motion and lighting and are only weakly coupled to glucose physiology.
MRFO-INEYENET: The authors' previous work utilized only single-angle ocular images with MRFO optimization; ScleraGluNet introduces multidirectional acquisition and Transformer fusion on top of it.
Association Between Scleral/Conjunctival Vessels and Metabolism: Multiple OCTA studies have confirmed microvascular changes in the sclera of patients with diabetes, providing a physiological basis for the study.
Multi-View Learning: Acquiring multi-angle data enhances model robustness and generalizability, which has been validated across various computer vision domains.

Rating

Novelty: ⭐⭐⭐⭐ (Non-invasive blood glucose estimation combining multidirectional scleral imaging and cross-view fusion is a novel approach.)
Experimental Thoroughness: ⭐⭐⭐⭐ (Includes five-fold cross-validation, ablation study, Bland-Altman, and Grad-CAM, but lacks external validation.)
Writing Quality: ⭐⭐⭐ (Structurally clear but suffers from descriptive redundancies and incoherent paragraphs in the introduction section.)
Value: ⭐⭐⭐⭐ (Holds promising clinical application prospects, but requires multi-center validation for actual deployment.)