
AD-GBC: Anisotropic Granular-Ball Skip-Connection Refiner for UNet-Based Medical Image Segmentation

Conference: CVPR 2026
Paper: CVF OpenAccess
Code: https://github.com/SiaShen-dot/AD-GBC (Available)
Area: Medical Imaging
Keywords: Medical Image Segmentation, Granular-Ball Computing, Anisotropic Prototypes, Differentiable Clustering, UNet skip-connection

TL;DR

The study upgrades the "point prototypes / isotropic balls" used as semantic anchors in UNet skip-connection refiners to differentiable granular balls with anisotropic (per-dimension) vector scales. A bidirectional "Pixel Set ↔ Ball" aggregation-broadcasting mechanism acts as a semantic refiner on skip-connections, and two geometric regularizations keep the anchors from collapsing and keep their scales consistent with the data. Plugged into Rolling-UNet and U-KAN, it improves average IoU by 1.3 to 1.7 points across four medical segmentation benchmarks.

Background & Motivation

Background: UNet has long been the de facto standard for medical image segmentation due to its encoder-decoder structure and skip-connections. Recent improvements focus on the skip-fusion junction—specifically prototype/slot refinement methods like ProtoSeg, Slot Attention, and Gaussian Attention. These methods reinterpret the concatenation of "deep semantic features + shallow high-resolution features" as interactions between pixel features and a set of learnable prototypes (slots), organizing raw features into consistent semantic patterns to improve regional consistency and boundary delineation.

Limitations of Prior Work: Although these refiners are powerful, they model each semantic concept as a single point in feature space and rely on unidirectional or iterative attention with no explicit control over regional geometry. The authors highlight three mismatches in Fig.1: (1) Multi-modal distributions: a single semantic class (e.g., "lesion") often comprises several visual patterns (dark nuclei, inflammatory edges), which point prototypes fail to capture, especially with one prototype per class (\(K=M\), where \(K\) is the number of prototypes and \(M\) the number of classes). (2) Anisotropic geometry: classic Granular-Ball Computing (GBC) handles multi-modality by using \(K \gg M\) balls, but it assumes each pattern is isotropic (a scalar radius, i.e., a circle), whereas real feature clusters are elongated, closer to ellipses. (3) Semantic ambiguity and class imbalance: fitting isotropic balls to anisotropic data forces a bad trade-off. To cover a cluster's long axis, a ball must expand in every direction; the enlarged balls of sparse lesion classes then overlap heavily with dense background clusters, so subtle boundary ambiguities are modeled incorrectly.

Key Challenge: The fundamental mismatch between anisotropic real data and isotropic model geometry; additionally, classic GBC faces engineering hurdles such as non-differentiable clustering and manual radius definition, preventing integration into end-to-end gradient-based networks.

Goal: (a) Evolve prototypes from "points" to bounded, learnable regions capable of describing real shapes; (b) Make this regional representation fully differentiable for end-to-end training; (c) Prevent degradation and collapse of learnable anchors.

Key Insight: Prototypes are relaxed from "point anchoring" to "soft region anchoring," where each concept is represented by a hyper-ellipsoid defined by a center \(c_k\) and a vector scale \(\sigma_k \in \mathbb{R}^D\), utilizing a differentiable Set→Ball→Set path for interaction with pixel features.

Core Idea: Replace "point prototypes / isotropic balls" with "anisotropic differentiable granular balls" to perform geometry-aware semantic alignment and purification on UNet skip-connections.

Method

Overall Architecture

The input to AD-GBC is a feature map \(X \in \mathbb{R}^{B \times C_{in} \times H \times W}\) from a deep skip-connection, and the output is a semantically "purified" \(Y\) of the same size. Instead of replacing the bottleneck, it serves as a Semantic Skip-Connection Refiner inserted between deep levels of the encoder and decoder (e.g., Level 3). The workflow involves: projecting and flattening the feature map into a pixel set → assigning "soft membership" to pixels using learnable granular-ball anchors → Set→Ball aggregation for regional consensus → Ball→Set broadcasting back to pixels → residual fusion + lightweight refinement to produce \(Y\). During training, two geometric regularizations constrain anchor distribution and scales. A key engineering detail is weight sharing: the same AD-GBC module operates on both encoder and decoder paths simultaneously, forcing features from both sides to project onto the same semantic manifold before element-wise addition.

```mermaid
%%{init: {'flowchart': {'rankSpacing': 24, 'nodeSpacing': 28, 'padding': 6, 'wrappingWidth': 400, 'subGraphTitleMargin': {'top': 8, 'bottom': 16}}}%%
flowchart TD
    E["Enc/Dec Feature X<br/>Cin×H×W"] --> P["1×1 Conv Proj + Flatten<br/>→ Pixel Set {zᵢ}"]
    subgraph GB["Anisotropic Differentiable Granular-Ball (Set↔Ball)"]
        direction TB
        A["Scaled Dist + Softmax<br/>dᵢₖ=‖(zᵢ−cₖ)⊘σₖ‖² → αᵢₖ"]
        A --> AGG["Set→Ball Aggregation<br/>c'ₖ=Σ αᵢₖ zᵢ"]
        AGG --> BRD["Ball→Set Broadcast<br/>ẑᵢ=Σ αᵢₖ c'ₖ"]
    end
    P --> A
    BRD --> R["Residual Fusion + Refine<br/>Y=f(X+X̂)"]
    R --> F["Weight-Shared Semantic Skip Refiner<br/>Dual-path module → Fusion"]
    GB -.Training Constraints.-> REG["Geometric Regs<br/>Wasserstein Div + Scale Consistency"]
```

Key Designs

1. Anisotropic Differentiable Granular-Ball (AD-GBC Module): Upgrading Point Prototypes to End-to-End Hyper-ellipsoidal Regions

This is the core contribution, addressing the fact that point prototypes and isotropic balls cannot model the shapes of real feature clusters. Each granular-ball anchor is defined by a center \(c_k \in \mathbb{R}^D\) and an anisotropic vector scale \(\sigma_k \in \mathbb{R}^D\) (kept positive via Softplus). Unlike a scalar radius, this per-dimension vector lets each anchor form a learnable hyper-ellipsoid. The membership of pixel \(z_i\) is based on a scaled distance: the squared distance to the center, with each dimension measured in units of that anchor's scale along it:

\[d_{i,k}=\left\|(\mathbf{z}_i-\mathbf{c}_k)\oslash\boldsymbol{\sigma}_k\right\|_2^2\]

where \(\oslash\) denotes element-wise division. Because \(\sigma_k\) varies per dimension, the model can be tolerant along a cluster's long axis and strict along its short axis, which resolves the isotropic dilemma geometrically. The distances are converted into fuzzy membership weights by a softmax over the \(K\) balls with temperature \(\tau\):

\[\alpha_{i,k}=\frac{\exp(-d_{i,k}/\tau)}{\sum_{j=1}^{K}\exp(-d_{i,j}/\tau)}\]
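A minimal NumPy sketch of the scaled distance and softmax membership, assuming flattened pixel features `Z` of shape (N, D), centers `C` of shape (K, D), and unconstrained scale parameters passed through Softplus as described above. The function names (`memberships`, `softplus`, `inverse_softplus`) are illustrative, not taken from the released code. The toy example places one ball with scale 3.0 along dimension 0 and 0.5 along dimension 1, so the same unit offset costs far less along the long axis.

```python
import numpy as np

def softplus(x):
    # Numerically stable Softplus, log(1 + exp(x)); keeps the scales strictly positive.
    return np.logaddexp(0.0, x)

def inverse_softplus(s):
    # Maps a desired positive scale back to the unconstrained parameter.
    return np.log(np.expm1(s))

def memberships(Z, C, raw_sigma, tau=1.0):
    """Soft memberships of N pixels to K anisotropic granular balls.

    Z: (N, D) pixel features; C: (K, D) centers;
    raw_sigma: (K, D) unconstrained scales, mapped through Softplus.
    Returns alpha (N, K), whose rows sum to 1, and the scaled distances d (N, K).
    """
    sigma = softplus(raw_sigma)                               # (K, D), positive
    diff = Z[:, None, :] - C[None, :, :]                      # (N, K, D)
    d = np.sum((diff / sigma[None, :, :]) ** 2, axis=-1)      # ||(z_i - c_k) / sigma_k||^2
    logits = -d / tau
    logits -= logits.max(axis=1, keepdims=True)               # shift for a stable softmax
    alpha = np.exp(logits)
    alpha /= alpha.sum(axis=1, keepdims=True)                 # softmax over the K balls
    return alpha, d

# One ball elongated along dimension 0 (scale 3.0) and tight along dimension 1 (0.5).
C = np.zeros((1, 2))
raw = inverse_softplus(np.array([[3.0, 0.5]]))
Z = np.array([[1.0, 0.0],    # unit offset along the long axis
              [0.0, 1.0]])   # unit offset along the short axis
_, d = memberships(Z, C, raw)
print(d[:, 0])  # approximately [0.111, 4.0]
```

The unit step along the long axis gives a scaled distance of 1/9, while the same step along the short axis gives 4, which is the "tolerant along the long axis, strict along the short axis" behavior in numbers.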

This is followed by bidirectional interaction: Set→Ball aggregation consolidates pixel information into regional descriptors \(c'_k=\sum_i \alpha_{i,k}\mathbf{z}_i\) (regional consensus for denoising); Ball→Set broadcasting redistributes this regional context back to pixels \(\hat{\mathbf{z}}_i=\sum_k \alpha_{i,k}\mathbf{c}'_k\). After reshaping to \(\hat{X}\), residual fusion and lightweight refinement (3×3 Conv-BN-ReLU) are applied:

\[\mathbf{Y}=f_{\text{refine}}(\mathbf{X}+\hat{\mathbf{X}})\]
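The Set→Ball→Set path and residual fusion can be sketched the same way. This sketch assumes the projected width D equals the input channel count (so X and X̂ can be added without a back-projection), replaces the 3×3 Conv-BN-ReLU refinement with a bare ReLU, and uses random Dirichlet memberships as a stand-in for the α computed from the granular balls. The `normalize` flag and the helper names are hypothetical, not something the summary describes.

```python
import numpy as np

def set_ball_set(Z, alpha, normalize=False, eps=1e-6):
    """Set->Ball aggregation, then Ball->Set broadcast.

    Z: (N, D) pixel set; alpha: (N, K) memberships whose rows sum to 1.
    normalize=False follows c'_k = sum_i alpha_ik z_i as written above.
    normalize=True divides each c'_k by its membership mass sum_i alpha_ik,
    a hypothetical variant that keeps c'_k on the scale of the features.
    """
    C_prime = alpha.T @ Z                                     # (K, D) regional descriptors
    if normalize:
        C_prime = C_prime / (alpha.sum(axis=0)[:, None] + eps)
    Z_hat = alpha @ C_prime                                   # (N, D) context per pixel
    return C_prime, Z_hat

def fuse(X, Z_hat):
    """Reshape Z_hat back to X's (D, H, W) layout, add the residual, then 'refine'.

    Assumes D equals the input channels. The 3x3 Conv-BN-ReLU refinement is
    replaced by a bare ReLU, a stand-in so the sketch stays dependency-free.
    """
    D, H, W = X.shape
    X_hat = Z_hat.T.reshape(D, H, W)                          # undo the flattening
    return np.maximum(X + X_hat, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4, 4))                    # D = 8 channels, H = W = 4
Z = X.reshape(8, -1).T                            # (N = 16, D = 8) pixel set
alpha = rng.dirichlet(np.ones(3), size=16)        # stand-in memberships for K = 3 balls
C_prime, Z_hat = set_ball_set(Z, alpha)
Y = fuse(X, Z_hat)
print(C_prime.shape, Z_hat.shape, Y.shape)        # (3, 8) (16, 8) (8, 4, 4)
```

Both directions are single matrix products with the same α, which is why the cost stays linear in the pixel count N.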

The residual path keeps training stable and preserves local details. Relation to prior work: if \(\sigma_k\) collapses to a scalar and the consensus step is removed, AD-GBC reduces to standard Gaussian kernel weighting, so it is a strict generalization of Gaussian Attention that adds anisotropy and regional consensus. Its complexity, \(O(N \cdot (C_{in}D + KD + C_{in}^2))\), is linear in the pixel count \(N\), which makes it scalable to high-resolution images.
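The degenerate case can be checked numerically. When every ball uses the same scalar scale s (\(\sigma_k = s\mathbf{1}\)), the membership softmax equals normalized isotropic Gaussian kernel weights with bandwidth \(h = s\sqrt{\tau/2}\), since \(\|(\mathbf{z}-\mathbf{c})/s\|^2/\tau = \|\mathbf{z}-\mathbf{c}\|^2/(2h^2)\). With a different scalar per ball, the same holds with a per-kernel bandwidth. The sketch below (illustrative function names, NumPy assumed) compares only the weighting; the removed consensus step is not modeled.

```python
import numpy as np

def aniso_weights(Z, C, sigma, tau):
    # alpha_ik = softmax over k of -||(z_i - c_k) / sigma_k||^2 / tau; sigma is (K, D), positive
    d = np.sum(((Z[:, None, :] - C[None, :, :]) / sigma[None, :, :]) ** 2, axis=-1)
    w = np.exp(-(d - d.min(axis=1, keepdims=True)) / tau)   # shift by the row minimum for stability
    return w / w.sum(axis=1, keepdims=True)

def gaussian_kernel_weights(Z, C, h):
    # Normalized isotropic Gaussian kernel exp(-||z_i - c_k||^2 / (2 h^2)) over the K centers
    sq = np.sum((Z[:, None, :] - C[None, :, :]) ** 2, axis=-1)
    w = np.exp(-(sq - sq.min(axis=1, keepdims=True)) / (2.0 * h ** 2))
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
Z, C = rng.normal(size=(6, 4)), rng.normal(size=(3, 4))
s, tau = 1.5, 0.8
iso = aniso_weights(Z, C, np.full((3, 4), s), tau)        # sigma_k = s * 1 for every ball
gauss = gaussian_kernel_weights(Z, C, h=s * np.sqrt(tau / 2.0))
print(np.allclose(iso, gauss))  # True
```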

2. Weight-Shared Semantic Skip-Connection Refiner: Aligning Enc/Dec Features to the Same Semantic Manifold

Standard skip-connections simply concatenate or add deep semantic features and shallow high-resolution features, yet the deep features often carry noise or irrelevant context. AD-GBC instead uses a dual-path, weight-shared design: encoder features pass through a shared AD-GBC module and are purified via the \(K\) granular balls, while the upsampled decoder features pass through the same module. The two purified outputs are then added element-wise. Because both paths are projected onto the same learnable semantic anchors, the design counters semantic misalignment between encoder and decoder features. The module is placed selectively at the deeper, semantically richer skip levels.
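A compact sketch of the weight-shared dual path, assuming the decoder features have already been upsampled to the encoder's shape and that a single parameter dictionary (hypothetical keys `centers` and `raw_sigma`) holds the shared anchors. Projection and refinement layers are omitted, and D is assumed to equal the channel count.

```python
import numpy as np

def ad_gbc(X, params, tau=1.0):
    """Compact AD-GBC forward on one (D, H, W) feature map (sketch; no conv layers)."""
    D, H, W = X.shape
    Z = X.reshape(D, -1).T                                    # (N, D) pixel set
    C = params["centers"]                                     # (K, D) shared centers
    sigma = np.logaddexp(0.0, params["raw_sigma"])            # Softplus -> positive scales
    d = np.sum(((Z[:, None, :] - C[None]) / sigma[None]) ** 2, axis=-1)
    a = np.exp(-(d - d.min(axis=1, keepdims=True)) / tau)
    a /= a.sum(axis=1, keepdims=True)                         # (N, K) memberships
    Z_hat = a @ (a.T @ Z)                                     # aggregate, then broadcast
    return X + Z_hat.T.reshape(D, H, W)                       # residual (refine omitted)

def shared_skip_fusion(enc, dec_up, params):
    # The SAME params purify both paths; the purified maps are then added element-wise.
    return ad_gbc(enc, params) + ad_gbc(dec_up, params)

rng = np.random.default_rng(0)
K, D = 4, 8
params = {"centers": rng.normal(size=(K, D)), "raw_sigma": np.zeros((K, D))}
enc, dec_up = rng.normal(size=(D, 6, 6)), rng.normal(size=(D, 6, 6))
fused = shared_skip_fusion(enc, dec_up, params)
print(fused.shape)  # (8, 6, 6)
```

Because both calls read the same parameters, swapping the encoder and decoder inputs leaves the fused output unchanged; with two separately parameterized modules this symmetry would generally not hold.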

3. Geometric Regularization: Preventing Anchor Collapse and Ensuring Scale Consistency

Without guidance, learnable \(\{c_k\}\) and \(\{\sigma_k\}\) may degrade (centers collapsing into low-dimensional subspaces or unstable scale estimates). Two complementary regularizations are used:

Wasserstein Divergence Loss \(\mathcal{L}_{\text{W-div}}\): This loss prevents "semantic collapse," in which multiple centers cluster into a low-dimensional subspace. A common alternative penalizes squared pairwise cosine similarities (\(\mathcal{L}_{\text{cos-div}}\)), but Theorem 1 shows it cannot detect semantic collapse: if the normalized Gram matrix \(G(\mathbf{C})\) has rank \(r < K\), there exists a low-dimensional embedding \(\tilde{\mathbf{C}} \in \mathbb{R}^{K \times r}\) with \(G(\tilde{\mathbf{C}}) = G(\mathbf{C})\). Since the cosine loss depends on the centers only through \(G\), it assigns both configurations the same value and cannot tell whether the centers occupy the full space or only an \(r\)-dimensional subspace. Instead, a Wasserstein-inspired loss regularizes the distribution of centers as a whole:

\[\mathcal{L}_{\text{W-div}}(\mathbf{C})=\underbrace{\|\hat{\mu}_C\|_2^2}_{\text{Centering}}+\underbrace{\operatorname{tr}(\hat{\Sigma}_C)-\frac{2}{\sqrt{D}}\operatorname{tr}(\hat{\Sigma}_C^{1/2})}_{\text{Spectral Uniformity}}\]

where \(\hat{\mu}_C, \hat{\Sigma}_C\) are the empirical mean and covariance of the centers. This encourages "high-rank, spread-out" distributions.
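A NumPy sketch of \(\mathcal{L}_{\text{W-div}}\) follows. One way to read the formula (our observation, not a claim from the paper): with eigenvalues \(\lambda_j\) of \(\hat{\Sigma}_C\), \(\mathcal{L}_{\text{W-div}} + 1 = \|\hat{\mu}_C\|_2^2 + \sum_{j=1}^{D}(\sqrt{\lambda_j} - 1/\sqrt{D})^2\), which is the squared 2-Wasserstein distance between \(\mathcal{N}(\hat{\mu}_C, \hat{\Sigma}_C)\) and \(\mathcal{N}(\mathbf{0}, \mathbf{I}/D)\). The loss is therefore bounded below by -1 and reaches that bound only when the centers have zero mean and every eigenvalue equals \(1/D\), i.e., a full-rank covariance. The biased (1/K) covariance estimate is an assumption; the demo compares spread-out centers with the same centers projected onto a 2-D subspace.

```python
import numpy as np

def w_div_loss(C):
    """L_W-div for K centers stored as rows of C, shape (K, D).

    Uses the biased (1/K) empirical covariance, which is an assumption here.
    tr(Sigma^{1/2}) is the sum of square roots of the covariance eigenvalues.
    """
    D = C.shape[1]
    mu = C.mean(axis=0)                                       # empirical mean
    cov = np.cov(C, rowvar=False, bias=True)                  # (D, D) empirical covariance
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)         # clip round-off negatives
    centering = mu @ mu
    spectral = eig.sum() - 2.0 / np.sqrt(D) * np.sqrt(eig).sum()
    return float(centering + spectral)

rng = np.random.default_rng(0)
K, D = 32, 8
spread = rng.normal(scale=1.0 / np.sqrt(D), size=(K, D))     # roughly N(0, I/D) centers
basis = np.linalg.qr(rng.normal(size=(D, 2)))[0]             # orthonormal basis of a 2-D subspace
collapsed = spread @ basis @ basis.T                         # same centers projected onto it
print(w_div_loss(spread), w_div_loss(collapsed))             # the collapsed set scores higher
```

A collapsed set has at most two nonzero eigenvalues, so its spectral term cannot go below -2/D, while the spread set collects a negative contribution from all D directions.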

Scale Consistency Loss \(\mathcal{L}_{\text{scale-con}}\): Anisotropic scales \(\sigma_k\) receive weak supervision from task losses. The authors calculate "observed anisotropic variance" \(\mathbf{s}_k^2=\frac{\sum_i \alpha_{i,k}(\mathbf{z}_i-\mathbf{c}_k)^{\circ 2}}{\sum_i \alpha_{i,k}+\epsilon}\) (where \((\cdot)^{\circ 2}\) is element-wise square) and force learnable \(\sigma_k^2\) to approximate it:

\[\mathcal{L}_{\text{scale-con}}=\frac{1}{K}\sum_{k=1}^{K}\left\|\text{detach}(\mathbf{s}_k^2)-\boldsymbol{\sigma}_k^2\right\|_2^2\]

The stop-gradient (detach) on \(\mathbf{s}_k^2\) is critical: it breaks the circular dependency "\(\sigma_k \to \alpha_{i,k} \to s_k^2\)," so the constraint acts in one direction only, pulling the learnable scale toward the observed one.
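Without an autograd framework, the role of detach can be shown by treating \(\mathbf{s}_k^2\) as a constant when differentiating: the gradient of \(\mathcal{L}_{\text{scale-con}}\) with respect to \(\sigma_k\) is then \(-\tfrac{4}{K}\,\boldsymbol{\sigma}_k \circ (\mathbf{s}_k^2 - \boldsymbol{\sigma}_k^2)\), and plain gradient descent pulls \(\sigma_k^2\) toward the observed variance. This NumPy sketch holds α and \(\mathbf{s}_k^2\) fixed for the whole run (in the real model they would be recomputed and detached at every step); the random memberships, step size, and function names are illustrative assumptions.

```python
import numpy as np

def observed_variance(Z, C, alpha, eps=1e-6):
    # s_k^2 = sum_i alpha_ik (z_i - c_k)^2 / (sum_i alpha_ik + eps), element-wise over D
    diff2 = (Z[:, None, :] - C[None, :, :]) ** 2              # (N, K, D)
    num = np.einsum("nk,nkd->kd", alpha, diff2)               # membership-weighted sum
    return num / (alpha.sum(axis=0)[:, None] + eps)           # (K, D)

def scale_con_loss_and_grad(sigma, s2):
    """L_scale-con and its gradient w.r.t. sigma, treating s2 as a constant.

    Treating s2 as a constant is what detach() achieves under autograd:
    no gradient flows from this loss back into s_k^2 (and hence into alpha).
    """
    K = sigma.shape[0]
    r = s2 - sigma ** 2                                       # (K, D) residual
    loss = np.sum(r ** 2) / K
    grad = -4.0 * sigma * r / K                               # d/dsigma of sum(r^2) / K
    return loss, grad

rng = np.random.default_rng(0)
Z, C = rng.normal(size=(50, 3)), rng.normal(size=(2, 3))
alpha = rng.dirichlet(np.ones(2), size=50)     # stand-in memberships, rows sum to 1
s2 = observed_variance(Z, C, alpha)            # computed once and held fixed ("detached")
sigma = np.ones((2, 3))                        # learnable scales, initialized at 1
losses = []
for _ in range(1000):
    loss, grad = scale_con_loss_and_grad(sigma, s2)
    losses.append(loss)
    sigma = sigma - 0.01 * grad                # plain gradient descent step
print(losses[0], losses[-1])                   # the loss shrinks as sigma^2 approaches s2
```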

Loss & Training

The composite end-to-end objective is:

\[\mathcal{L}_{\text{total}}=\mathcal{L}_{\text{task}}+\lambda\,\mathcal{L}_{\text{W-div}}+\beta\,\mathcal{L}_{\text{scale-con}}\]

where \(\mathcal{L}_{\text{task}}\) combines BCE and Dice losses. The regularization weights take values \(\lambda \in \{0.01, 0.1\}\) and \(\beta \in \{0.1, 0.05\}\), and the granular-ball count is \(K=32\). Training uses Adam with a base learning rate of \(1\times10^{-4}\) and a separate learning rate of \(1\times10^{-2}\) for the GBC parameters, for 400 epochs on an NVIDIA A100 GPU.
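A sketch of the composite objective, assuming a 1:1 sum of BCE and soft Dice loss (the summary does not give the task-loss weighting or the Dice smoothing constant) and using \(\lambda = 0.01\), \(\beta = 0.1\) from the reported ranges as defaults; that pairing is an assumption. The regularizer values would come from the W-div and scale-consistency terms defined above.

```python
import numpy as np

def bce_dice_loss(prob, target, eps=1e-7, smooth=1.0):
    """BCE + (1 - soft Dice) on predicted foreground probabilities.

    The 1:1 weighting and the smoothing constant are assumptions.
    """
    p = np.clip(prob, eps, 1.0 - eps)                          # avoid log(0)
    bce = -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    inter = np.sum(prob * target)
    dice = (2.0 * inter + smooth) / (np.sum(prob) + np.sum(target) + smooth)
    return float(bce + (1.0 - dice))

def total_loss(l_task, l_wdiv, l_scale, lam=0.01, beta=0.1):
    # L_total = L_task + lambda * L_W-div + beta * L_scale-con
    return l_task + lam * l_wdiv + beta * l_scale

target = np.array([[0.0, 1.0], [1.0, 1.0]])
print(bce_dice_loss(target, target) < 1e-5)     # True: a perfect prediction costs ~0
print(total_loss(0.5, 2.0, 3.0))                # 0.5 + 0.01 * 2.0 + 0.1 * 3.0
```

In PyTorch, the two learning rates would typically be expressed as two parameter groups passed to `torch.optim.Adam`, one for the backbone at 1e-4 and one for the granular-ball centers and scales at 1e-2; exactly which tensors count as GBC parameters is an assumption here.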

Key Experimental Results

Main Results

Evaluation spans four benchmarks: BUSI (Breast Ultrasound), GlaS (Histopathology), CVC-ClinicDB (Colonoscopy), and ISIC17 (Dermoscopy). AD-GBC is integrated as a plug-and-play module into Rolling-UNet and U-KAN.

Selected IoU results (%) across datasets:

| Backbone (baseline → + AD-GBC) | BUSI IoU | GlaS IoU | CVC IoU | ISIC17 IoU |
|---|---|---|---|---|
| Rolling-UNet-M | 66.41 → 67.36 | 85.86 → 88.16 | 84.93 → 87.07 | 83.51 → 84.88 |
| Rolling-UNet-L | 67.85 → 69.30 | 87.99 → 89.37 | 85.77 → 87.45 | 84.10 → 84.95 |
| U-KAN | 65.76 → 69.86 | 87.65 → 88.64 | 84.94 → 85.03 | 84.52 → 84.59 |

Efficiency and average IoU (selected):

| Method | Params (M) ↓ | GFLOPs ↓ | Avg IoU (%) ↑ | Gain |
|---|---|---|---|---|
| Rolling-UNet-M | 7.10 | 66.25 | 80.18 | |
| + AD-GBC | 7.25 | 77.27 | 81.87 | +1.69 |
| U-KAN | 6.35 | 14.02 | 80.72 | |
| + AD-GBC | 6.51 | 16.77 | 82.03 | +1.31 |

The overhead is modest: roughly 2 to 2.5% more parameters and about 17 to 20% more GFLOPs, in exchange for average IoU gains of 1.3 to 1.7 points.
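The averages and overhead figures can be recomputed from the two tables above as a consistency check. This is plain arithmetic on the reported numbers (the helper name `summarize` is illustrative) and introduces no new results.

```python
def summarize(base_iou, new_iou, base_cost, new_cost):
    """Average IoU over the four datasets and relative cost overhead in percent.

    base_cost / new_cost are (params in M, GFLOPs) tuples taken from the tables above.
    """
    avg_base = sum(base_iou) / len(base_iou)
    avg_new = sum(new_iou) / len(new_iou)
    return {
        "avg_base": avg_base,
        "avg_new": avg_new,
        "gain": avg_new - avg_base,
        "params_pct": 100.0 * (new_cost[0] / base_cost[0] - 1.0),
        "gflops_pct": 100.0 * (new_cost[1] / base_cost[1] - 1.0),
    }

results = {
    # Per-dataset IoU order: BUSI, GlaS, CVC-ClinicDB, ISIC17.
    "Rolling-UNet-M": summarize([66.41, 85.86, 84.93, 83.51], [67.36, 88.16, 87.07, 84.88],
                                (7.10, 66.25), (7.25, 77.27)),
    "U-KAN": summarize([65.76, 87.65, 84.94, 84.52], [69.86, 88.64, 85.03, 84.59],
                       (6.35, 14.02), (6.51, 16.77)),
}
for name, r in results.items():
    print(f"{name}: avg IoU {r['avg_base']:.3f} -> {r['avg_new']:.3f} (+{r['gain']:.3f}), "
          f"params +{r['params_pct']:.1f}%, GFLOPs +{r['gflops_pct']:.1f}%")
```

The recomputed averages agree with the Avg IoU column to rounding, and the overhead works out to about 2.1% and 2.5% more parameters and 16.6% and 19.6% more GFLOPs for Rolling-UNet-M and U-KAN, respectively.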

Ablation Study

Components dissected on GlaS:

| Config | IoU ↑ | F1 ↑ | HD95 ↓ | Note |
|---|---|---|---|---|
| Baseline (Rolling-UNet-L) | 87.99 | 93.47 | 0.65 | No GBC |
| + GBC (Isotropic scalar σ) | 88.12 | 93.67 | 0.63 | Marginal gain |
| + GBC (Anisotropic vector σ) | 88.28 | 93.88 | 0.59 | Clear boundary improvement |
| + Full Regularization | 89.63 | 94.51 | 0.51 | Synergistic effect |

Key Findings

  • Anisotropy helps, but regularization unlocks most of the gain: isotropic GBC adds only 0.13 IoU over the baseline, and anisotropic vector scales add a further 0.16 IoU while improving HD95 from 0.63 to 0.59. Only when both regularizations stabilize the anchors does IoU jump to 89.63% (+1.64 over the baseline).
  • Plug-and-play and generalizable: Gains are stable across the S/M/L scales of Rolling-UNet and carry over to the KAN-based U-KAN architecture.
  • Interpretability: Visualizations show anchors specializing in distinct semantic regions, such as lesion core vs. healthy skin, or texture, boundaries, and background suppression.

Highlights & Insights

  • Clear "Point → Ball → Ellipsoid" geometric evolution: The motivation for why "circles fail and ellipses succeed" is clearly explained—isotropic balls must over-expand to cover long axes, causing false background overlap.
  • Theorem 1 is a standout: It uses rank deficiency to prove cosine diversity cannot detect semantic collapse, justifying the use of Wasserstein spectral uniformity.
  • Degeneration Relationship: AD-GBC simplifies into Gaussian Attention under certain constraints, clearly positioning it as an "anisotropic + regional consensus" generalization.

Limitations & Future Work

  • CNN-only validation: While theoretically compatible with Transformer-based models (TransUNet, Swin-UNet), AD-GBC has not yet been tested there.
  • Small binary datasets: Evaluated primarily on single-target medical datasets; scalability to large-scale multi-organ scenarios is unknown.
  • Hyperparameter sensitivity: Temperature \(\tau\) and \(K\) require tuning based on anchor density.

Related Work & Insights

  • vs Gaussian Attention: Gaussian Attention uses implicit, unbounded kernels; AD-GBC models bounded anisotropic regions with explicit pixel-region consensus updates.
  • vs Slot Attention / ProtoSeg: Previous methods use point anchors and iterative attention; AD-GBC uses "center + explicit geometry" with single-step differentiable Set↔Ball interactions.
  • vs Classic GBC: Classic GBC is non-differentiable and isotropic; AD-GBC modernizes it into a differentiable, anisotropic deep learning module.

Rating

  • Novelty: ⭐⭐⭐⭐
  • Experimental Thoroughness: ⭐⭐⭐⭐
  • Writing Quality: ⭐⭐⭐⭐⭐
  • Value: ⭐⭐⭐⭐