Metacognitive Sensitivity for Test-Time Dynamic Model Selection

Conference: NeurIPS 2025 (CogInterp Workshop)
arXiv: 2512.10451
Code: To be confirmed
Area: Multimodal VLM / Model Selection / AI Metacognition
Keywords: metacognition, meta-d', dynamic model selection, contextual bandit, signal detection theory

TL;DR

Inspired by the concept of metacognitive sensitivity (meta-d') from cognitive science, this paper proposes a test-time dynamic model selection framework. It quantifies a model's ability to "know what it doesn't know" via meta-d', combines this with instantaneous confidence scores to form a context vector, and employs a contextual bandit to select the optimal model online, outperforming the individual models across multiple datasets.

Background & Motivation

Background: Deep learning has become increasingly specialized—CNNs excel at visual perception, Transformers/LLMs dominate NLP, and VLMs bridge cross-modal tasks. The No Free Lunch theorem dictates that no single architecture is optimal for all problems, motivating the need for dynamic model selection.

Limitations of Prior Work: Probabilistic confidence scores produced by models are often severely miscalibrated—i.e., confidence does not align with actual accuracy—making confidence-based selection unreliable.

Cognitive Science Inspiration: Human metacognition research offers mature mathematical tools for assessing "an agent's ability to evaluate its own knowledge." Among these, meta-d' is a signal detection theory-based metric that measures metacognitive sensitivity while remaining decoupled from task performance and confidence bias.

Core Idea: This work elevates meta-d' from a diagnostic tool to a functional signal, embedding it within a bandit selection framework to enable adaptive model selection at test time.

Method

Problem Formulation

Given a pair of pretrained models \(M = \{M_A, M_B\}\) and an image sequence \(D = \{x_1, \ldots, x_N\}\), the goal is to learn a selection policy \(\pi\) that, for each input \(x_t\), chooses the model most likely to produce a correct prediction: \[\max_{\pi}\sum_{t=1}^{N} R_t = \max_{\pi}\sum_{t=1}^{N} \mathbb{I}(\hat{y}_{a_t,t} = y_t)\]

Framework Core: Dual-Signal Context + Bandit Selection

Context vector (4-dimensional): \[s_t = [c_{A,t},\; \mu_{A,t},\; c_{B,t},\; \mu_{B,t}]\]

  • Short-term signal \(c_{k,t}\): instantaneous confidence (softmax maximum) of model \(M_k\) on the current sample \(x_t\)
  • Mid-term trait \(\mu_{k,t}\): metacognitive sensitivity (meta-d') of model \(M_k\), a stable trait reflecting how well its recent confidence predicts its accuracy
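A minimal sketch of assembling this context vector, assuming each model exposes raw logits and a current meta-d' estimate (function and argument names are ours, not from the paper):

```python
import numpy as np

def context_vector(logits_a, logits_b, meta_d_a, meta_d_b):
    """Build the 4-dim context s_t = [c_A, mu_A, c_B, mu_B].

    c_k is the softmax maximum (instantaneous confidence) of model k
    on the current input; mu_k is its current meta-d' estimate.
    """
    def softmax_max(logits):
        z = np.exp(np.asarray(logits) - np.max(logits))  # stabilized softmax
        return float(np.max(z / z.sum()))

    return np.array([softmax_max(logits_a), meta_d_a,
                     softmax_max(logits_b), meta_d_b])

# Example: model A is confident, model B is near-uniform
s_t = context_vector([2.0, 0.5, -1.0], [0.3, 0.2, 0.1], 1.4, 0.9)
```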

Meta-d' Computation:

  • Based on the hierarchical Bayesian framework of Fleming & Daw, computed by fitting the confidence distributions over correct and incorrect trials
  • Advantage: independent of task performance (d') and overall confidence bias, measuring metacognitive sensitivity alone
  • The authors developed a GPU-parallelized package to accelerate the computation
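The full hierarchical Bayesian fit is beyond a short snippet, but the quantity it estimates can be approximated by the type-2 AUROC: how well confidence discriminates correct from incorrect trials. The sketch below computes that proxy; it tracks meta-d' but is not the paper's estimator:

```python
import numpy as np

def type2_auroc(confidence, correct):
    """Proxy for metacognitive sensitivity: P(conf on a correct trial
    > conf on an incorrect trial), with ties counted as 0.5.

    The paper fits meta-d' via Fleming & Daw's hierarchical Bayesian
    model; this simpler statistic only approximates that quantity.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos = confidence[correct]      # confidences on correct trials
    neg = confidence[~correct]     # confidences on incorrect trials
    if len(pos) == 0 or len(neg) == 0:
        return 0.5                 # undefined: fall back to chance level
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)
```

A value of 1.0 means confidence perfectly separates correct from incorrect trials; 0.5 means confidence carries no information about accuracy.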

Dynamic Update Mechanism:

  1. Burn-in phase: the first \(B=100\) trials collect (confidence, reward) data from all models to compute the initial \(\mu_{k,0}\)
  2. Sliding-window update: every \(F=50\) trials, meta-d' is recomputed using the most recent \(W=100\) trials
  3. This enables the framework to adapt to non-stationary changes in model performance (e.g., distribution shift)
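The update schedule above can be sketched as a small helper that, given the current trial index, says whether to refit meta-d' and on which window (the function name is ours):

```python
def metad_update_schedule(t, B=100, F=50, W=100):
    """Return the (start, end) trial window to refit meta-d' on at
    trial t, or None if no refit is due.

    Mirrors the paper's schedule: burn-in for the first B trials, an
    initial fit at t = B, then a refit every F trials on the most
    recent W trials.
    """
    if t < B:
        return None            # still collecting burn-in data
    if t == B:
        return (0, B)          # initial fit on the burn-in window
    if (t - B) % F == 0:
        return (t - W, t)      # sliding-window refit
    return None
```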

Bandit Algorithms:

  • LinUCB: \(\pi_t(s_t, k) = \hat{\theta}_k^\top s_t + \alpha\sqrt{s_t^\top A_k^{-1} s_t}\), selecting \(a_t = \arg\max_k \pi_t(s_t, k)\)
  • LinTS: samples \(\tilde{\theta}_k \sim \mathcal{N}(\hat{\theta}_k, \sigma^2 A_k^{-1})\), selecting \(a_t = \arg\max_k \tilde{\theta}_k^\top s_t\)
  • At each step, the reward \(R_t = \mathbb{I}(\hat{y}_{a_t,t} = y_t)\) is observed and used to update \(A_k\) and \(b_k\)
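A minimal disjoint-LinUCB sketch matching the score above; the class and hyperparameter names are ours, and the paper's exact initialization may differ:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB over K arms (models) with d-dim contexts.

    Score per arm: theta_k^T s + alpha * sqrt(s^T A_k^{-1} s).
    """
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # A_k = I + sum s s^T
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # b_k = sum R_t s

    def select(self, s):
        scores = []
        for A_k, b_k in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_k)
            theta = A_inv @ b_k                          # ridge estimate
            scores.append(theta @ s + self.alpha * np.sqrt(s @ A_inv @ s))
        return int(np.argmax(scores))

    def update(self, arm, s, reward):
        self.A[arm] += np.outer(s, s)
        self.b[arm] += reward * s
```

After observing the 0/1 correctness reward for the chosen model, `update` refines that arm's linear estimate, so the policy learns which regions of the (confidence, meta-d') context space favor each model.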

Key Experimental Results

CNN Model Pairs on CIFAR-10

| Model Pair | 300 trials | 700 trials | 1000 trials |
| --- | --- | --- | --- |
| AlexNet-ViT | 62.4 → 69.5 (+7.1%) | 64.8 → 66.2 (+1.4%) | 62.4 → 65.9 (+3.5%) |
| EfficientNet-ViT | 67.7 → 75.9 (+8.2%) | 66.4 → 68.0 (+1.6%) | 66.4 → 67.8 (+1.4%) |
| AlexNet-GoogleNet | 62.7 → 70.6 (+7.9%) | 57.7 → 57.5 (-0.2%) | 56.8 → 58.4 (+1.6%) |
| EfficientNet-GoogleNet | 54.8 → 59.0 (+4.8%) | 53.6 → 55.8 (+2.2%) | 54.8 → 57.3 (+2.5%) |

VLM Model Pairs on CIFAR-10 + PACS (Domain Shift Setting)

| Model Pair | 1500 trials | 2500 trials | 4000 trials |
| --- | --- | --- | --- |
| MetaCLIP-SigLIP | 98.7 → 99.0 (+0.3%) | 98.7 → 98.6 (0.0%) | 98.4 → 98.5 (+0.1%) |
| CLIP-ALIGN | 94.2 → 96.0 (+1.8%) | 94.8 → 96.2 (+1.6%) | 94.8 → 95.8 (+1.0%) |

Key Findings

  • Gains are most pronounced in early trials (+4.8% ~ +8.2%), stabilizing at +1.4% ~ +3.5% as the bandit converges
  • Heterogeneous architecture pairs (CNN + Transformer) yield greater benefits than homogeneous pairs, as inductive bias diversity reduces correlated errors
  • When AlexNet's meta-d' decreases, the bandit automatically shifts toward GoogleNet, demonstrating adaptive capability
  • Gains from VLM pairs are modest (+0.1% ~ +1.8%), as VLMs are already highly accurate individually

Highlights & Insights

  • ⭐⭐⭐⭐ Interdisciplinary Innovation: Operationalizing the cognitive science concept of metacognition (meta-d') as a functional component in ML systems is conceptually novel
  • ⭐⭐⭐ Dual Timescale Modeling: The separation of short-term confidence and mid-term metacognitive sensitivity is insightful
  • ⭐⭐⭐ Adaptability: Sliding window updates enable the framework to handle non-stationary scenarios
  • ⭐⭐⭐ Lightweight and Practical: Requires no additional training, relying solely on existing model outputs

Limitations & Future Work

  1. Validation is limited to image classification; extension to more complex tasks such as generation and retrieval has not been explored
  2. The framework is currently restricted to selection between two models; scaling to a larger model pool (>2) poses open challenges in computation and policy design
  3. Meta-d' computation requires a window of 100 trials, which may lack flexibility for small-batch real-time deployment scenarios
  4. Limited gains in the VLM setting suggest diminishing marginal returns when individual models are already sufficiently strong
  5. As a workshop paper, the experimental scale and depth of analysis leave room for further development

Rating

⭐⭐⭐ An interesting interdisciplinary workshop paper that introduces the cognitive science concept of metacognition into dynamic model selection with a novel formulation. Meta-d' as a quantitative measure of a model's "self-awareness" carries unique value, yet the limited experimental scale and task diversity mean further validation is needed before practical deployment.

| Method Category | Representative Method | Adaptive | Utilizes Metacognition | Computational Cost |
| --- | --- | --- | --- | --- |
| Static Ensemble | Majority vote / averaging | ✗ | ✗ | High (all models run) |
| Dynamic Ensemble Selection | Local accuracy | ✓ | ✗ | Medium |
| MoE | Gating network | ✓ | ✗ | High (end-to-end training) |
| Ours | meta-d' + Bandit | ✓ | ✓ | Low (no additional training) |