Adaptive Querying with AI Persona Priors¶
Conference: ICML 2026
arXiv: 2605.00696
Code: https://github.com/yw3453/adaptive-query-ai-persona-priors (available)
Area: Bayesian Experimental Design / Adaptive Querying / LLM Applications
Keywords: AI Persona, Bayesian Adaptive Querying, Digital Twin, Adaptive Testing, Cold Start
TL;DR¶
The authors cast the distribution of LLM responses under persona conditioning as a finite-mixture Bayesian prior. Closed-form posterior updates over personas then let the model predict a user's remaining responses after only a few questions, outperforming classic CAT/IRT baselines.
Background & Motivation¶
Background: Adaptive querying is a core tool in scenarios such as computerized adaptive testing (CAT), surveys, and recommendation cold start. Mainstream approaches either follow Item Response Theory (IRT/CAT), parameterizing item-user relationships with low-dimensional latent traits, or use neural Bayesian Experimental Design (BED), employing amortized inference or variational approximations in more flexible models.
Limitations of Prior Work: IRT/CAT relies on low-dimensional traits that limit expressiveness, and it requires large-scale historical calibration data for each item; new items require recalibration. Neural BED is flexible but must train surrogate/policy networks and still requires nested Monte Carlo integration at deployment, which is too slow for real-time use. Both approaches struggle in cold-start scenarios for users or items.
Key Challenge: The trade-off between expressiveness (capturing high-dimensional heterogeneous response patterns) and computational tractability (real-time closed-form posterior updates). Expressiveness requires complex models, while tractability demands low-dimensional parameterizations.
Goal: Construct a prior that simultaneously achieves (1) high expressiveness (capturing real user response diversity), (2) closed-form posterior updates, and (3) no need for extensive calibration data per item.
Key Insight: LLMs, when injected with persona profiles, can simulate the response distribution of specific groups. By precomputing the response distribution for each persona × item offline using a persona dictionary, "which persona a user belongs to" can be treated as a discrete latent variable \(\theta \in \{1,\dots,n\}\), reducing the generative model to a finite mixture distribution.
Core Idea: Use LLM-generated persona response distributions as components of a finite mixture prior, transforming Bayesian adaptive querying into closed-form posterior updates and one-step lookahead entropy minimization over discrete latent variables.
Method¶
Overall Architecture¶
The method consists of an offline phase and an online phase. Offline: using a persona dictionary (here, \(n=2058\) real US respondent profiles from Twin-2K-500), GPT-5-mini is prompted for each persona \(\xi_\theta\) and item \(x\) to obtain a \(K\)-class response distribution \(\mu_{\theta,x} \in \Delta^{K-1}\); all distributions are cached in a lookup table. Online: for a new user, initialize the persona prior \(p(\theta)\) (estimated via EM on training users). At each step, select an item based on the history \(h_t\), observe the response \(Y_{x_{t+1}}\), update the persona posterior in closed form, and use the mixture distribution to predict the response distribution for the target items \(I^\star\). Once the budget is exhausted, make final predictions and compute log loss / Brier / ordinal MSE.
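The online update described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the names `posterior_update` and `predictive`, the nested-list layout `mu[theta][item][k]`, and the toy numbers are all assumptions made for the example.

```python
def posterior_update(prior, mu, item, response):
    """Closed-form Bayes update over personas after one observed response.

    prior: list of persona probabilities given the history so far (length n)
    mu:    cached table, mu[theta][item] is a list of K response probabilities
    """
    unnorm = [p * mu[theta][item][response] for theta, p in enumerate(prior)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("response has zero probability under every persona")
    return [u / total for u in unnorm]


def predictive(posterior, mu, item):
    """Mixture predictive for an item: a finite sum over personas."""
    n_classes = len(mu[0][item])
    return [
        sum(posterior[t] * mu[t][item][k] for t in range(len(posterior)))
        for k in range(n_classes)
    ]


# Toy table (made-up numbers): 2 personas, 2 items, 2 response classes.
mu = [
    [[0.9, 0.1], [0.8, 0.2]],  # persona 0
    [[0.2, 0.8], [0.3, 0.7]],  # persona 1
]
prior = [0.5, 0.5]

# Observe response class 0 on item 0, then predict item 1.
post = posterior_update(prior, mu, item=0, response=0)
pred = predictive(post, mu, item=1)
print(post, pred)
```

With thousands of personas and many observed items, multiplying raw probabilities can underflow; accumulating log-probabilities and normalizing with a log-sum-exp is the usual remedy.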
Key Designs¶
- Persona-induced Latent Variable Model:
    - Function: Replaces the continuous low-dimensional ability trait of traditional IRT with discrete persona membership, with the LLM providing \(p(Y_x \mid \theta)\).
    - Mechanism: Under the conditional independence assumption \(p(\theta, Y)=p(\theta)\prod_i p(Y_i \mid \theta)\), with \(\theta\) discrete and each item likelihood categorical, the posterior \(p(\theta \mid Y_{I_t}) \propto p(\theta)\prod_{i \in I_t}\mu_{\theta,i,Y_i}\) is fully closed-form; the predictive distribution \(p(Y_x=k \mid Y_{I_t})=\sum_\theta \mu_{\theta,x,k}\,p(\theta\mid Y_{I_t})\) is likewise a finite sum.
    - Design Motivation: Avoids nested Monte Carlo and variational approximations entirely, combining a flexible prior with real-time inference in a single model; each persona retains interpretable semantic labels, which facilitates downstream user clustering.
- Greedy One-step Lookahead Adaptive Querying:
    - Function: At each step, selects from the feasible item set \(\mathcal{I}_{\text{feas}} \setminus I_t\) the item expected to most reduce posterior uncertainty over the targets.
    - Mechanism: Measures uncertainty as the sum of marginal entropies over target items, \(U(P_t)=\sum_{x' \in I^\star} H(Y_{x'} \mid h_t)\). For each candidate \(x\), it computes the expected uncertainty after observing the response, \(\Delta_U(x \mid h_t) = \sum_k p(Y_x=k\mid Y_{I_t})\sum_{x' \in I^\star} H(Y_{x'}\mid h_t, Y_x=k)\), and selects the candidate that minimizes it. Under the persona model, \(p(Y_x \mid Y_{I_t})\) and \(H(Y_{x'} \mid \ldots)\) are finite sums over personas, which keeps the greedy procedure efficient.
    - Design Motivation: Classic BED requires high-dimensional integration for predictive distributions, making one-step lookahead infeasible for large item pools; the persona model removes this bottleneck, making greedy algorithms practical at scale.
- Empirical Bayes Prior Learning + Scoring Rule Evaluation:
    - Function: Fits the persona prior \(p(\theta)\) via EM on real data, mitigating the misspecification caused by the mismatch between synthetic personas and real populations.
    - Mechanism: Maximizes the marginal likelihood of training users, \(\sum_j \log \sum_\theta p(\theta)\,p(Y^{(j)}\mid\theta)\); the E-step computes responsibilities \(\gamma_{j,\theta}\propto p(\theta)p(Y^{(j)}\mid\theta)\), and the M-step sets \(p(\theta)\) to the average responsibility. Predictions are evaluated with proper scoring rules, each paired with its matching uncertainty measure (log loss with Shannon entropy, Brier with Gini), so that the uncertainty measure driving item selection is mathematically aligned with the evaluation metric.
    - Design Motivation: The synthetic persona dictionary is inevitably misspecified for real populations; EM concentrates mass on the personas that best match training users, in effect softly selecting a useful persona subset and improving robustness.
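The greedy one-step lookahead from the second design can be sketched as follows. This is an illustrative sketch under assumed names (`expected_target_entropy`, `greedy_select`) and a toy table `mu[theta][item][k]`, using Shannon entropy as the uncertainty measure; it is not taken from the released code.

```python
from math import log


def entropy(p):
    """Shannon entropy (nats) of a categorical distribution."""
    return -sum(q * log(q) for q in p if q > 0.0)


def predictive(post, mu, item):
    """Mixture predictive for an item under the current persona posterior."""
    n_classes = len(mu[0][item])
    return [sum(post[t] * mu[t][item][k] for t in range(len(post)))
            for k in range(n_classes)]


def update(post, mu, item, k):
    """Persona posterior after hypothetically observing class k on item."""
    unnorm = [post[t] * mu[t][item][k] for t in range(len(post))]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def expected_target_entropy(post, mu, candidate, targets):
    """Expected sum of target entropies after asking `candidate`."""
    total = 0.0
    for k, pk in enumerate(predictive(post, mu, candidate)):
        if pk == 0.0:
            continue  # this response cannot occur, so it contributes nothing
        post_k = update(post, mu, candidate, k)
        total += pk * sum(entropy(predictive(post_k, mu, x)) for x in targets)
    return total


def greedy_select(post, mu, candidates, targets):
    """Pick the candidate with the lowest expected post-query uncertainty."""
    return min(candidates,
               key=lambda x: expected_target_entropy(post, mu, x, targets))


# Toy table (made-up numbers): item 0 separates the personas, item 1 does not,
# item 2 is the prediction target.
mu = [
    [[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]],  # persona 0
    [[0.1, 0.9], [0.5, 0.5], [0.2, 0.8]],  # persona 1
]
posterior = [0.5, 0.5]
chosen = greedy_select(posterior, mu, candidates=[0, 1], targets=[2])
print(chosen)
```

Each candidate costs on the order of \(K \cdot n \cdot |I^\star|\) operations, so a full greedy step is a plain loop over the query pool with no sampling or inner optimization.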
Loss & Training¶
No gradient-based training is involved. Fitting happens in two places: (1) offline LLM prompting to extract \(\mu_{\theta,x}\); (2) EM estimation of the prior on real users. All online querying relies on closed-form Bayesian updates and greedy search. The CAT baselines (GRM/GPCM and multidimensional variants) fit item parameters via EM and then use grid-based posterior inference.
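The EM step for the prior can be illustrated with a short sketch. Only the mixture weights are updated; the persona response table stays fixed. The function name `em_prior`, the per-user dictionary format, and the toy data are assumptions for illustration.

```python
def em_prior(users, mu, n_iter=50):
    """Fit mixture weights over personas by EM, keeping mu fixed.

    users: list of dicts mapping item index -> observed response class
    mu:    mu[theta][item] is a list of K response probabilities
    """
    n = len(mu)
    weights = [1.0 / n] * n  # start from a uniform prior
    for _ in range(n_iter):
        resp_sum = [0.0] * n
        for answers in users:
            # E-step: responsibility of each persona for this user.
            joint = []
            for theta in range(n):
                lik = weights[theta]
                for item, k in answers.items():
                    lik *= mu[theta][item][k]
                joint.append(lik)
            z = sum(joint)
            if z == 0.0:
                raise ValueError("user has zero likelihood under all personas")
            for theta in range(n):
                resp_sum[theta] += joint[theta] / z
        # M-step: new prior is the average responsibility across users.
        weights = [s / len(users) for s in resp_sum]
    return weights


# Toy data (made-up): one binary item, two personas with opposite tendencies.
mu = [
    [[0.9, 0.1]],  # persona 0 mostly answers class 0
    [[0.1, 0.9]],  # persona 1 mostly answers class 1
]
users = [{0: 0}, {0: 0}, {0: 0}, {0: 1}]
weights = em_prior(users, mu)
print(weights)
```

For users with many answered items, the per-persona likelihood is better accumulated in log space to avoid underflow; the product form is kept here for readability.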
Key Experimental Results¶
Main Results¶
WorldValuesBench (91 items, 88,459 users, 4-point Likert) + 100,000 synthetic users; 5 items as prediction targets, remaining 86 as query pool, budget \(T \in \{5, 10, 20, 40, 86\}\).
| Setting | Method | \(T=5\) Log loss | \(T=20\) Log loss | Notes |
|---|---|---|---|---|
| Synthetic users (well-specified) | Greedy (persona) | Best | Best | Significantly lower than CAT; curve close to Full oracle |
| Synthetic users | Non-adaptive Bayesian Design | Second best | Second best | Adaptive advantage clear on synthetic data |
| Synthetic users | CAT/IRT series | Clearly worse | Still worse | Structural model misspecification |
| Real WVB | Greedy (persona, EM prior) | Best | Comparable to non-adaptive | Adaptive superior at low budget |
| Real WVB | Non-adaptive (persona) | Second | Can surpass greedy | More robust at high budget, less affected by misspecification |
| Real WVB | CAT (GRM/GPCM and multidimensional variants) | Worse | Worse | Even with 70k training users |
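For reference, the log loss reported in the table and the Brier score mentioned in the method can be computed as below. The sketch uses the standard multi-class definitions, which is an assumption about the paper's exact conventions, and it also checks numerically the pairing noted earlier: the expected log loss of a forecast under itself is its Shannon entropy, and the expected Brier score is its Gini impurity. The forecast values are made up.

```python
from math import log


def log_loss(p, y):
    """Negative log-probability assigned to the observed class y."""
    return -log(p[y])


def brier(p, y):
    """Multi-class Brier score: squared distance to the one-hot outcome."""
    return sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))


def shannon_entropy(p):
    return -sum(q * log(q) for q in p if q > 0.0)


def gini(p):
    return 1.0 - sum(q * q for q in p)


def expected_score(score, p):
    """Expected score when the outcome is actually drawn from p."""
    return sum(p[y] * score(p, y) for y in range(len(p)) if p[y] > 0.0)


# Made-up 4-class forecast, e.g. for a 4-point Likert item.
forecast = [0.1, 0.2, 0.3, 0.4]
exp_log = expected_score(log_loss, forecast)
exp_brier = expected_score(brier, forecast)
print(exp_log, shannon_entropy(forecast))
print(exp_brier, gini(forecast))
```

Note that `log_loss` is undefined when the forecast puts zero probability on the observed class; practical implementations usually clip probabilities away from zero.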
Ablation Study¶
| Configuration | Phenomenon | Interpretation |
|---|---|---|
| Greedy + EM prior | Best on real data | EM prior effectively mitigates mismatch between persona dictionary and real population |
| Greedy + Uniform prior | Noticeably worse than EM | Performance degrades without training data, but still matches CAT |
| Random / Random Fixed (persona model) | Moderate | Validates independent contributions of "querying strategy" and "persona model" |
| Full (all 86 items queried) | Near upper bound but not absolutely optimal | With misspecification, more observations do not always yield better predictions |
Key Findings¶
- On well-specified synthetic data, the persona model structurally outperforms CAT: CAT's low-dimensional trait assumption mismatches the data-generating process.
- On real WVB, greedy is most effective at low budget (\(T \le 10\)), but as the budget increases, non-adaptive design can surpass it. This is a typical failure mode under model misspecification: greedy inference becomes overconfident and is misled by early errors.
- The EM-fitted persona prior concentrates mass on a small subset of personas, effectively "automatically selecting a subset" from the original 2058 personas, which is crucial for real population inference.
- CAT loses to the persona method even with the advantage of 70k training users; when items lack calibration data, CAT is unusable, while the persona method only requires an additional LLM prompt to incorporate new items.
Highlights & Insights¶
- Elevates "LLM as simulator" to "LLM as generator of Bayesian model components," a compelling shift in perspective: heuristic persona simulation becomes probabilistic inference with a proper posterior.
- Discrete latent variables plus categorical likelihoods yield closed-form posteriors as finite sums, sidestepping the nested Monte Carlo problem that has long burdened the BED community. It is an underrated example of a structural modeling choice buying computational tractability.
- "No need to retrain item parameters when expanding the item pool" is a system-level advantage that comes naturally with an LLM-based prior and could transfer to recommendation cold start, diagnostics, psychometric scale generation, etc.
- The phenomenon of "greedy being surpassed by non-adaptive under misspecification" offers a practical engineering reminder: adaptivity is not a panacea; model mismatch can turn greedy selection into a noise amplifier.
Limitations & Future Work¶
- The quality of the LLM's persona response distributions directly determines prior quality; for domains unfamiliar to LLMs (low-resource languages, specialized questionnaires), the offline distributions may be poor.
- Currently supports only categorical items; continuous/ordinal items require extending the likelihood form.
- One-step lookahead greedy degrades with long budgets; the paper mentions possible RL multi-step planning, but this is left for future work.
- The fixed persona dictionary is a potential bottleneck: as user populations shift, the dictionary must be updated or expanded; otherwise, even EM cannot remedy the misspecification.
Related Work & Insights¶
- vs Classic CAT/IRT: CAT uses continuous low-dimensional traits + item parameters; this work uses discrete persona + LLM-provided likelihoods. The former requires extensive calibration data per item, the latter only a single prompt.
- vs Neural BED (Foster et al. 2021, Ivanova et al. 2021): Neural BED learns amortized surrogate/policy networks but gives up the exact posterior; this work retains it.
- vs Collaborative Filtering: CF applies similarity measures or matrix factorization to existing ratings; this work has an explicit generative model, closed-form Bayesian updates, and active item selection, and does not require historical ratings from the target population.
- vs Persona-based Simulation (Argyle/Aher/Horton): These works use LLM personas as heuristic simulation tools; this work embeds persona outputs into a Bayesian model, so predictions come from principled posterior inference rather than ad hoc simulation.
Rating¶
- Novelty: ⭐⭐⭐⭐ The perspective of using persona discrete latent variables to make BED inference closed-form is clear and practical, though individual components have prior art.
- Experimental Thoroughness: ⭐⭐⭐⭐ Both synthetic and real experiments, multiple baselines, and scoring rules are covered, though item types are limited to 4-category Likert.
- Writing Quality: ⭐⭐⭐⭐ Problem motivation and mathematical derivations are clean; the correspondence between Bayesian and scoring rules is clearly explained.
- Value: ⭐⭐⭐⭐ Direct practical value for recommendation cold start, surveys, psychometrics, especially in scenarios with frequent item pool updates.
Related Papers¶
- [ICLR 2026] Modal Logical Neural Networks for Financial AI
- [ICLR 2026] A Cortically Inspired Architecture for Modular Perceptual AI
- [NeurIPS 2025] AdaptGrad: Adaptive Sampling to Reduce Noise
- [AAAI 2026] Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval
- [ICCV 2025] ArgoTweak: Towards Self-Updating HD Maps through Structured Priors