Efficient Estimation of Kernel Surrogate Models for Task Attribution

Conference: ICLR 2026 arXiv: 2602.03783 Code: https://github.com/VirtuosoResearch/Kernel-surrogate-models Area: Reinforcement Learning Keywords: task attribution, kernel surrogate model, influence function, data attribution, kernel ridge regression

TL;DR

This paper proposes a kernel surrogate model (KernelSM) for task attribution. By employing RBF kernel ridge regression to capture nonlinear interaction effects among tasks, combined with a gradient-projection-based efficient estimation algorithm that eliminates repeated retraining, KernelSM achieves a 25% improvement in correlation over linear surrogate and influence function baselines across mathematical reasoning, in-context learning, and multi-objective RL settings.

Background & Motivation

Background: Modern AI systems (e.g., LLMs) are trained simultaneously on diverse tasks. Quantifying the contribution of each training task to a target task (task attribution) is a central problem in interpretability. Existing approaches include leave-one-out (LOO) retraining (exact but computationally infeasible), influence functions (requiring Hessian computation), and linear surrogate models (fitting a linear function via random subset sampling).

Limitations of Prior Work: Linear surrogate models can only capture first-order additive effects and fail to express nonlinear interactions among tasks—such as synergistic effects (joint training on two tasks outperforms the sum of individual contributions), adversarial effects, and XOR-type effects. These interactions are particularly pronounced for training samples near decision boundaries.

Key Challenge: Stronger surrogate models (e.g., kernel methods) require evaluating model performance \(F(\mathbf{s})\) on more subsets, each demanding a full retraining—making the computational cost comparable to LOO.

Goal: (1) Provide a unified analysis of the relationship between linear surrogates and influence functions; (2) Design surrogate models capable of capturing nonlinear task interactions; (3) Enable efficient estimation without repeated retraining.

Key Insight: The paper first uses a second-order Taylor expansion to prove that linear surrogates \(\approx\) influence functions (when second-order interactions are small), thereby exposing the limitations of linear surrogates. It then upgrades the surrogate model with an RBF kernel and employs a first-order gradient approximation to avoid repeated retraining.
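Concretely, the expansion behind this claim (written in the summary's notation, with \(\mathbf{s}^*\) the expansion point and \(\mathbf{H}_\mathbf{s}\) the Hessian of \(F\)) reads:

\[
F(\mathbf{s}) \approx F(\mathbf{s}^*) + \nabla_{\mathbf{s}} F(\mathbf{s}^*)^\top (\mathbf{s} - \mathbf{s}^*) + \frac{1}{2} (\mathbf{s} - \mathbf{s}^*)^\top \mathbf{H}_{\mathbf{s}} (\mathbf{s} - \mathbf{s}^*).
\]

Linear surrogates and influence functions retain only the first-order term; the quadratic term is exactly where pairwise task interactions live, and it is this term that a kernel surrogate can absorb.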

Core Idea: Efficiently construct kernel surrogate models via gradient-projection-based first-order approximation, capturing nonlinear task interactions with less than 2% relative error.

Method

Overall Architecture

Given \(K\) training tasks → sample \(m\) binary subset vectors \(\mathbf{s} \in \{0,1\}^K\) → efficiently estimate model performance \(F(\mathbf{s}^{(i)})\) for each subset (via gradient approximation, without retraining) → fit \(g_\theta: \{0,1\}^K \to \mathbb{R}\) using kernel ridge regression → output task attribution scores (nonlinear task interactions are implicitly encoded by the kernel function).
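The pipeline above can be sketched end-to-end in plain NumPy. The synthetic `F` (with one pairwise interaction term) stands in for the paper's gradient-based performance estimates, and the toggling-based `attribution` score is one illustrative way to read task contributions off the fitted surrogate, not necessarily the paper's exact scoring rule:

```python
import numpy as np

rng = np.random.default_rng(0)
K, m, gamma, lam = 8, 200, 0.5, 1e-3

# 1) Sample m binary subset vectors s in {0,1}^K.
S = rng.integers(0, 2, size=(m, K)).astype(float)

# 2) Performance F(s) per subset: a synthetic stand-in with one pairwise
#    interaction; the paper estimates these values via gradient projection
#    instead of retraining.
w = rng.normal(size=K)
F = S @ w + 0.5 * S[:, 0] * S[:, 1] + 0.01 * rng.normal(size=m)

# 3) Fit kernel ridge regression: theta = (K_gram + lam*I)^{-1} F.
def rbf_gram(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

theta = np.linalg.solve(rbf_gram(S, S) + lam * np.eye(m), F)

def g(X):
    # Surrogate g_theta(s) = sum_i theta_i k(s^(i), s).
    return rbf_gram(X, S) @ theta

# 4) Attribution for task k: mean effect of toggling bit k on g,
#    averaged over random probe subsets.
def attribution(k, n_probe=500):
    P = rng.integers(0, 2, size=(n_probe, K)).astype(float)
    on, off = P.copy(), P.copy()
    on[:, k], off[:, k] = 1.0, 0.0
    return float(np.mean(g(on) - g(off)))

scores = [attribution(k) for k in range(K)]
```

Because the kernel is evaluated on binary vectors, the Gram matrix depends only on Hamming distances between sampled subsets, so the whole fit is a single \(m \times m\) linear solve.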

Key Designs

  1. Unified Analysis of Linear Surrogates and Influence Functions:

     • Function: Prove that linear surrogate model coefficients are equivalent to influence functions under a first-order approximation.
     • Mechanism: Apply a second-order Taylor expansion to \(F(\mathbf{s})\), substitute into the linear regression objective, and analyze the regression coefficients via the delta method. Proposition 3.1 establishes \(\|\hat{\beta} - \nabla_\mathbf{s} F(\mathbf{s}^*) - \text{second-order correction}\| \lesssim c_3 K^{3/2} p^{-1}\).
     • Design Motivation: Reveals the fundamental limitation of linear surrogates: when the norm of the Hessian \(\mathbf{H}_\mathbf{s}\) is non-negligible, neither linear surrogates nor influence functions can accurately attribute task contributions.

  2. RBF Kernel Surrogate Model (KernelSM):

     • Function: Replace linear regression with kernel ridge regression to learn nonlinear task interactions.
     • Mechanism: \(g_\theta(\mathbf{s}) = \sum_i \theta_i k(\mathbf{s}^{(i)}, \mathbf{s})\), where \(k(\mathbf{s}^{(a)}, \mathbf{s}^{(b)}) = \exp(-\gamma \|\mathbf{s}^{(a)} - \mathbf{s}^{(b)}\|^2)\). The coefficients have the closed-form solution \(\theta = (\mathcal{K} + \lambda I)^{-1} \mathbf{F}\).
     • Design Motivation: The RBF kernel is a universal approximator on the binary space \(\{0,1\}^K\), and its geometric intuition naturally matches the subset space: subsets that are close in Hamming distance should yield similar performance.

  3. Efficient Estimation via Gradient Projection (Core Contribution):

     • Function: Avoid retraining the model for each subset \(\mathbf{s}^{(i)}\) by using a first-order Taylor approximation to estimate \(F(\mathbf{s}^{(i)})\).
     • Mechanism: Perform a first-order expansion at the pretrained weights \(W_0\): \(f_W(x) \approx f_{W_0}(x) + \langle \nabla f_{W_0}(x), W - W_0 \rangle\). For each subset \(\mathbf{s}^{(i)}\), solve for the optimal perturbation \(Z^*_{\mathbf{s}^{(i)}}\) via multinomial logistic regression using projected gradients as features, then estimate \(\hat{f}(x) = f_{W_0}(x) + \langle \nabla f_{W_0}(x), Z^*_{\mathbf{s}^{(i)}} \rangle\).
     • Design Motivation: Gradients need only be computed once at \(W_0\); all subsequent subset estimations reduce to linear algebra on CPU. Empirical validation shows that the first-order approximation error is below 2%.
     • Key Technique: Gaussian random convolution projects the high-dimensional gradient vectors into a low-dimensional space, reducing each regression solve to a matter of seconds.
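The estimation step can be sketched as follows. This is a simplified stand-in: gradients and labels are synthetic, a plain Gaussian random projection replaces the paper's Gaussian random convolution, a binary logistic fit replaces the multinomial one, and each example is treated as its own "task" for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 200, 5000, 32   # examples, parameter count, projection dim

# Per-example output gradients at the pretrained weights W0, computed once
# (synthetic stand-ins here; in practice these come from backprop).
G = rng.normal(size=(n, p))
f0 = rng.normal(size=n)                                        # base outputs f_{W0}(x)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-f0))).astype(float)  # hypothetical labels

# Gaussian random projection of the gradients, scaled so the projected
# features have roughly unit variance.
P = rng.normal(size=(p, d)) / np.sqrt(p)
Phi = G @ P   # (n, d) projected-gradient features, reused for every subset

def solve_perturbation(mask, lr=0.1, steps=300):
    """Fit the perturbation z* using only the subset's examples, via
    gradient descent on a binary logistic loss over the linearized model."""
    idx = np.flatnonzero(mask)
    z = np.zeros(d)
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(f0[idx] + Phi[idx] @ z)))
        z -= lr * Phi[idx].T @ (prob - y[idx]) / len(idx)
    return z

# Estimated outputs for ALL examples under one subset, with no retraining:
# f_hat(x) = f_{W0}(x) + <projected gradient, z*>.
mask = rng.integers(0, 2, size=n).astype(bool)
z_star = solve_perturbation(mask)
f_hat = f0 + Phi @ z_star
```

The key property is that `G`, `P`, and `Phi` are computed once; evaluating a new subset only requires a small \(d\)-dimensional fit.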

Loss & Training

Kernel ridge regression objective: \(\min_{g_\theta} \sum_{i=1}^m (F(\mathbf{s}^{(i)}) - g_\theta(\mathbf{s}^{(i)}))^2 + \lambda \|g_\theta\|^2_\mathcal{K}\)

Hyperparameters \(\lambda\) and \(\gamma\) are selected via cross-validation.
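The selection loop can be sketched with a plain k-fold grid search; the grid values and the synthetic `F` (again with a pairwise interaction) are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
K, m = 10, 300
S = rng.integers(0, 2, size=(m, K)).astype(float)
# Synthetic performance values standing in for the gradient-based estimates.
F = S.sum(axis=1) + 2.0 * S[:, 0] * S[:, 1] + 0.05 * rng.normal(size=m)

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def cv_mse(lam, gamma, folds=5):
    """Mean held-out squared error of kernel ridge regression."""
    idx, err = np.arange(m), 0.0
    for f in range(folds):
        va = idx[f::folds]
        tr = np.setdiff1d(idx, va)
        theta = np.linalg.solve(rbf(S[tr], S[tr], gamma) + lam * np.eye(len(tr)), F[tr])
        pred = rbf(S[va], S[tr], gamma) @ theta
        err += float(np.mean((pred - F[va]) ** 2))
    return err / folds

# Pick (lambda, gamma) with the lowest cross-validated error.
grid = [(lam, gamma) for lam in (1e-3, 1e-2, 1e-1) for gamma in (0.05, 0.2, 1.0)]
lam_best, gamma_best = min(grid, key=lambda pair: cv_mse(*pair))
```

Each grid point costs one \(O(m^3)\) solve per fold, which stays cheap because \(m\) (the number of sampled subsets) is small relative to model training.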

Key Experimental Results

Main Results

| Method | CIFAR-10 Corr. ↑ | Modular Arith. Corr. ↑ | ICL Corr. ↑ | Multi-obj. RL Corr. ↑ |
| --- | --- | --- | --- | --- |
| Influence Function | ~0.74 | ~0.55 | ~0.72 | ~0.65 |
| Linear Surrogate | ~0.76 | ~0.58 | ~0.75 | ~0.68 |
| KernelSM | ~0.82 | ~0.80 | ~0.88 | ~0.73 |

KernelSM achieves a 25% improvement in correlation over linear baselines and influence functions (compared against LOO ground truth). The largest gain is observed on modular arithmetic tasks (42%), attributable to strong nonlinear interactions in operations such as XOR and division.

Ablation Study

Kernel fit residuals (↓):

| Kernel | CIFAR-10 Residual | Modular Arith. Residual |
| --- | --- | --- |
| Linear | 4.4 ± 0.9 | 4.6 ± 1.3 |
| RBF | 1.0 ± 0.0 | 1.5 ± 0.4 |

First-order approximation quality:

| Task | Relative Error |
| --- | --- |
| CIFAR-10 | 1.02 ± 0.69% |
| Modular Arithmetic | 2.40 ± 2.17% |
| ICL | 0.51 ± 0.04% |
| Multi-objective RL | 0.43 ± 0.73% |

Key Findings

  • Nonlinear interactions are pervasive: The residual error of the RBF kernel is 1/4 to 1/3 that of the linear model, indicating that nonlinear task interactions are non-negligible.
  • First-order gradient approximation is sufficiently accurate: Relative error remains below 2% across diverse tasks and model scales (including a 34B-parameter LLM).
  • Downstream task selection benefits substantially: Using KernelSM for ICL example selection and task selection in multi-objective optimization reduces loss by 40% compared to linear methods.
  • Linear surrogates and influence functions are empirically confirmed to be equivalent under first-order approximation (Pearson correlation 0.96–0.98).

Highlights & Insights

  • Unified theory: This is the first work to rigorously establish, via second-order Taylor expansion, the conditions under which linear surrogate models and influence functions are equivalent, while revealing their shared limitation—neither can capture task interactions.
  • Efficient estimation via gradient projection: The approach elegantly circumvents the computational bottleneck of kernel methods through first-order Taylor approximation combined with random projection for dimensionality reduction, making the practical cost of KernelSM comparable to that of linear models.
  • Geometric intuition of the RBF kernel: On the binary subset space \(\{0,1\}^K\), the RBF kernel is equivalent to a heat kernel based on Hamming distance—implying that similar training subsets (differing in only a few tasks) should yield similar performance, which constitutes a sound inductive bias.
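This identity is easy to verify: on binary vectors the squared Euclidean distance equals the Hamming distance, so the RBF kernel reduces to a function of how many tasks two subsets disagree on. A quick check (the value of `gamma` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=20)
b = rng.integers(0, 2, size=20)

hamming = int(np.sum(a != b))            # number of differing tasks
sq_euclid = int(np.sum((a - b) ** 2))    # squared Euclidean distance
assert sq_euclid == hamming              # coincide on {0,1}^K

gamma = 0.3
k_ab = np.exp(-gamma * hamming)          # k(a, b) = exp(-gamma * d_H(a, b))
```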

Limitations & Future Work

  • The first-order approximation is valid near the pretrained weights \(W_0\); when fine-tuning deviates substantially (e.g., full fine-tuning of large models), the approximation error may increase significantly.
  • The expressive capacity of the kernel surrogate model is bounded by the number of sampled subsets \(m\)—for large \(K\), a sufficiently large number of subset samples is required.
  • Stronger kernel functions such as deep kernels or neural tangent kernels remain unexplored.
  • Validation is limited to task-level attribution; extension to instance-level data attribution (though the theoretical framework is general) has not been pursued.
  • Evaluation relies primarily on correlation with LOO ground truth; the practical utility of kernel attribution for model debugging or data cleaning has not been validated.
Comparison with Related Work

  • vs. DataModels (Ilyas et al., 2022): DataModels fit a linear regression for data attribution; KernelSM generalizes this to kernel methods.
  • vs. Influence Functions (Koh & Liang, 2017): This paper proves that influence functions \(\approx\) the first-order approximation of linear surrogates; only KernelSM can capture second-order effects.
  • vs. TRAK (Park et al., 2023): TRAK also uses gradient features for data attribution but remains a linear model; KernelSM employs an RBF kernel to capture nonlinear interactions.

Rating

  • Novelty: ⭐⭐⭐⭐ The idea of applying kernel methods to task attribution is natural and theoretically grounded; the efficient estimation algorithm is the key contribution.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers classification, mathematical reasoning, ICL, and RL; ablations are comprehensive.
  • Writing Quality: ⭐⭐⭐⭐ Theory and experiments are well-organized, though some proof details are overly compressed.
  • Value: ⭐⭐⭐⭐ Provides a more powerful tool for task attribution with significant gains in downstream applications (task selection).