Do Different Prompting Methods Yield a Common Task Representation?

Conference: NeurIPS 2025
arXiv: 2505.12075
Code: None
Area: Interpretability / LLM Mechanisms
Keywords: Task representation, function vectors, prompting methods, attention heads, interpretability

TL;DR

By generalizing the Function Vectors (FV) framework from few-shot demonstrations to text instructions, this paper finds that different prompting methods do not induce a unified task representation within LLMs; instead, they activate partially overlapping but largely distinct attention head mechanisms.

Background & Motivation

  • LLMs can perform tasks via two primary paradigms: few-shot demonstrations and text instructions.
  • For instance, providing examples such as "Q: Japan A: Tokyo, Q: Chile A: Santiago..." or directly stating "Map countries to their capitals" both aim to elicit the same behavior from the model.
  • Core Problem: Do these two prompting paradigms induce the same internal task representation?
  • Function Vectors (FV), proposed by Todd et al. (2024), is a causal interpretability method that extracts task representations by identifying a small set of critical attention heads.
  • The original FV framework is restricted to few-shot demonstration settings; this paper extends it to arbitrary task specification formats, particularly zero-shot text instructions.
  • Understanding task representation mechanisms is of significant importance for model interpretability, model steering, and prompt engineering practices.

Method

Overall Architecture

This paper generalizes the FV extraction pipeline of Todd et al. (2024) from few-shot demonstrations to arbitrary task presentation formats (with a focus on text instructions), then systematically compares the two types of FVs along three axes: (1) task execution performance, (2) internal activation similarity, and (3) the attention-head mechanisms they rely on.
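To make the comparison concrete, here is a minimal sketch of how the second and third axes can be measured once the head sets and vectors have been extracted. The helper and its inputs are illustrative (no official code is released), not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def compare_function_vectors(heads_demo, heads_instr, v_demo, v_instr):
    """Compare a demonstration FV and an instruction FV along two of the
    paper's three axes: head overlap and activation similarity. (The third
    axis, task execution, requires injecting each FV into the model; see
    the hook sketch in the Loss & Training section.)

    heads_* : set of (layer, head) tuples selected by CIE for each format
    v_*     : torch.Tensor of shape (d_model,), the summed mean activations
    """
    return {
        "shared_heads": len(heads_demo & heads_instr),
        "demo_only": len(heads_demo - heads_instr),
        "instr_only": len(heads_instr - heads_demo),
        "cosine_similarity": F.cosine_similarity(v_demo, v_instr, dim=0).item(),
    }
```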

Key Designs

  1. Generalized Function Vector Extraction (a code sketch follows this list):
     • Original method: Given a K-shot demonstration prompt, compute the task-conditioned mean activation \(\bar{a}_{lj}^t\) for each attention head, estimate the Causal Indirect Effect (CIE) using baseline prompts with shuffled labels, select the top-20 causally relevant attention heads \(\mathcal{A}^D\), and sum their mean activations to form the function vector \(v_t = \sum_{a_{lj} \in \mathcal{A}^D} \bar{a}_{lj}^t\).
     • Proposed extension: Replace the few-shot demonstrations with a task specification \(Q_t\) (e.g., a text instruction) to construct prompts \(p_i^t = [q_m^t, x_{iq}]\). Llama-3.1-405B generates approximately 200 deduplicated instructions per task, from which the top \(J = 5\) instructions by accuracy are selected.
  2. Instruction Function Vector Construction:
     • Mean task-conditioned activations are computed over 100 samples; CIEs are estimated over 25 samples.
     • Short instructions (≤16 tokens) and long instructions (unconstrained) are generated separately and crossed with the three baseline types, yielding 6 conditions in total.
     • CIEs are averaged across the 6 conditions to identify the top-20 attention heads.
     • For the final evaluation, activations from the short- and long-instruction conditions are averaged.
  3. Non-Informative Baseline Design: Three methods construct baselines that match the instructions in probability but contain no task-relevant information:
     • Equiprobable token sequences: Tokens are sampled position by position to match the conditional probability of the original instruction at each position.
     • Natural text: Segments from WikiText-103 are sampled to match the length and probability of the original instruction.
     • Other-task instructions: Instructions from other tasks, matched by length and probability, serve as baselines.
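A minimal sketch of the generalized extraction pipeline described above, assuming hypothetical helpers `get_head_activations` and `run_with_patched_head` for reading and patching per-head outputs; since no official code is released, this illustrates the technique rather than reproducing the authors' implementation:

```python
import torch

def extract_function_vector(model, prompts, baselines,
                            get_head_activations, run_with_patched_head,
                            top_k=20):
    """Sketch of the generalized FV pipeline (Todd et al., 2024, extended
    here to arbitrary task specifications such as text instructions).

    prompts   : list of task prompts [q^t, x] (demonstrations or instructions)
    baselines : list of (prompt, p_answer) pairs, where each prompt is a
                matched non-informative baseline and p_answer is the model's
                unpatched probability of the correct answer on it
    get_head_activations(model, prompt) -> {(layer, head): Tensor(d_model)}
        hypothetical helper returning each head's contribution to the
        residual stream at the final token
    run_with_patched_head(model, prompt, layer, head, act) -> float
        hypothetical helper returning p(correct answer) after patching one
        head's output with `act`
    """
    # 1. Task-conditioned mean activation \bar{a}^t_{lj} per attention head.
    sums, n = {}, 0
    for p in prompts:
        for key, act in get_head_activations(model, p).items():
            sums[key] = sums.get(key, 0) + act
        n += 1
    mean_act = {key: s / n for key, s in sums.items()}

    # 2. Causal indirect effect: patching a head's mean activation into a
    #    non-informative baseline should raise p(correct answer) if the
    #    head carries task information.
    cie = {}
    for (layer, head), act in mean_act.items():
        gains = [run_with_patched_head(model, b, layer, head, act) - p_ans
                 for b, p_ans in baselines]
        cie[(layer, head)] = sum(gains) / len(gains)

    # 3. Function vector: sum of mean activations over the top-k heads by CIE.
    top_heads = sorted(cie, key=cie.get, reverse=True)[:top_k]
    v_t = torch.stack([mean_act[h] for h in top_heads]).sum(dim=0)
    return v_t, set(top_heads)
```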

Loss & Training

  • This paper involves no model training; the core methodology is causal intervention analysis.
  • Evaluation protocol: FVs are injected as additive interventions into the residual stream at layer \(\lfloor L/3 \rfloor\) (a hook sketch follows this list).
  • Two evaluation settings, each matching the FV's extraction context:
     • Shuffled-label 10-shot: used to evaluate demonstration FVs.
     • Zero-shot: used to evaluate instruction FVs.
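A sketch of the additive intervention as a PyTorch forward hook, assuming a HuggingFace-style Llama model whose decoder blocks live in `model.model.layers`; adding the vector at the final token position follows Todd et al.'s protocol and is an assumption here:

```python
import torch

def add_fv_hook(layers, v_t):
    """Register a forward hook adding the function vector v_t to the
    residual stream at layer floor(L/3), the paper's intervention depth.

    `layers` is the model's list of transformer blocks (for a HuggingFace
    Llama model, `model.model.layers`).
    """
    target = len(layers) // 3  # floor(L / 3)

    def hook(module, inputs, output):
        # HF decoder layers return a tuple whose first element is the
        # hidden states; plain modules may return the tensor directly.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden.clone()
        hidden[:, -1, :] += v_t.to(hidden)  # additive intervention at last token
        return (hidden,) + tuple(output[1:]) if isinstance(output, tuple) else hidden

    return layers[target].register_forward_hook(hook)
```

Usage would be: `handle = add_fv_hook(model.model.layers, v_t)`, run the matched evaluation, then `handle.remove()`.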

Key Experimental Results

Main Results

Models: Llama-3.2-3B (base/Instruct), Llama-3.1-8B (base/Instruct), OLMo-2-7B series, Llama-2-7B series
Tasks: 50 lightweight NLP tasks (antonyms, capital mapping, translation, NER, etc.)

Model                    10-shot Baseline    Shuffled 10-shot Baseline    Best Instruction    0-shot Baseline
Llama-3.2-3B             0.753               0.154                        0.765               0.153
Llama-3.2-3B-Instruct    0.790               0.186                        0.864               0.107
Llama-3.1-8B             0.821               0.199                        0.820               0.128
Llama-3.1-8B-Instruct    0.846               0.179                        0.887               0.077
OLMo-2-7B                0.729               0.171                        0.857               0.169
OLMo-2-7B-Instruct       0.774               0.164                        0.870               0.147

Finding 1 — Instruction FVs are effective: Instruction FVs improve zero-shot accuracy from below 20% to above 50% on the best-performing models, though they do not match the accuracy of demonstration FVs under the shuffled 10-shot setting.

Finding 2 — Joint injection outperforms individual FVs: Simultaneously injecting both FV types at layer \(\lfloor L/3 \rfloor\) consistently outperforms using either type alone (with the exception of base Llama-3.1-8B).
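A tiny illustration of joint injection, reusing the hypothetical `add_fv_hook` from the previous sketch (`evaluate_zero_shot` is likewise a stand-in for the paper's accuracy evaluation):

```python
# Finding 2 as code: inject both FV types at the same layer by summing them.
handle = add_fv_hook(model.model.layers, v_demo + v_instr)
joint_accuracy = evaluate_zero_shot(model, task)  # hypothetical helper
handle.remove()
```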

Ablation Study

Shared Attention Head Analysis (Finding 3):

Model                    Demo-Only Heads    Instruction-Only Heads    Shared Heads
Llama-3.2-3B             13                 13                        7
Llama-3.2-3B-Instruct    13                 13                        7
Llama-3.1-8B             16                 16                        4
Llama-3.1-8B-Instruct    16                 16                        4

CIE Ratio (Demonstration / Instruction, top-20 heads):

Model                    Mean CIE Ratio    Median CIE Ratio
Llama-3.2-3B             3.901             1.482
Llama-3.2-3B-Instruct    3.570             1.359
Llama-3.1-8B             4.794             2.894
Llama-3.1-8B-Instruct    2.181             1.337

Key Findings

  1. Instruction FVs are more effective in instruction-tuned models: Instruction FVs in Instruct models substantially outperform those in base models.
  2. Layer depth differences in attention heads: Post-training shifts the average layer depth of instruction FV heads from being deeper than demonstration FV heads to approximately the same depth.
  3. Asymmetry (Finding 4): A "heterogeneous" FV constructed using heads localized by instructions but activations from demonstrations outperforms the reverse combination, suggesting that instruction-based task inference leverages attention heads that also play a mild role in demonstration-based ICL (see the sketch after this list).
  4. Cross-model transfer (Finding 5): Instruction FVs from instruction-tuned models can effectively steer the corresponding base models, nearly recovering the FV evaluation accuracy of the instruction-tuned model itself.
  5. Post-training stage analysis: In the OLMo-2 series, both the SFT and DPO stages each produce a significant boost in instruction FV effectiveness, while the final RL stage has negligible impact.
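A sketch of the heterogeneous FV from Finding 4, reusing the outputs of the hypothetical `extract_function_vector` sketch above:

```python
import torch

def heterogeneous_fv(localizer_heads, recorded_mean_act):
    """'Localize with one format, record from the other': e.g., sum the
    demonstration mean activations (`recorded_mean_act`, keyed by
    (layer, head)) over the heads selected by instruction CIEs
    (`localizer_heads`). Inputs follow the extract_function_vector sketch.
    """
    return torch.stack([recorded_mean_act[h] for h in localizer_heads]).sum(dim=0)
```

The reverse pairing (demonstration-selected heads with instruction activations) is built the same way; Finding 4 is that the instruction-localized variant performs better.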

Highlights & Insights

  • Elegant methodological design: The three non-informative baselines (equiprobable sampling, natural text, and other-task instructions) are mutually complementary; averaging across them enables robust identification of causally relevant attention heads.
  • Neuroscience analogy: The FV extraction process is likened to a functional localizer paradigm, where head selection corresponds to "localization" and activation computation to "recording." The heterogeneous FV experiment is designed to decouple the two mechanisms.
  • Practical implications: The findings provide an interpretability-grounded theoretical basis for the widely observed empirical benefit of combining instructions with demonstrations—the two prompting paradigms activate distinct mechanisms and thus provide complementary information.
  • More diffuse instruction representations: CIE distributions for demonstration FVs are more concentrated (a small number of heads contribute disproportionately), whereas those for instruction FVs are flatter (more heads contribute modestly), suggesting that instruction-based steering may benefit more from interventions targeting multiple layers.

Limitations & Future Work

  • Simple task suite: Only 50 lightweight tasks (e.g., antonyms, translation) are used; complex open-ended benchmarks such as MMLU and BBH are not covered.
  • Fixed intervention depth: The intervention is fixed at layer \(\lfloor L/3 \rfloor\), which may be suboptimal; different tasks may require interventions at different depths.
  • Limited model scale: Experiments are conducted only on models ranging from 1B to 8B parameters; scaling behavior in larger models remains unexplored.
  • Single representation extraction method: Only the FV framework is studied; alternative methods such as Task Vectors (Hendel et al., 2023) may yield different conclusions.
  • Cannot conclusively rule out a unified representation: a shared task representation that the FV framework simply fails to capture remains possible.
  • Future directions include: examining the relationship between instruction FV heads and induction heads, investigating what post-training specifically changes to enable instruction-based task inference, and further exploring cross-model FV transfer.

Related Work

  • Function Vectors (Todd et al., 2024): The direct foundation of this work; this paper extends FVs from demonstrations to instructions.
  • Task Vectors (Hendel et al., 2023): An alternative task representation extraction method using residual representations at delimiter positions.
  • Activation Steering (Stolfo et al., 2024): The finding that post-training FVs can steer base models is consistent with cross-model steering results reported therein.
  • Wu et al. (2024): Investigates what instruction tuning changes, proposing that the key modification may lie in how instruction tokens are processed.
  • Implications: Monitoring the activity of attention heads associated with demonstrations and instructions respectively could serve as an indicator of whether a model has successfully formed a task representation, thereby guiding prompt optimization.

Rating

Dimension                    Score (1–10)    Notes
Novelty                      8               Extends FVs from demonstrations to instructions with a systematic comparison; the research question is posed from a novel angle
Technical Depth              7               Causal intervention analysis is rigorous and the three-baseline design is principled, though no new model or algorithm is proposed
Experimental Thoroughness    9               12 models × 50 tasks, multiple control experiments and ablations, with an exceptionally detailed appendix
Writing Quality              8               Structure is clear; five findings are presented in a coherent progression with precise exposition
Practical Value              6               Contribution is primarily at the level of mechanistic understanding; direct application is limited
Overall                      7.6             A rigorous empirical interpretability study that illuminates the relationship between prompting paradigms and internal representations