📂 Others
💬 ACL2026 · 5 paper notes
- Agree, Disagree, Explain: Decomposing Human Label Variation in NLI through the Lens of Explanations
    - This paper extends the LiTEx reasoning taxonomy from "label-consistent, explanation-variant" settings to label-disagreement scenarios, finding that annotators may share similar reasoning strategies despite assigning different labels, and that reasoning category agreement better reflects the semantic similarity of explanations than label agreement alone.
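A toy sketch of the contrast the note describes: two annotators can disagree on the NLI label while reasoning the same way, so agreement measured over reasoning categories can exceed agreement measured over labels. The items, labels, and category names below are invented for illustration; they are not the LiTEx taxonomy or the paper's data.

```python
# Toy data (hypothetical, not from the paper): for each item,
# (annotator-A label, annotator-B label, A's reasoning category, B's reasoning category).
items = [
    ("entailment", "neutral",       "pragmatic",       "pragmatic"),
    ("neutral",    "contradiction", "world-knowledge", "world-knowledge"),
    ("entailment", "entailment",    "lexical",         "lexical"),
    ("neutral",    "neutral",       "pragmatic",       "lexical"),
]

def agreement(pairs):
    """Fraction of items on which the two annotations match exactly."""
    return sum(a == b for a, b in pairs) / len(pairs)

label_agree = agreement([(a, b) for a, b, _, _ in items])
category_agree = agreement([(ca, cb) for _, _, ca, cb in items])
print(label_agree, category_agree)  # 0.5 vs 0.75: categories agree more often
```

With this toy data, half the label pairs disagree while three quarters of the reasoning categories match, mirroring the paper's observation that label agreement alone understates shared reasoning.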
- Are Large Language Models Economically Viable for Industry Deployment?
    - This paper proposes Edge-Eval, a framework that evaluates LLMs across their full deployment lifecycle on legacy T4 GPUs using five deployment metrics—economic break-even, intelligence-per-watt, system density, cold-start tax, and quantization fidelity. The framework reveals that sub-2B models comprehensively outperform 7B models on both economic and ecological dimensions, and uncovers the counterintuitive finding that QLoRA, while reducing memory by ~60%, can increase energy consumption by up to 7×.
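Two of the named metrics lend themselves to a quick sketch. The formulas, power draw, and per-request costs below are my assumptions for illustration; the paper's exact metric definitions may differ.

```python
# Hedged sketch of two Edge-Eval-style deployment metrics.
# The functional forms and all numbers are illustrative assumptions.

def intelligence_per_watt(benchmark_score: float, avg_power_w: float) -> float:
    """Benchmark score per watt of average draw during inference."""
    return benchmark_score / avg_power_w

def break_even_requests(fixed_cost_usd: float,
                        revenue_per_req_usd: float,
                        cost_per_req_usd: float) -> float:
    """Requests needed before revenue covers the fixed deployment cost."""
    margin = revenue_per_req_usd - cost_per_req_usd
    if margin <= 0:
        return float("inf")  # never breaks even
    return fixed_cost_usd / margin

# Hypothetical small model on a T4: score 62 at 70 W average draw,
# $1200 fixed setup cost, $0.004 revenue and $0.001 cost per request.
print(intelligence_per_watt(62.0, 70.0))          # points per watt
print(break_even_requests(1200.0, 0.004, 0.001))  # 400000.0 requests
```

On such per-watt and break-even terms, a smaller model with a modest score but much lower power and per-request cost can dominate a larger one, which is the kind of comparison the framework formalizes.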
- Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
    - This paper proposes PTE (Prefill Token Equivalents), a hardware-aware efficiency metric for tool-integrated reasoning (TIR) that unifies the costs of internal reasoning and external tool use. Through large-scale experiments, the paper identifies four inefficiency patterns in TIR: confirmatory tool use, tool mixing, lack of tool priors, and tool format collapse.
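One plausible way to read a "prefill token equivalents" cost is as a common unit into which both decoded reasoning tokens and tool wall-clock time are converted. The conversion constants and the additive form below are invented placeholders, not the paper's calibration.

```python
# Hedged sketch of a PTE-style unified cost. Both conversion rates are
# assumptions standing in for whatever hardware calibration the paper uses.
PREFILL_TOKENS_PER_SEC = 4000.0  # assumed prefill throughput on the target GPU
DECODE_COST_FACTOR = 30.0        # assumed: 1 decoded token ~ 30 prefill tokens

def pte_cost(decoded_tokens: int, tool_wall_seconds: float) -> float:
    """Cost in prefill-token equivalents: internal reasoning (decoded
    tokens) plus external tool use (wall-clock latency)."""
    reasoning = decoded_tokens * DECODE_COST_FACTOR
    tool = tool_wall_seconds * PREFILL_TOKENS_PER_SEC
    return reasoning + tool

# A trace with 500 reasoning tokens and one 2-second tool call:
print(pte_cost(500, 2.0))  # 15000 + 8000 = 23000.0 PTE
```

The point of a unit like this is that an inefficiency such as confirmatory tool use (re-calling a tool to double-check an answer) shows up as a measurable PTE overhead even when accuracy is unchanged.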
- Dynamics of Cognitive Heterogeneity: Investigating Behavioral Biases in Multi-Stage Supply Chains with LLM-Based Simulation
    - This paper deploys LLM agents (DeepSeek/GPT series) in the classic beer distribution game to simulate multi-stage supply chains, systematically investigating how cognitive heterogeneity (differences in reasoning capability) affects system behavior. The findings demonstrate that LLM agents can reproduce human-observed bullwhip effects and myopic behaviors, and that information sharing effectively mitigates these adverse effects.
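The bullwhip effect the agents reproduce can be demonstrated with a minimal hand-coded beer-game stage (no LLMs involved): an upstream stage that overreacts to changes in downstream orders amplifies order variance at each echelon. The demand series and overreaction rule below are invented for illustration.

```python
import statistics

# Toy beer-game dynamics (not the paper's LLM agents): each stage orders
# the latest downstream order plus an overreaction to its recent change.
demand = [4, 4, 8, 8, 4, 4, 8, 8, 4, 4, 8, 8]  # hypothetical retail demand

def upstream_orders(downstream: list, overreaction: float = 0.5) -> list:
    """Orders placed by the stage one echelon upstream of `downstream`."""
    orders = [downstream[0]]
    for prev, cur in zip(downstream, downstream[1:]):
        orders.append(cur + overreaction * (cur - prev))
    return orders

retailer = upstream_orders(demand)      # retailer's orders to wholesaler
wholesaler = upstream_orders(retailer)  # wholesaler's orders to distributor

# Bullwhip ratio: order variance two echelons up vs. consumer demand variance.
bullwhip = statistics.pvariance(wholesaler) / statistics.pvariance(demand)
print(round(bullwhip, 2))  # ratio > 1 means the variance has been amplified
```

Even this two-echelon toy chain amplifies demand variance severalfold, which is the behavioral signature the paper checks its LLM agents against.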
- Reliable Evaluation Protocol for Low-Precision Retrieval
    - This paper identifies that low-precision retrieval systems (e.g., binarized or quantized embeddings) suffer from a large number of spurious ties due to reduced score granularity, leading to highly unstable evaluation results. Two complementary strategies are proposed—High-Precision Scoring (HPS) and Tie-aware Retrieval Metrics (TRM)—to enable more reliable and consistent evaluation of low-precision retrieval systems.
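The spurious-ties failure mode is easy to reproduce: quantizing scores collapses distinct values into ties, after which a document's rank depends on arbitrary sort order. A tie-aware mid-rank (one simple stand-in for the tie-aware-metric idea; the paper's HPS/TRM are more elaborate) makes the ambiguity explicit. The document scores below are made up.

```python
# Full-precision retrieval scores for four documents (hypothetical values).
scores = {"d1": 0.91, "d2": 0.89, "d3": 0.52, "d4": 0.49}
# Quantizing to one decimal collapses d1/d2 and d3/d4 into tie groups.
coarse = {d: round(s, 1) for d, s in scores.items()}

def tie_aware_rank(doc: str, score_map: dict) -> float:
    """Rank of `doc`, averaging over its tie group instead of picking an
    arbitrary order among equal scores (mid-rank)."""
    s = score_map[doc]
    better = sum(v > s for v in score_map.values())
    tied = sum(v == s for v in score_map.values())
    return better + (tied + 1) / 2

print(tie_aware_rank("d2", scores))  # 2.0: unique rank at full precision
print(tie_aware_rank("d2", coarse))  # 1.5: tied with d1 after quantization
```

A naive evaluator would silently assign d2 rank 1 or rank 2 in the coarse setting depending on sort stability, which is exactly the instability the protocol is designed to remove.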