💻 Code Intelligence

🧪 ICML2026 · 2 paper notes

BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models

BoostAPR is a three-stage pipeline for training program-repair models with RL: (1) execution-verified SFT, (2) training a sequence-level and a line-level reward model, and (3) PPO in which the line-level model redistributes the sequence-level reward onto the key edit lines. On Qwen2.5-Coder-32B, it raises SWE-bench Verified from 17.8% to 40.7% (+22.9 pp) and reaches 24.8% on Defects4J through cross-lingual transfer.
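The reward redistribution step can be illustrated with a minimal sketch. The note does not give the paper's actual redistribution rule, so everything here is an assumption made for illustration: the function name `redistribute_reward`, the softmax weighting, and the `temperature` parameter are all hypothetical. The sketch only shows the general idea of splitting one sequence-level reward across edited lines in proportion to line-level scores.

```python
import math


def redistribute_reward(sequence_reward, line_scores, temperature=1.0):
    """Split one sequence-level reward across edited lines.

    Assumption (not from the paper): weights are a softmax over the
    line-level reward model's scores, so the per-line rewards sum to
    the original sequence-level reward.
    """
    if not line_scores:
        return []
    # Subtract the max score before exponentiating for numerical stability.
    top = max(line_scores)
    exps = [math.exp((s - top) / temperature) for s in line_scores]
    total = sum(exps)
    return [sequence_reward * e / total for e in exps]


# Hypothetical scores for three edited lines of a candidate patch.
per_line = redistribute_reward(1.0, [0.1, 2.0, 0.3])
print(per_line)
```

With this weighting the line that the line-level model scores highest receives the largest share, while the total credit stays equal to the sequence-level reward, which gives PPO a denser signal without changing the overall reward scale.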

HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench

On SWE-bench, conventional perplexity (PPL) suffers from a "long context tax" and cannot predict agent capability after SFT. This paper proposes the "entropy compression hypothesis" and the HE-SNR metric, which computes a signal-to-noise ratio only at "high-entropy decision points", i.e. positions where the Top-10 entropy exceeds \((\ln 3 + \ln 4)/2\). HE-SNR reaches a Pearson correlation of 0.96 and a Kendall consistency of 0.98 with downstream SWE-bench scores.
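The selection of high-entropy decision points can be sketched as follows. The threshold \((\ln 3 + \ln 4)/2 \approx 1.2425\) nats sits midway between the entropy of a uniform choice among 3 options (\(\ln 3\)) and among 4 options (\(\ln 4\)). The note does not say how Top-10 entropy is computed, so this sketch assumes the top-10 probabilities are renormalized before taking Shannon entropy; the function names are hypothetical, and the signal-to-noise computation itself is not shown because the note does not define it.

```python
import math

# Threshold from the note: midpoint of ln 3 and ln 4, in nats.
THRESHOLD = (math.log(3) + math.log(4)) / 2


def top_k_entropy(probs, k=10):
    """Shannon entropy (nats) of the k largest probabilities.

    Assumption: the top-k mass is renormalized to sum to 1 first.
    """
    top = sorted(probs, reverse=True)[:k]
    total = sum(top)
    if total <= 0:
        return 0.0
    return -sum((p / total) * math.log(p / total) for p in top if p > 0)


def high_entropy_positions(token_distributions, k=10, threshold=THRESHOLD):
    """Return indices of positions whose top-k entropy exceeds the threshold."""
    return [
        i
        for i, probs in enumerate(token_distributions)
        if top_k_entropy(probs, k) > threshold
    ]


# Toy next-token distributions: confident, uniform over 4, uniform over 3.
distributions = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [1 / 3, 1 / 3, 1 / 3],
]
print(high_entropy_positions(distributions))
```

In this toy input only the position that is uniform over four options clears the threshold; the position that is uniform over three options has entropy \(\ln 3\), which falls below it.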