📖 NLP Understanding

🧠 NeurIPS2025 · 2 paper notes

Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL

This paper proposes PNLC, which trains a lightweight goal-conditioned value function to act as a "natural language critic" that guides an LLM agent's multi-turn planning and self-refinement at the level of individual thought steps. PNLC requires neither direct fine-tuning of the LLM nor inference-time search, yet it significantly outperforms existing methods on complex interactive tasks such as web navigation, social reasoning, and persuasion, while running inference 8–10× faster.

Weak-to-Strong Generalization under Distribution Shifts

This paper demonstrates that naive weak-to-strong generalization fails under distribution shifts, where the strong model performs even worse than its weak supervisor. It proposes RAVEN, a framework that dynamically learns optimal combination weights over multiple weak models to achieve robust weak-to-strong generalization, surpassing baselines by over 30% on out-of-distribution (OOD) tasks.