💻 Code Intelligence

🧪 ICML2025 · 9 paper notes

🔥 Top topics: Code Intelligence ×3 · Reasoning ×2 · LLM ×2 · Adversarial Robustness ×2

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

This paper proposes AdaptiveStep, a method that automatically divides reasoning steps based on the model's prediction confidence in order to train a more precise Process Reward Model (ASPRM). On mathematical reasoning and code generation tasks, ASPRM surpasses existing open-source PRMs at less than 70% of the data construction cost, and it further improves reasoning performance through token-level value-guided decoding.
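
The idea of cutting a reasoning trace at low-confidence tokens can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `split_by_confidence`, the threshold value of 0.8, and the choice to start a new step at the low-confidence token are all assumptions made for the example.

```python
from typing import List, Sequence


def split_by_confidence(
    tokens: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> List[List[str]]:
    """Split a token sequence into steps, starting a new step at each
    token whose predicted probability falls below `threshold`."""
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")

    steps: List[List[str]] = []
    current: List[str] = []
    for token, conf in zip(tokens, confidences):
        # A low-confidence token is treated as a decision point:
        # close the current step (if any) and begin a new one here.
        if conf < threshold and current:
            steps.append(current)
            current = []
        current.append(token)
    if current:
        steps.append(current)
    return steps


if __name__ == "__main__":
    toks = ["x", "=", "3", "so", "y"]
    confs = [0.90, 0.95, 0.50, 0.99, 0.40]
    print(split_by_confidence(toks, confs))
```

Each resulting step could then be labeled and scored separately when building PRM training data, instead of relying on fixed delimiters such as line breaks.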

EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning

EffiCoder constructs an "accurate and efficient" instruction tuning dataset named EffiInstruct. It enables code LLMs to significantly reduce execution time and total memory overhead while improving the pass@1 rate, demonstrating that "efficiency can be learned through data design."

EpiCoder: Encompassing Diversity and Complexity in Code Generation

This paper proposes a code data synthesis framework based on "Feature Trees." By extracting hierarchical semantic features from code and iteratively evolving them, the framework achieves precise control over the complexity and diversity of synthetic data. The resulting EpiCoder model series achieves state-of-the-art (SOTA) performance among similarly sized models on both function-level and file-level code generation benchmarks.

Function-to-Style Guidance of LLMs for Code Translation

F2STrans progressively fine-tunes LLMs in two stages: functional learning (correctness) followed by style learning (readability). With this training, Qwen-1.5B outperforms prompt-enhanced Qwen-32B and GPT-4 on average across 20 code translation scenarios.

Mind the Gap: A Practical Attack on GGUF Quantization

This work proposes the first attack targeting the GGUF quantization format. It leverages quantization errors as "degrees of freedom" to train a malicious quantized model that behaves normally in full precision but triggers backdoors after quantization. This approach is highly effective in unsafe code generation (\(\Delta=88.7\%\)), targeted content injection (\(\Delta=85.0\%\)), and benign refusal (\(\Delta=30.1\%\)).
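
To see why quantization error leaves "degrees of freedom," consider a deliberately simplified scheme. The sketch below uses plain uniform round-to-nearest quantization with a single scale, which is an assumption for illustration and not the block-wise formats GGUF actually uses. The helper names `quantize` and `preserving_interval` are hypothetical.

```python
import math
from typing import Tuple


def quantize(weight: float, scale: float) -> int:
    """Round-to-nearest uniform quantization (simplified, not GGUF)."""
    return math.floor(weight / scale + 0.5)


def preserving_interval(weight: float, scale: float) -> Tuple[float, float]:
    """Return the range [low, high) of full-precision values that map to
    the same quantized integer as `weight` under `quantize`."""
    q = quantize(weight, scale)
    return ((q - 0.5) * scale, (q + 0.5) * scale)


if __name__ == "__main__":
    w, s = 0.26, 0.1
    low, high = preserving_interval(w, s)
    print(quantize(w, s), low, high)
    # Any value inside the interval quantizes to the same integer.
    print(quantize(0.33, s) == quantize(w, s))
```

In this toy setting, a full-precision weight can move anywhere inside its interval without changing the quantized model, which is the slack an attacker could use to make the full-precision model look benign.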

Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation

ORPS (Outcome-Refining Process Supervision) unifies process and outcome rewards in a tree-search framework by combining code execution feedback with LLM self-criticism. It achieves a 26.9% accuracy improvement and a 42.2% efficiency boost in code generation without training a PRM.

Robust Learning of Diverse Code Edits (NextCoder)

This work proposes a synthetic code editing data generation pipeline alongside a robust adaptation algorithm SeleKT (Selective Knowledge Transfer). By performing periodic top-k sparse projections of task vectors during fine-tuning, the model is equipped with strong specialized code editing capabilities while preserving its original code generation and general reasoning capacities. The resulting NextCoder model family outperforms same-sized or even larger models across five code-editing benchmarks.
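
A single top-k sparse projection of a task vector can be sketched as follows. This is a minimal illustration on flat lists of floats, assuming the task vector is the element-wise difference between fine-tuned and base weights and that "top-k" means largest magnitude. The function name `project_top_k` is hypothetical, and the paper's actual schedule and selection details may differ.

```python
from typing import List, Sequence


def project_top_k(
    base: Sequence[float], tuned: Sequence[float], k: int
) -> List[float]:
    """Keep only the k largest-magnitude entries of the task vector
    (tuned - base) and reset every other weight to its base value."""
    if len(base) != len(tuned):
        raise ValueError("base and tuned must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")

    task_vector = [t - b for b, t in zip(base, tuned)]
    # Indices of the k entries with the largest absolute change.
    ranked = sorted(
        range(len(task_vector)),
        key=lambda i: abs(task_vector[i]),
        reverse=True,
    )
    keep = set(ranked[:k])
    return [
        b + (task_vector[i] if i in keep else 0.0)
        for i, b in enumerate(base)
    ]


if __name__ == "__main__":
    base = [1.0, 2.0, 3.0, 4.0]
    tuned = [1.5, 2.0, 1.0, 4.1]
    print(project_top_k(base, tuned, k=2))
```

Applied periodically during fine-tuning, a projection like this keeps most weights at their base values, which is one way to read the claim that general capabilities are preserved while editing skill is added.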

SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

This paper proposes SparseLoRA, which uses contextual sparsity to dynamically select subsets of weights for forward and gradient computation. It brings the inference-time sparsity acceleration paradigm to the LLM fine-tuning stage for the first time, achieving up to a 2.2× reduction in FLOPs and a 1.6× measured speedup while maintaining accuracy.

Training Software Engineering Agents and Verifiers with SWE-Gym

This paper proposes SWE-Gym, the first environment designed for training software engineering (SWE) agents, containing 2,438 real-world task instances from 11 open-source Python repositories. By leveraging rejection sampling fine-tuning on SWE-Gym to train SWE agents and verifiers, it achieves resolve rates of \(32.0\%\) on SWE-Bench Verified and \(26.0\%\) on SWE-Bench Lite, setting a new SOTA for open-weight SWE agents.
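
The filtering step at the heart of rejection sampling fine-tuning can be sketched as below. This is a minimal illustration under stated assumptions: the `Trajectory` record, its fields, and the `resolved` flag (standing in for "the task's tests pass after applying the patch") are hypothetical and are not SWE-Gym's actual data format or API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Trajectory:
    """Hypothetical record of one sampled agent attempt at a task."""
    task_id: str
    patch: str
    resolved: bool  # assumed to mean the task's tests passed


def rejection_sample(trajectories: Iterable[Trajectory]) -> List[Trajectory]:
    """Keep only successful attempts; these become fine-tuning data."""
    return [t for t in trajectories if t.resolved]


if __name__ == "__main__":
    sampled = [
        Trajectory("task-1", "patch-a", True),
        Trajectory("task-1", "patch-b", False),
        Trajectory("task-2", "patch-c", False),
    ]
    kept = rejection_sample(sampled)
    print([t.patch for t in kept])
```

The rejected attempts need not be wasted: paired with their failure labels, they are the kind of data a verifier could be trained on to tell successful trajectories from unsuccessful ones.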