Machine Learning from Explanations¶

Conference: ICML2025
arXiv: 2507.04788
Code: TBD
Area: Robotics
Keywords: Explanation-guided learning, Small-data learning, Spurious correlation, Attention alignment, Sample efficiency

TL;DR¶

Proposes a method to guide machine learning using simple explanation signals (important input features). By employing a two-stage training loop that alternately optimizes prediction accuracy and attention alignment, this method significantly improves performance and stability in scenarios with small data, class imbalance, and spurious features.

Background & Motivation¶

Fundamental Dilemma of Small Data¶

Identical data combined with the same algorithm can yield models with similar accuracies but entirely different decision functions.
Models tend to learn spurious correlations (e.g., background instead of the target object).
Samples from minority classes are insufficient to learn the correct reasons for classification.

Why More Data Alone is Insufficient¶

Acquiring large amounts of high-quality annotated data is expensive.
Certain domains (e.g., rare disease diagnosis, defect detection in new products) are inherently data-scarce.
More data does not guarantee the elimination of ambiguity in decision functions.

Difference from Existing Methods¶

Existing methods (e.g., Ross et al. 2017) penalize the model for learning "wrong reasons" but do not ensure the model learns "correct reasons." This work directly provides correct explanations and guides the learning process.

Method¶

Two-Stage Alternating Training Loop¶

Stage 1: Optimize Prediction Accuracy (standard label loss)
Stage 2: Align Attention with Explanations (explanation alignment loss)

Training alternates between the two stages until convergence, rather than minimizing a single weighted combination of the two losses.
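A minimal sketch of such an alternating loop, in pure Python. The summary does not give the paper's model, attention definition, or loss functions, so everything here is an assumption for illustration: a gated linear classifier whose per-feature gate `sigmoid(g[i])` plays the role of attention, cross-entropy as the label loss, binary cross-entropy between gates and a single shared mask as the alignment loss, and a fixed number of rounds standing in for "until convergence". The names `forward`, `label_step`, `align_step`, `train`, and `make_toy_data` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, g, b):
    """Gated linear classifier: feature i is scaled by attention sigmoid(g[i])."""
    attn = [sigmoid(gi) for gi in g]
    z = b + sum(a * wi * xi for a, wi, xi in zip(attn, w, x))
    return sigmoid(z), attn


def label_step(data, w, g, b, lr):
    """Stage 1: one gradient step on the mean cross-entropy label loss."""
    n = len(data)
    grad_w = [0.0] * len(w)
    grad_g = [0.0] * len(g)
    grad_b = 0.0
    for x, y in data:
        p, attn = forward(x, w, g, b)
        err = p - y  # d(loss)/d(logit) for sigmoid + cross-entropy
        for i, xi in enumerate(x):
            grad_w[i] += err * attn[i] * xi / n
            grad_g[i] += err * w[i] * xi * attn[i] * (1.0 - attn[i]) / n
        grad_b += err / n
    w = [wi - lr * d for wi, d in zip(w, grad_w)]
    g = [gi - lr * d for gi, d in zip(g, grad_g)]
    return w, g, b - lr * grad_b


def align_step(g, mask, lr):
    """Stage 2: one gradient step on BCE(attention, mask); the gradient is attn - mask."""
    return [gi - lr * (sigmoid(gi) - m) for gi, m in zip(g, mask)]


def train(data, mask, rounds=200, lr=0.5):
    """Alternate the two stages for a fixed number of rounds."""
    dim = len(mask)
    w, g, b = [0.0] * dim, [0.0] * dim, 0.0
    for _ in range(rounds):
        w, g, b = label_step(data, w, g, b, lr)
        g = align_step(g, mask, lr)
    return w, g, b


def make_toy_data(n, seed=0):
    """Toy data: feature 0 is a noisy true signal, feature 1 is a spurious copy of the label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        data.append(([sign + rng.gauss(0.0, 0.5), sign], y))
    return data


data = make_toy_data(50)
w, g, b = train(data, mask=[1, 0])  # the mask says: feature 0 is the reason
attention = [sigmoid(gi) for gi in g]
```

In this toy setup the alignment stage opens the gate on the explained feature and closes it on the spurious one, even though the spurious feature is the cleaner predictor in the training data. No trade-off weight between the two losses appears anywhere; each stage takes its own step.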

Form of Explanations¶

Simplest form: A binary mask of input features, marking which features are the key reasons for the label.
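To make the mask format concrete, the sketch below stores an explanation as a flat list of 0/1 flags and measures how much of a model's attribution falls on the flagged features. The function name `attention_in_mask` and the example attribution values are hypothetical, and this overlap ratio is only one plausible way to score alignment, not necessarily the paper's.

```python
def attention_in_mask(attribution, mask):
    """Fraction of total absolute attribution that lands on features with mask == 1."""
    total = sum(abs(a) for a in attribution)
    if total == 0:
        return 0.0
    inside = sum(abs(a) for a, m in zip(attribution, mask) if m == 1)
    return inside / total


# A 2x3 input flattened row by row; the explanation marks the first two features.
mask = [1, 1, 0, 0, 0, 0]
attribution = [0.5, 0.25, 0.25, 0.0, 0.0, 0.0]  # illustrative values only
score = attention_in_mask(attribution, mask)  # 0.75
```

A score of 1.0 would mean all attribution sits on the explained features; a low score suggests the model is relying on something else.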

Essential Difference from Joint Loss Methods¶

Joint loss only penalizes "wrong reasons" without guiding toward "correct reasons".
This work ensures the model satisfies both objectives simultaneously through alternating optimization.
Experiments demonstrate that joint loss fails to learn correct reasons even on simple tasks.
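The contrast can be written schematically as follows. The notation is assumed rather than taken from the paper: $\theta$ denotes the model parameters, $\mathcal{L}_{\text{label}}$ the label loss, $\mathcal{L}_{\text{expl}}$ the explanation loss, $\lambda$ the trade-off weight, and $\eta$ a step size. Plain gradient steps are used only for illustration.

```latex
% Joint loss: a single objective with a fixed trade-off weight \lambda
\min_{\theta}\; \mathcal{L}_{\text{label}}(\theta) + \lambda\,\mathcal{L}_{\text{expl}}(\theta)

% Alternating scheme: two separate updates per round, with no \lambda
\theta \leftarrow \theta - \eta\,\nabla_{\theta}\mathcal{L}_{\text{label}}(\theta),
\qquad
\theta \leftarrow \theta - \eta\,\nabla_{\theta}\mathcal{L}_{\text{expl}}(\theta)
```

In the joint form, a poorly chosen $\lambda$ lets one term dominate the gradient. In the alternating form each objective gets its own update.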

Key Experimental Results¶

Spurious Feature Scenarios¶

Method	Accuracy with Spurious Features	Accuracy without Spurious Features	Learned Correct Reason
Standard Training	High (using spurious features)	Low	No
Ross et al. (Penalty)	Medium	Medium	Partial
Ours (Explanation Guided)	High	High	Yes
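The two accuracy columns correspond to evaluating the same model twice: once on inputs that still contain the spurious feature and once with it removed. A minimal sketch of that protocol, assuming the spurious feature sits at a known index and can be neutralized by overwriting it; the helper names, the two toy classifiers, and the four-sample test set are all hypothetical and are not the paper's data.

```python
def accuracy(predict, samples):
    """Share of (x, y) pairs for which predict(x) equals y."""
    correct = sum(1 for x, y in samples if predict(x) == y)
    return correct / len(samples)


def drop_spurious(samples, spurious_idx, fill=0.0):
    """Return copies of the samples with the spurious feature overwritten by a neutral value."""
    cleaned = []
    for x, y in samples:
        x_copy = list(x)
        x_copy[spurious_idx] = fill
        cleaned.append((x_copy, y))
    return cleaned


# Toy classifiers on inputs of the form [signal, spurious].
def shortcut_model(x):
    return 1 if x[1] > 0.5 else 0  # relies on the spurious feature


def signal_model(x):
    return 1 if x[0] > 0.5 else 0  # relies on the true signal


test_set = [([1.0, 1.0], 1), ([0.0, 0.0], 0), ([1.0, 1.0], 1), ([0.0, 0.0], 0)]

with_spurious = accuracy(shortcut_model, test_set)                       # 1.0
without_spurious = accuracy(shortcut_model, drop_spurious(test_set, 1))  # 0.5
```

The shortcut classifier reproduces the "High / Low" pattern of the standard-training row, while `signal_model` scores 1.0 under both conditions, matching the pattern of the last row.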

Small-Data Scenarios¶

Number of Training Samples	Standard Training	Penalty Method	Ours
50	62%	65%	78%
100	71%	73%	85%
500	82%	83%	90%
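The size of the gain at each sample count can be read off with simple subtraction. The snippet below uses only the figures from the table above, treating them as percentage accuracies (an assumption, since the header does not name the metric).

```python
# (standard, penalty, ours) in percent, keyed by number of training samples
results = {50: (62, 65, 78), 100: (71, 73, 85), 500: (82, 83, 90)}

# Gain of the proposed method in percentage points: (over standard, over penalty)
gains = {n: (ours - std, ours - pen) for n, (std, pen, ours) in results.items()}
# {50: (16, 13), 100: (14, 12), 500: (8, 7)}
```

The advantage over standard training shrinks from 16 points at 50 samples to 8 points at 500, so the benefit is largest where data is scarcest.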

Key Findings¶

Explanation guidance significantly accelerates convergence (reaching higher accuracy in fewer epochs).
Model stability is vastly improved (reduced variance across different runs).
The improvement on minority classes is most pronounced in class-imbalanced scenarios.
Joint loss methods fail even on a minimal geometric shape detection task, whereas the proposed method succeeds.
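Two of these findings, run-to-run stability and minority-class performance, depend on how they are measured. A minimal sketch of both measurements follows; the function names are hypothetical and the numbers are placeholders chosen for easy arithmetic, not results from the paper.

```python
from statistics import mean, pstdev


def run_stability(accuracies):
    """Mean and population standard deviation of accuracy across repeated runs."""
    return mean(accuracies), pstdev(accuracies)


def per_class_recall(y_true, y_pred):
    """Recall for each class, so a minority class is not hidden by overall accuracy."""
    recall = {}
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recall[c] = sum(1 for i in idx if y_pred[i] == c) / len(idx)
    return recall


# Placeholder example: 8 majority-class samples and 2 minority-class samples.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
recall = per_class_recall(y_true, y_pred)  # {0: 1.0, 1: 0.5}

avg, spread = run_stability([0.5, 0.75, 1.0])  # placeholder accuracies from three runs
```

Here overall accuracy is 90% while minority-class recall is only 50%, which is why class-imbalanced results are better reported per class. A smaller `spread` across seeds is what "reduced variance across different runs" refers to.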

Highlights & Insights¶

"Providing the correct reason" is more effective than "penalizing the wrong reason", an intuitively plausible claim that is systematically demonstrated by experiments for the first time.
Alternating optimization is more stable than joint loss, avoiding the need for hyperparameter tuning to balance the two objectives.
The form of explanation is extremely simple (binary mask), resulting in low acquisition costs.
The counterexample in which joint loss fails even on a minimal task is highly convincing.
The results offer practical guidance for trustworthy AI deployment.

Limitations & Future Work¶

Requires human-provided explanations, which increases annotation costs.
Only validated on image classification; other modalities such as text/tabular data remain to be explored.
Explanation quality directly affects effectiveness, but how to evaluate explanation quality is not discussed in depth.
The diminishing returns of the proposed method in large-data scenarios are not quantified.
Comparisons with other knowledge injection methods (e.g., curriculum learning, knowledge distillation) are insufficient.

Related Work & Insights¶

The core difference from Ross et al. is "guidance" vs. "penalty".
Difference from Rieger et al.: their method uses bounding boxes, whereas this work uses feature masks.
Insight: Explanation signals could potentially be generated automatically using LLMs/GPT-4V, reducing manual annotation costs.

Rating¶

Novelty: 4.0/5 — Conceptually simple but clearly distinguished from existing methods
Experimental Thoroughness: 4.5/5 — Multiple scenarios + counterexamples + ablations
Writing Quality: 4.5/5 — Clear problem definition
Value: 4.5/5 — Direct significance for trustworthy AI deployment

Supplementary Analysis¶

Insights from Minimalist Counterexamples¶

On the task of detecting simple geometric shapes, joint loss methods fail to learn the correct regions, while alternating training converges with only 50 samples.

Cost of Obtaining Explanations¶

In expert domains such as medicine, the marginal cost of annotating explanations is far lower than that of acquiring new samples. In the future, such explanations could be generated automatically using LLMs/VLMs.

Relationship with Curriculum Learning¶

Explanation guidance can be viewed as a more precise form of curriculum learning: it not only tells the model what to learn but also why.