Satisficing and Optimal Generalised Planning via Goal Regression (Extended Version)

Conference: AAAI 2026 | arXiv: 2511.11095 | Code: https://github.com/dillonzchen/moose | Area: Automated Planning | Keywords: generalised planning, goal regression, policy synthesis, search pruning, planning domains

TL;DR

This paper presents the Moose planner, which synthesises generalised planning programs from training problems via goal regression. It decomposes multi-goal problems into single-goal subproblems, solves each optimally, and applies regression followed by lifting to produce a set of first-order condition-action rules. These rules support either satisficing planning (direct rule execution) or optimal planning (rules encoded as axioms that prune the search space).

Background & Motivation

Generalised Planning (GP) aims to synthesise programs that solve a family of related planning problems. The core objective is to amortise synthesis cost: solving a family of problems faster than invoking a general-purpose planner on each instance individually.

Many real-world domains are computationally easy to solve, yet existing general-purpose planners fail to scale on them. For instance, UPS delivers over 20 million parcels daily across 200+ countries, yet state-of-the-art general planners struggle with simplified delivery problems involving as few as 100 parcels.

Existing GP methods (e.g., Sketch Learner, PG3, Loom) each have notable limitations:

  • High synthesis cost, with some domains failing to complete within 12 hours
  • Excessive memory consumption
  • Lack of theoretical guarantees for optimal planning

Moose is motivated by leveraging goal regression—a technique with a long history in the knowledge representation community—combined with problem relaxation (decomposing multi-goal problems into single-goal subproblems), enabling efficient generalised plan synthesis and instantiation via a decompose–solve–regress–lift pipeline.

Method

Overall Architecture

Moose operates in two phases:

Synthesis Phase (Algorithm 1; a sketch follows the list):

  1. For each training problem, enumerate goal-atom orderings and decompose each ordering into single-goal subproblems.
  2. Solve each subgoal optimally, advancing the current state along the way.
  3. Apply goal regression to each sub-plan, propagating the goal backwards through the actions to the required preconditions.
  4. Lift the resulting state–goal–action sequences by replacing concrete objects with free variables, yielding first-order rules.
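
To make the synthesis loop concrete, here is a minimal Python sketch of the steps above. The helpers `solve_optimally`, `apply_action`, `regress`, and `lift`, and the `problem` attributes, are hypothetical stand-ins for this exposition, not Moose's actual API.

```python
from itertools import islice, permutations

def synthesise_rules(problem, n_p=3, *, solve_optimally, apply_action,
                     regress, lift):
    """Sketch of Algorithm 1: decompose -> solve -> regress -> lift."""
    rules = []
    # 1. Enumerate up to n_p goal-atom orderings for the training problem.
    for ordering in islice(permutations(problem.goal_atoms), n_p):
        state = problem.initial_state
        for goal_atom in ordering:
            # 2. Solve the single-goal subproblem optimally
            #    (the paper uses A* with the LM-cut heuristic here).
            plan = solve_optimally(state, {goal_atom})
            if plan is None:
                break  # this ordering does not serialise; try the next
            # 3. Regress the goal atom backwards through the sub-plan to
            #    obtain the minimal set of atoms needed before execution.
            condition = regress({goal_atom}, plan)
            # 4. Lift objects to free variables; the sub-plan cost serves
            #    as the rule's priority (cost-to-go).
            rules.append(lift(condition, goal_atom, plan,
                              priority=len(plan)))
            # Advance the current state along the executed sub-plan.
            for action in plan:
                state = apply_action(state, action)
    return rules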

Instantiation Phase:

  • Satisficing planning (Algorithm 3): On a new problem, attempt to ground rules in priority order and execute matching (macro-)action sequences until the goal is reached (see the execution-loop sketch below).
  • Optimal planning (Section 3.3): Encode the learned rules as PDDL axioms that constrain the set of applicable actions at each search state, thereby pruning the search space.
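
A minimal sketch of the satisficing execution loop, assuming a hypothetical `ground` matcher (in Moose this matching is delegated to SQLite queries) and a hypothetical `rule.ground_actions` accessor, alongside the same stand-in helpers as above.

```python
def execute_rules(problem, rules, *, ground, apply_action,
                  max_steps=100_000):
    """Sketch of Algorithm 3: fire rules in priority order until the
    goal holds, or fail if no rule matches."""
    state, plan = problem.initial_state, []
    rules = sorted(rules, key=lambda r: r.priority)  # lowest cost-to-go first
    for _ in range(max_steps):
        unsatisfied = problem.goal_atoms - state.atoms
        if not unsatisfied:
            return plan  # every goal atom is satisfied
        for rule in rules:
            binding = ground(rule, state, unsatisfied)  # None if no match
            if binding is not None:
                # Execute the rule's grounded (macro-)action sequence.
                for action in rule.ground_actions(binding):
                    plan.append(action)
                    state = apply_action(state, action)
                break
        else:
            return None  # no rule applies: fall back to a regular planner
    return None
```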

Key Designs

  1. Moose Rule (a minimal data-structure sketch follows):

     • Function: Compactly represents a generalised policy as first-order condition–action pairs.
     • Mechanism: Each rule contains a set of free variables, state conditions (atoms the current state must satisfy), goal conditions (unsatisfied goal atoms), and an action sequence (including macro-actions).
     • Design Motivation: Rule preconditions compactly capture families of states; lifted rules generalise across varying numbers of objects.
     • Priority: Each rule carries a priority (its cost-to-go) that determines execution order.
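
An illustrative rendering of such a rule as a data structure. The field names, atom encoding, and the Logistics-style example are assumptions made for exposition, not the paper's concrete representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """First-order condition-action rule (illustrative field names).

    Atoms are tuples such as ("at", "?t", "?l"); "?"-prefixed terms are
    the free variables introduced by lifting.
    """
    variables: frozenset         # free variables, e.g. {"?p", "?t", "?l"}
    state_conditions: frozenset  # atoms the current state must satisfy
    goal_conditions: frozenset   # goal atoms that are still unsatisfied
    actions: tuple               # (macro-)action sequence to execute
    priority: int                # cost-to-go; lower values fire first

# Hypothetical Logistics-style rule: if package ?p sits in truck ?t, the
# truck is at ?l, and the goal wants ?p at ?l, then unload.
unload = Rule(
    variables=frozenset({"?p", "?t", "?l"}),
    state_conditions=frozenset({("in", "?p", "?t"), ("at", "?t", "?l")}),
    goal_conditions=frozenset({("at", "?p", "?l")}),
    actions=(("unload", "?p", "?t", "?l"),),
    priority=1,
)
```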

  2. Goal Regression + Lifting (sketched below):

     • Function: Extracts generalised condition–action rules from concrete plans.
     • Mechanism: Regression computes the preimage of an action—the minimal set of atoms that must hold before execution—after which concrete objects are replaced by free variables.
     • Key Property: Regression naturally eliminates atoms irrelevant to the goal (e.g., the location of a dog is ignored in a plan for transporting a cake), endowing rules with strong generalisation.
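
A sketch of textbook STRIPS goal regression, the operation this extraction step builds on; the `pre`/`add`/`delete` attribute names are assumed.

```python
def regress(atoms, action):
    """Preimage of `atoms` under a STRIPS action: whatever the action
    does not add must already hold, plus its preconditions."""
    if atoms & action.delete:
        return None  # the action destroys a needed atom: not regressable
    return (atoms - action.add) | action.pre

def regress_plan(goal_atoms, plan):
    """Regress a goal backwards through an entire sub-plan."""
    atoms = set(goal_atoms)
    for action in reversed(plan):
        atoms = regress(atoms, action)
        if atoms is None:
            return None
    return atoms  # minimal atoms that must hold before the sub-plan
```

Because only atoms that feed into the goal survive the backward pass, unrelated facts (the dog's location in the cake example) never enter a rule's conditions.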

  3. Goal Independence Classification (TGI / SGI / OGI; a brute-force illustration follows):

     • Function: Formally classifies the degree to which goals in a planning domain can be solved independently.
     • TGI (True Goal Independence): The greedy algorithm succeeds under every goal ordering.
     • SGI (Serialisable Goal Independence): Some goal ordering exists under which the greedy algorithm succeeds.
     • OGI (Optimal Goal Independence): Some goal ordering exists under which the greedy algorithm yields an optimal solution.
     • Computational Complexity: pTGI is in P; pSGI and pOGI are NP-complete.
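
The definitions can be illustrated with a brute-force check that enumerates goal orderings. This is exponential and only illustrates the semantics; it is not how the paper's complexity results are obtained. `solve_optimally` and `apply_action` are the same hypothetical helpers as above.

```python
from itertools import permutations

def greedy_cost(problem, ordering, *, solve_optimally, apply_action):
    """Cost of greedily solving one goal atom at a time, or None on failure."""
    state, cost = problem.initial_state, 0
    for goal_atom in ordering:
        plan = solve_optimally(state, {goal_atom})
        if plan is None:
            return None
        cost += len(plan)
        for action in plan:
            state = apply_action(state, action)
    # Earlier goals must not have been undone along the way.
    return cost if problem.goal_atoms <= state.atoms else None

def classify(problem, optimal_cost, **helpers):
    """Brute-force TGI / SGI / OGI check over all goal orderings."""
    costs = [greedy_cost(problem, o, **helpers)
             for o in permutations(problem.goal_atoms)]
    tgi = all(c is not None for c in costs)      # every ordering works
    sgi = any(c is not None for c in costs)      # some ordering works
    ogi = any(c == optimal_cost for c in costs)  # some ordering is optimal
    return tgi, sgi, ogi
```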

Loss & Training

This work does not involve neural network training. Rule synthesis is based on:

  • Solving each subproblem optimally using A* with the LM-cut heuristic.
  • A default of \(n_p = 3\) goal orderings per training problem.
  • Rule grounding implemented via SQLite database queries (a minimal sketch follows).
  • An optional extension: regressing over goal subsets (rather than single goals) to generate a richer set of high-quality rules.
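
A minimal, self-contained sketch of the "states as databases, rules as queries" idea, assuming a Logistics-style state; the table names, columns, and example rule are illustrative, not Moose's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE state_in (pkg TEXT, truck TEXT);  -- in(?p, ?t)
    CREATE TABLE state_at (obj TEXT, loc TEXT);    -- at(?x, ?l)
    CREATE TABLE goal_at  (obj TEXT, loc TEXT);    -- unsatisfied goal atoms
""")
con.executemany("INSERT INTO state_in VALUES (?, ?)", [("p1", "t1")])
con.executemany("INSERT INTO state_at VALUES (?, ?)", [("t1", "l2")])
con.executemany("INSERT INTO goal_at  VALUES (?, ?)", [("p1", "l2")])

# Grounding the unload rule from above is a single join over its state
# and goal conditions: in(?p, ?t) AND at(?t, ?l) AND goal at(?p, ?l).
bindings = con.execute("""
    SELECT i.pkg, i.truck, a.loc
    FROM state_in AS i
    JOIN state_at AS a ON a.obj = i.truck
    JOIN goal_at  AS g ON g.obj = i.pkg AND g.loc = a.loc
""").fetchall()
print(bindings)  # [('p1', 't1', 'l2')] -> execute unload(p1, t1, l2)
```

Each free variable becomes a join column, so the database's mature query optimiser does the matching work for free.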

Key Experimental Results

TGI Effectiveness on IPC Benchmarks

| Statistic | Value |
| --- | --- |
| Domains with all outputs valid (potential TGI) | 13/38 (34.2%) |
| Domains with at least half of outputs valid | 19/38 (50.0%) |
| Domains with at least one valid output | 27/38 (71.1%) |
| Valid outputs across all problems | 590/1184 (49.8%) |

Main Results: Satisficing Planning Coverage

| Domain | SLearn-0 | SLearn-1 | SLearn-2 | Lama | Moose |
| --- | --- | --- | --- | --- | --- |
| Barman | 0.0 | 0.0 | 0.0 | 49 | 90.0 |
| Ferry | 15.0 | 67.0 | 60.0 | 69 | 90.0 |
| Gripper | 59.6 | 50.8 | 33.0 | 65 | 90.0 |
| Logistics | 0.0 | 0.0 | 0.0 | 77 | 89.6 |
| Miconic | 68.8 | 72.6 | 67.8 | 77 | 90.0 |
| Rovers | 0.0 | 0.0 | 0.0 | 66 | 90.0 |
| Satellite | 0.0 | 29.2 | 34.6 | 89 | 90.0 |
| Transport | 0.0 | 63.0 | 46.8 | 66 | 90.0 |
| Total (720) | 143.4 | 282.6 | 242.2 | 558 | 719.6 |

Optimal Planning Coverage

| Domain | Blind | LMcut | Scorpion | SymK | Moose |
| --- | --- | --- | --- | --- | --- |
| Barman | 0 | 0 | 0 | 12 | 24.6 |
| Ferry | 10 | 18 | 17 | 18 | 30.0 |
| Gripper | 9 | 8 | 7 | 30 | 27.0 |
| Miconic | 30 | 30 | 30 | 30 | 30.0 |
| Total (240) | 93 | 119 | 140 | 154 | 183.0 |

Ablation Study: Effect of Joint Goal Regression

| Configuration | Description |
| --- | --- |
| \(n_r = 1\) (single-goal regression) | Base configuration; fast synthesis. |
| \(n_r = 2\) (2-subset goal regression) | Improved solution quality in 7/8 domains (1.1%–25.0%), at higher synthesis and instantiation cost. |
| \(n_r = 3\) (3-subset goal regression) | Marginal improvement over \(n_r = 2\). |

Key Findings

  • Coverage: Moose solves nearly all of the 720 satisficing benchmark problems (719.6 on average), substantially outperforming Lama (558) and the best SLearn configuration (282.6).
  • Synthesis efficiency: Moose is faster than the best SLearn configuration in 5/8 domains and consistently uses less memory (<1 GB vs. up to 7.6 GB).
  • Empirical optimality: Although no theoretical guarantee of optimality is given, Moose empirically produces optimal plans in all observed cases.
  • Encoding rules as axioms generally outperforms vanilla SymK: the benefit of search space reduction exceeds the overhead of axiom evaluation.
  • Solution quality: Moose outperforms Lama and ENHSP in 5/8 classical domains and 3/4 numeric domains.

Highlights & Insights

  • Elegant yet effective pipeline: Decompose goals → solve optimally → regress → lift. This four-step process synthesises generalised policies with conceptual clarity.
  • Theoretical guarantees: Completeness for satisficing planning is proved under the TGIC condition; completeness for optimal planning is proved under TGIC + OGI.
  • Planning states as databases, rules as queries: SQLite is leveraged for efficient rule grounding, elegantly reusing mature database optimisation infrastructure.
  • Identification of ESHO domains (Easy-to-Solve, Hard-to-Optimise): Domains that are P-time solvable but NP-hard to optimise are precisely where Moose excels.
  • Generalisation to numeric planning: The same framework extends naturally to numeric planning domains (NFerry, NMiconic, NMinecraft, NTransport).

Limitations & Future Work

  • The TGI assumption does not hold in many planning domains (e.g., the Sussman Anomaly in Blocksworld), limiting the method's applicability.
  • The training set size required for theoretical guarantees is exponential in domain size in the worst case.
  • The greedy decomposition strategy may fail when goals exhibit complex interdependencies (e.g., mutually exclusive goals).
  • The number of rules grows rapidly with \(n_r\), causing instantiation cost to increase by an order of magnitude.
  • Domains requiring transitive closure computation are not handled.
  • Synthesis relies on an optimal planner (A* + LM-cut), limiting efficiency when subproblems are themselves hard to solve optimally.

Related Work

  • Loom (Illanes & McIlraith, 2019): Generates quantified problems via equivalent-object bagging and synthesises policies with a FOND planner.
  • PG3 (Yang et al., 2022): Performs heuristic search over the space of generalised policies, also employing goal regression to handle "missing" states.
  • Sketch Learner (Drexler et al., 2022): Learns sketches (abstract descriptions) at higher synthesis cost.
  • Long history of goal regression: From STRIPS (Fikes et al., 1972) to Waldinger (1977) to Reiter (2001), Moose revitalises this classical technique in a bottom-up manner.
  • Insight: Combining classical formal methods from planning with modern data-driven policy learning can yield significant advances on practical benchmarks.

Rating

  • Novelty: ⭐⭐⭐⭐ — Combining goal regression with goal decomposition for generalised plan synthesis is novel; the theoretical analysis (complexity classification of TGI/SGI/OGI) is rigorous.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Covers 8 classical and 4 numeric domains, evaluates both satisficing and optimal planning, and assesses synthesis cost, instantiation cost, and solution quality.
  • Writing Quality: ⭐⭐⭐⭐ — The theoretical sections are rigorous and the algorithmic pseudocode is clear, though the paper is lengthy and requires careful reading.
  • Value: ⭐⭐⭐⭐ — Substantially advances the state of the art on ESHO domains and addresses a practical scalability challenge for general-purpose planners.