Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy

Conference: CVPR 2025
arXiv: 2412.12463
Code: None
Area: Image Generation / Image Editing
Keywords: Pattern Editing, Analogical Reasoning, Programmatic Editing, Domain-Specific Language, Diffusion Models

TL;DR

Pattern Analogies proposes a framework for structured editing of pattern images that never infers the underlying program. The user demonstrates the desired edit with a pair of simple patterns \((A, A')\), and the TriFuser diffusion model transfers that edit to a more complex target pattern \(B\), producing \(B'\). The model executes such edits faithfully and generalizes to unseen pattern styles created by real-world artists.

Background & Motivation

Background: Pattern design (tiles, wallpapers, textiles, etc.) is a fundamental element in digital media and physical products. Editing patterns typically requires modifying the underlying program parameters (e.g., tiling method, partitioning mode, color mapping) that define their structural rules.

Limitations of Prior Work: (1) Visual Program Inference (VPI) attempts to automatically infer the complete program from images, but complex patterns are often semi-parametric (a mix of rules and non-parametric elements), making inference difficult; (2) Even if successfully inferred, the generated programs are usually structurally disorganized with unlabeled parameters, making editing tedious; (3) Existing analogical editing methods using diffusion models primarily target appearance/style variations and cannot execute structured programmatic edits.

Key Challenge: Users need to change the organizational rules of a pattern (such as its tiling method or scaling mode), but existing methods can only perform pixel-level modifications or global style transfer; they cannot execute structured edits without access to the underlying program.

Goal: Achieve programmatic editing of patterns through an analogy paradigm without inferring the underlying program.

Key Insight: Humans convey transformations through analogy, providing an exemplar pair \((A, A')\) that shows what changed and how. Pairing this analogy formulation with a Domain-Specific Language (DSL) makes it possible to generate large-scale synthetic training data.

Core Idea: Design a SplitWeave DSL to generate synthetic pattern quadruplets \((A, A', B, B')\), where \(A \to A'\) and \(B \to B'\) undergo the exact same programmatic edit; train the TriFuser diffusion model to learn to execute the analogical edit \(f(A, A', B) \to B'\).

Method

Overall Architecture

A three-stage system: (1) the SplitWeave DSL defines the pattern language and its parametric operations; (2) a synthetic quadruplet sampler generates training data by applying the same edit to the programs of two different patterns; (3) the TriFuser conditional diffusion model generates the edited result \(B'\) conditioned on \((A, A', B)\).

Key Designs

SplitWeave Domain-Specific Language:
- Function: A programming language that supports constructing and parameterizing visual patterns.
- Mechanism: Composed of three types of operations: (1) Canvas Fragmentation, which partitions the canvas in a structured way (brick splitting, Voronoi splitting, etc.); (2) Fragment ID-Aware Operations, which apply transformations that vary with each fragment's ID (e.g., scaling alternate rows differently); (3) SVG Operations: stroking, coloring, and composition. Two program samplers are designed for two pattern styles: Motif Tiling Patterns (tilings built from repeating elements) and Split-Filling Patterns (color-field designs built by fragmenting the canvas and filling each region).
- Design Motivation: Patterns generated by naive DSL grammar sampling are often overly complex or inconsistent; customized samplers can generate high-quality training data, enabling the model to generalize to real-world patterns.
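To make the three operation types concrete, here is a minimal sketch of a toy pattern program. Everything in it (`Fragment`, `brick_split`, `alternate_row_scale`, `color_by_id`, `to_svg`) is hypothetical: the summary does not give SplitWeave's actual syntax, so this only illustrates how canvas fragmentation, a fragment-ID-aware transform, and SVG-level styling might compose into one renderable program.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    fid: int      # fragment ID assigned during canvas fragmentation
    row: int
    col: int
    x: float
    y: float
    w: float
    h: float
    scale: float = 1.0
    fill: str = "#000000"


def brick_split(width: float, height: float, rows: int, cols: int) -> List[Fragment]:
    """Canvas fragmentation (toy): a brick layout where odd rows shift by half a brick.

    Bricks in shifted rows may overhang the right edge; a real renderer would clip them.
    """
    bw, bh = width / cols, height / rows
    frags: List[Fragment] = []
    for r in range(rows):
        offset = bw / 2 if r % 2 else 0.0
        for c in range(cols):
            frags.append(Fragment(len(frags), r, c, c * bw + offset, r * bh, bw, bh))
    return frags


def alternate_row_scale(frags: List[Fragment], even: float, odd: float) -> List[Fragment]:
    """Fragment-ID-aware op (toy): scale each fragment according to its row parity."""
    for f in frags:
        f.scale = even if f.row % 2 == 0 else odd
    return frags


def color_by_id(frags: List[Fragment], palette: List[str]) -> List[Fragment]:
    """SVG-style op (toy): cycle fill colors through the palette by fragment ID."""
    for f in frags:
        f.fill = palette[f.fid % len(palette)]
    return frags


def to_svg(frags: List[Fragment], width: int, height: int) -> str:
    """Render each fragment as a rectangle scaled about its own center."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for f in frags:
        w, h = f.w * f.scale, f.h * f.scale
        x, y = f.x + (f.w - w) / 2, f.y + (f.h - h) / 2
        parts.append(f'<rect x="{x:.1f}" y="{y:.1f}" width="{w:.1f}" height="{h:.1f}" fill="{f.fill}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


# Compose a tiny "program": fragment -> per-row scaling -> coloring -> render.
frags = brick_split(512, 512, rows=4, cols=4)
frags = alternate_row_scale(frags, even=0.9, odd=0.6)
frags = color_by_id(frags, ["#264653", "#e9c46a", "#e76f51"])
svg = to_svg(frags, 512, 512)
```

Each operation only reads and writes fragment attributes, so changing one parameter (say, `odd=0.6` to `odd=0.3`) is a small program-level edit with a predictable visual effect, which is the property the analogy setup relies on.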
Synthetic Analogous Quadruplet Sampling:
- Function: Generates training data \((A, A', B, B')\), ensuring consistent editing relations between \(A \to A'\) and \(B \to B'\).
- Mechanism: Following structure-mapping theory, the same editing operation \(e\) (inserting, deleting, or replacing a subprogram) is applied to two independently sampled programs \(z_A, z_B\), yielding \(z_{A'}, z_{B'}\), which are then rendered into images. Because \(e\) acts at the program level, it guarantees \(R(z_A, z_{A'}) = R(z_B, z_{B'})\): what the two pairs share is a program-level relation, not visual similarity between the patterns.
- Design Motivation: Applying edits consistently at the program level makes the analogical relationships in the training data exact, a level of precision that would be impractical to achieve through manual collection.
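The sampling loop can be sketched as follows, assuming a toy program representation in which a program is a dict of named subprogram slots. `sample_program`, `Edit`, `apply_edit`, `relation`, and `sample_quadruplet` are illustrative names rather than the paper's code, and rejecting edits that are a no-op on either program is an assumption about how the \(R(z_A, z_{A'}) = R(z_B, z_{B'})\) condition might be enforced.

```python
import copy
import random
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

# Hypothetical toy program: subprogram slot name -> (operation, argument).
Program = Dict[str, Tuple[str, Any]]

SPLITS = [("brick", 4), ("grid", 3), ("voronoi", 12)]
COLORS = ["#264653", "#2a9d8f", "#e9c46a", "#f4a261", "#e76f51"]


def sample_program(rng: random.Random) -> Program:
    """Independently sample a pattern program (stand-in for the DSL samplers)."""
    return {
        "split": rng.choice(SPLITS),
        "fragment_op": ("alt_row_scale", rng.choice([0.5, 0.6, 0.8])),
        "fill": ("palette", tuple(rng.sample(COLORS, 3))),
    }


@dataclass(frozen=True)
class Edit:
    kind: str                                 # "replace", "insert" or "delete"
    slot: str                                 # subprogram slot the edit targets
    value: Optional[Tuple[str, Any]] = None   # new subprogram (unused for delete)


def sample_edit(rng: random.Random) -> Edit:
    kind = rng.choice(["replace", "insert", "delete"])
    if kind == "replace":
        return Edit("replace", "split", rng.choice(SPLITS))
    if kind == "insert":
        return Edit("insert", "stroke", ("outline", rng.choice([1, 2, 4])))
    return Edit("delete", "fragment_op")


def apply_edit(z: Program, e: Edit) -> Program:
    """Apply the edit at the program level; the input program is left untouched."""
    z2 = copy.deepcopy(z)
    if e.kind in ("replace", "insert"):
        z2[e.slot] = e.value
    elif e.kind == "delete":
        z2.pop(e.slot, None)
    else:
        raise ValueError(f"unknown edit kind: {e.kind}")
    return z2


def relation(z: Program, z2: Program) -> Tuple[Tuple[str, Any], ...]:
    """Program-level relation R(z, z'): the slots that changed and their new contents."""
    keys = sorted(set(z) | set(z2))
    return tuple((k, z2.get(k)) for k in keys if z.get(k) != z2.get(k))


def sample_quadruplet(rng: random.Random, max_tries: int = 100):
    """Return programs (z_A, z_A', z_B, z_B') with R(z_A, z_A') == R(z_B, z_B')."""
    for _ in range(max_tries):
        z_a, z_b = sample_program(rng), sample_program(rng)
        e = sample_edit(rng)
        z_a2, z_b2 = apply_edit(z_a, e), apply_edit(z_b, e)
        r_a, r_b = relation(z_a, z_a2), relation(z_b, z_b2)
        # Reject edits that do nothing to either program (e.g. replacing a split
        # with the one already present), since the analogy would then be broken.
        if r_a and r_a == r_b:
            return z_a, z_a2, z_b, z_b2
    raise RuntimeError("no valid quadruplet found")
```

Rendering each returned program (not shown) would give the image quadruplet \((A, A', B, B')\); the two source programs can look completely different while the relation between each pair is identical.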
TriFuser Conditional Diffusion Model:
- Function: Generates the analogical editing result \(B'\) conditioned on three images \((A, A', B)\).
- Mechanism: Built by modifying an Image Variation model, TriFuser addresses three issues: (1) Token Entanglement: a 3D positional encoding (2D spatial position + 1D identity indicating which image a token comes from) helps the model distinguish the tokens of \(A\), \(A'\), and \(B\); (2) Semantic Bias: high-level semantic features from the image-text encoder may omit structural information; (3) Detail Erosion: features from early and late encoder layers are fused, \(C_{\text{hl}}(P) = \text{Linear}(\text{LN}(c_{\text{high}}(P)) \cdot \text{LN}(c_{\text{low}}(P)))\), to preserve fine-grained texture details.
- Design Motivation: Naive concatenation of the three images as conditions yields poor results, as the model cannot properly interpret the analogical relationship.
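The two architectural fixes can be sketched in NumPy under stated assumptions: sinusoidal encodings with one third of the channels for each of (row, column, image ID), parameter-free LayerNorm, and the \(\cdot\) in \(C_{\text{hl}}\) read as an element-wise product. The summary does not specify the encoding type, the channel split, or the operator, so these are illustrative choices, and all function names and tensor sizes are hypothetical.

```python
import numpy as np


def sincos_1d(pos: np.ndarray, dim: int) -> np.ndarray:
    """Sinusoidal encoding of integer positions into `dim` channels (dim must be even)."""
    half = dim // 2
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))
    angles = pos[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def positional_3d(h: int, w: int, n_images: int = 3, dim: int = 96) -> np.ndarray:
    """Encode every token by (row, col, image_id); returns shape (n_images*h*w, dim)."""
    assert dim % 3 == 0 and (dim // 3) % 2 == 0, "need an even dim/3 channels per axis"
    d = dim // 3
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    rows, cols = rows.ravel(), cols.ravel()
    spatial = np.concatenate([sincos_1d(rows, d), sincos_1d(cols, d)], axis=1)
    per_image = []
    for img_id in range(n_images):  # 0 = A, 1 = A', 2 = B
        ident = sincos_1d(np.full(h * w, img_id), d)
        per_image.append(np.concatenate([spatial, ident], axis=1))
    return np.concatenate(per_image, axis=0)


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """LayerNorm over the last axis, without learned scale/shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def fuse_high_low(c_high, c_low, weight, bias):
    """C_hl = Linear(LN(c_high) * LN(c_low)); '*' as element-wise is an assumption."""
    return (layer_norm(c_high) * layer_norm(c_low)) @ weight + bias


rng = np.random.default_rng(0)
h = w = 4
tokens = rng.normal(size=(3 * h * w, 96))   # flattened tokens of A, A', B
tokens = tokens + positional_3d(h, w)        # same (row, col) now differs per image
c_high = rng.normal(size=(10, 64))           # toy late-layer features
c_low = rng.normal(size=(10, 64))            # toy early-layer features
fused = fuse_high_low(c_high, c_low, rng.normal(size=(64, 32)) * 0.02, np.zeros(32))
```

Tokens at the same spatial location in \(A\), \(A'\), and \(B\) share their first two thirds of the encoding and differ only in the identity part, which is one simple way to let attention align corresponding positions while still telling the three images apart.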

Loss & Training

Standard LDM training loss (diffusion denoising objective). The training data consists of approximately 100K synthetic quadruplets, each containing four 512×512 images. Testing is conducted on 50 real-world patterns from Adobe Stock across 7 styles, with only 2 of these styles appearing during training.
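For reference, the standard LDM denoising objective written for this setting, assuming the usual \(\epsilon\)-prediction parameterization (the summary does not state the parameterization), with \(z_0 = \mathcal{E}(B')\) the autoencoder latent of the target, \(\bar{\alpha}_t\) the cumulative noise schedule, and \(c(A, A', B)\) the TriFuser conditioning:

\[
\mathcal{L}_{\text{LDM}} = \mathbb{E}_{z_0,\, \epsilon \sim \mathcal{N}(0, I),\, t}\left[ \left\| \epsilon - \epsilon_\theta\big(z_t,\, t,\, c(A, A', B)\big) \right\|_2^2 \right], \qquad z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon
\]

Only the conditioning \(c(A, A', B)\) is specific to the analogy task; the noise prediction target is the same as in ordinary latent diffusion training.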

Key Experimental Results

Main Results — Perceptual Study

Method	User Preference Rate ↑	Analogy Fidelity ↑
DIA (training-free)	12%	0.42
Analogist	18%	0.51
InstructPix2Pix	15%	0.38
TriFuser (Ours)	55%	0.82

Comparison on Synthetic Validation Set (with Ground Truth)

Method	SSIM ↑	LPIPS ↓	Structural Similarity ↑
DIA	0.62	0.38	0.45
TriFuser	0.81	0.19	0.78

Key Findings

TriFuser achieved a 55% preference rate in the user perceptual study, far ahead of the runner-up, Analogist (18%).
Although trained only on 2 pattern styles, the model successfully generalized to 5 other unseen styles.
The 3D positional encoding and multi-layer feature fusion of TriFuser contributed to +12% and +8% improvements in analogy fidelity, respectively.

Highlights & Insights

The paradigm shift of "replacing program inference with analogy" is highly elegant — transforming the difficult VPI problem into a simpler conditional generation task.
The DSL + synthetic data recipe is broadly applicable and could be extended to editing other structured visual objects.
The cross-complexity transfer from simple patterns to complex patterns is highly impressive.

Limitations & Future Work

The DSL samplers currently cover only two pattern styles; the model generalizes to other pattern styles at test time, but structured editing beyond patterns (e.g., building facades, circuit boards) would require extending the DSL.
The representation of analogy is limited to a single editing operation; multi-edit combinations require chained applications.
Performance on highly complex, non-repeating patterns remains to be verified.
The generation quality of TriFuser is bounded by the capabilities of the base LDM.

Comparison with Related Work

vs VPI Methods: VPI attempts to infer the full program, which is complex and yields hard-to-use results; Pattern Analogies completely bypasses program inference.
vs DIA/Analogist: These analogical editing methods focus on appearance variations; this work is the first to achieve structured editing via analogy.
vs InstructPix2Pix: Instruction-based editing has difficulty precisely expressing programmatic operations like "changing the tiling method."

Rating

Novelty: ⭐⭐⭐⭐⭐ The combination of the analogy paradigm, DSL synthetic data, and a dedicated diffusion model is highly creative.
Experimental Thoroughness: ⭐⭐⭐⭐ The perceptual study, synthetic GT validation, and style generalization tests are comprehensive.
Writing Quality: ⭐⭐⭐⭐⭐ The logical chain from problem definition to method design is clear and complete.
Value: ⭐⭐⭐⭐ Provides a new paradigm for visual program editing, with direct application potential for design tools.