PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
Conference: CVPR 2025
arXiv: 2505.07843
Code: https://thekinsley.github.io/PosterO/
Area: LLM Evaluation
Keywords: Content-aware layout, layout tree, SVG, LLM in-context learning, poster design
TL;DR
PosterO structures poster layouts as SVG layout trees. By vectorizing design intents and modeling hierarchical node representations, it lets LLMs generate high-quality content-aware layouts via intent-aligned in-context learning. It achieves state-of-the-art performance across multiple benchmarks and introduces PStylish7, the first dataset supporting multi-purpose and multi-shape elements.
Background & Motivation
Background: Content-aware layout generation, which automatically arranges text and visual elements based on input images, is a core technology for design automation in posters, advertisements, and other media. Existing methods rely on GANs, autoregressive models, or diffusion models, optimizing via image-centric strategies such as saliency enhancement under limited training data.
Limitations of Prior Work: (1) Image-centric enhancement strategies do not scale layout diversity, easily trapping the models in local solution spaces. (2) Existing LLM-based methods use monotonous rectangular element representations that lack semantic richness, making them unable to handle diverse shapes such as circles and curves. (3) The relationship between design intents (placeable regions) and layout elements has not been explicitly modeled.
Key Challenge: Existing layout representation methods are semantically impoverished, failing to fully leverage the implicit layout design knowledge within LLMs.
Goal: Construct semantically rich layout representations, enabling LLMs to both comprehend image constraints and generate diverse layouts.
Key Insight: Represent both layout elements and design intents using the SVG language to form a hierarchical layout tree, which naturally aligns with the text-understanding capabilities of LLMs.
Core Idea: Uniformly encode layouts, design intents, and element hierarchical relationships into SVG layout trees, allowing LLMs to directly generate layouts via intent-aligned in-context learning.
Method
Overall Architecture
PosterO comprises three phases: (a) Layout Tree Construction — converting the layout-image pairs in the dataset into SVG layout trees; (b) Layout Tree Generation — selecting intent-aligned exemplars for LLM in-context learning given a test image; (c) Poster Design Realization — continuing the dialogue with the LLM based on the generated layout tree to fill in actual design assets.
Key Designs
- Universal Shape Vectorization:
    - Function: Uniformly encode layout elements of various shapes into SVG nodes.
    - Mechanism: Five basic SVG shapes are defined to cover common elements in posters: standard rectangle `<rect>`, portrait rectangle, rotated rectangle (via a `transform` rotation), ellipse `<ellipse>`, and complex paths `<path>` approximated by multiple cubic Bézier curves. This significantly extends expressiveness compared to existing methods that only support rectangles represented by the tuple `(x, y, w, h)`.
    - Design Motivation: Real posters contain diverse shapes like circular buttons and curved text boxes, which a single rectangle format cannot accommodate.
- Design Intent Vectorization and Alignment Selection:
    - Function: Encode "regions suitable for elements" within the image as part of the layout tree, and select in-context exemplars based on intent embeddings.
    - Mechanism: A U-Net-based design intent detection model \(\mathcal{S}\) is trained to take an image as input and output a heatmap of intent regions, which are then converted into `<polygon>` SVG nodes through contour approximation. Concurrently, intermediate features from its encoder are extracted as intent embeddings. During inference, a nearest neighbor search selects the \(k\) training samples with the most similar intents as ICL exemplars.
    - Design Motivation: LLMs cannot directly "see" images; design intent nodes convert visual constraints into text that can be passed to the LLM. Intent-aligned exemplar selection ensures that the layout patterns of the exemplars are consistent with the available space of the test image.
- Hierarchical Node Representation:
    - Function: Explicitly model the containment relationships between elements (e.g., an underlay containing text).
    - Mechanism: Layout elements are sorted by area. After detecting containment relationships, SVG subtrees are constructed, where the coordinates of the contained elements are converted into offsets relative to their containing element. Each leaf node is assigned a unique `id`.
    - Design Motivation: Plain colored background blocks containing text are a common design pattern in posters. Hierarchical representation enables the LLM to understand and generate such nested structures.
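The hierarchical node representation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Element` class, the `build_tree` and `to_svg` helpers, the choice of `<g transform="translate(...)">` for subtrees, and the `kind_n` id scheme are all assumptions made for illustration, and only axis-aligned rectangles are handled. The sketch sorts elements by area, attaches each one to the smallest element that fully contains it, and writes child coordinates as offsets relative to the container.

```python
from dataclasses import dataclass, field
from xml.etree import ElementTree as ET


@dataclass
class Element:
    """An axis-aligned layout element (field names are illustrative assumptions)."""
    kind: str
    x: float
    y: float
    w: float
    h: float
    children: list = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.w * self.h

    def contains(self, other: "Element") -> bool:
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


def build_tree(elements):
    """Sort by area, then attach each element to its smallest container."""
    ordered = sorted(elements, key=lambda e: e.area, reverse=True)
    roots = []
    for i, elem in enumerate(ordered):
        # Only elements earlier in the area ordering can act as containers.
        containers = [p for p in ordered[:i] if p.contains(elem)]
        if containers:
            min(containers, key=lambda p: p.area).children.append(elem)
        else:
            roots.append(elem)
    return roots


def to_svg(roots, width, height):
    """Serialize the tree; children use offsets relative to their container."""
    svg = ET.Element("svg", {"width": f"{width:g}", "height": f"{height:g}"})
    counts = {}

    def emit(parent_node, elem, origin_x, origin_y):
        counts[elem.kind] = counts.get(elem.kind, 0) + 1
        node_id = f"{elem.kind}_{counts[elem.kind]}"  # hypothetical id scheme
        rel_x, rel_y = elem.x - origin_x, elem.y - origin_y
        size = {"width": f"{elem.w:g}", "height": f"{elem.h:g}"}
        if not elem.children:
            ET.SubElement(parent_node, "rect",
                          {"id": node_id, "x": f"{rel_x:g}", "y": f"{rel_y:g}", **size})
            return
        # A container becomes a subtree whose origin is its own top-left corner.
        group = ET.SubElement(parent_node, "g",
                              {"transform": f"translate({rel_x:g},{rel_y:g})"})
        ET.SubElement(group, "rect", {"id": node_id, "x": "0", "y": "0", **size})
        for child in elem.children:
            emit(group, child, elem.x, elem.y)

    for root in roots:
        emit(svg, root, 0, 0)
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    demo = [
        Element("text", 20, 30, 60, 20),
        Element("underlay", 10, 20, 100, 40),
        Element("logo", 200, 10, 30, 30),
    ]
    print(to_svg(build_tree(demo), 513, 750))
```

In the demo, the text box lies inside the underlay, so it is emitted inside the underlay's group with coordinates measured from the underlay's top-left corner, while the logo stays at the top level with absolute coordinates.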
Loss & Training
PosterO is fundamentally based on LLM in-context learning (ICL) and does not require training the LLM. Only the design intent detection model \(\mathcal{S}\) requires semi-supervised training. During inference, the LLM is directly prompted to generate new layout trees using carefully constructed prompts (containing \(k\) exemplar layout trees and the intent description of the test image).
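The inference-time flow described above (pick the \(k\) training samples whose intents are most similar, then prompt the LLM with their layout trees plus the test image's intent) could be sketched as follows. This is only an illustration under stated assumptions: the cosine similarity metric, the dictionary keys (`embedding`, `intent_svg`, `layout_svg`), the function names, and the prompt wording are all hypothetical, since the note does not specify them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_exemplars(query_embedding, train_samples, k):
    """Return the k training samples whose intent embeddings are most similar."""
    ranked = sorted(
        train_samples,
        key=lambda s: cosine_similarity(query_embedding, s["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(exemplars, test_intent_svg):
    """Assemble a few-shot prompt: exemplar (intent, layout) pairs, then the query."""
    parts = ["Generate an SVG layout tree that fits the given design intent."]
    for i, ex in enumerate(exemplars, start=1):
        parts.append(f"Exemplar {i} intent:\n{ex['intent_svg']}")
        parts.append(f"Exemplar {i} layout:\n{ex['layout_svg']}")
    parts.append(f"Test intent:\n{test_intent_svg}")
    parts.append("Layout:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    train = [
        {"embedding": [1.0, 0.0], "intent_svg": "<polygon points='0,0 50,0 50,20'/>",
         "layout_svg": "<svg><rect id='text_1' x='5' y='2' width='40' height='10'/></svg>"},
        {"embedding": [0.0, 1.0], "intent_svg": "<polygon points='0,80 50,80 50,99'/>",
         "layout_svg": "<svg><rect id='logo_1' x='5' y='82' width='20' height='10'/></svg>"},
    ]
    chosen = select_exemplars([0.9, 0.1], train, k=1)
    print(build_prompt(chosen, "<polygon points='0,0 60,0 60,25'/>"))
```

The resulting string would be sent to whichever LLM is used; no LLM weights are updated at any point, which matches the training-free ICL setting described above.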
Key Experimental Results
Main Results
| Method | CGL FID↓ | DS FID↓ | CGL Occ↓ | DS Occ↓ |
|---|---|---|---|---|
| CGL-GAN | 60.18 | 73.66 | 0.218 | 0.299 |
| RALF | 42.18 | 48.87 | 0.208 | 0.288 |
| PosterLlama | 38.81 | 41.93 | 0.193 | 0.210 |
| PosterO | 30.55 | 37.52 | 0.153 | 0.193 |
PosterO achieves the lowest FID and Occ on both the CGL and DS benchmarks; for example, CGL FID drops from 38.81 (PosterLlama) to 30.55.
Ablation Study
| Configuration | FID↓ | Occ↓ |
|---|---|---|
| W/o design intent nodes | 45.2 | 0.221 |
| Random exemplar selection | 38.7 | 0.198 |
| Intent-aligned selection (Full model) | 30.55 | 0.153 |
Key Findings
- Design intent nodes contribute the most to performance: removing them worsens FID from 30.55 to 45.2, an increase of approximately 48%.
- Intent-aligned exemplar selection outperforms random selection (FID 30.55 vs. 38.7), validating the importance of showing the LLM exemplars with similar spatial layouts.
- PosterO significantly outperforms existing methods in cross-domain adaptation and spatial distribution shifts.
- It adapts to LLMs of various sizes (GPT-4, Llama, etc.), where even small-scale LLMs can obtain reasonable results.
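The relative degradation figures in the findings can be checked directly from the ablation table. The short sketch below computes the percentage change of each ablated configuration against the full model; the `relative_change` helper and the dictionary layout are illustrative, while the numbers are copied from the table.

```python
def relative_change(full_value, ablated_value):
    """Percentage change of an ablated metric relative to the full model."""
    return (ablated_value - full_value) / full_value * 100.0


# Values copied from the ablation table (lower is better for both metrics).
full_model = {"fid": 30.55, "occ": 0.153}
ablations = {
    "w/o design intent nodes": {"fid": 45.2, "occ": 0.221},
    "random exemplar selection": {"fid": 38.7, "occ": 0.198},
}

for name, metrics in ablations.items():
    fid_delta = relative_change(full_model["fid"], metrics["fid"])
    occ_delta = relative_change(full_model["occ"], metrics["occ"])
    print(f"{name}: FID +{fid_delta:.1f}%, Occ +{occ_delta:.1f}%")
```

Removing design intent nodes raises FID by roughly 48% relative to the full model, which is noticeably larger than the increase caused by switching to random exemplar selection.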
Highlights & Insights
- SVG as a Bridge Between LLMs and Layout Design: Formulates layout generation as a structured text generation problem, fully leveraging the understanding of code and markup languages in LLMs.
- Explicit Encoding of Design Intent: Converting visual information regarding "where elements can be placed" into text nodes presents an elegant solution to address the limitation of LLMs in processing visual inputs.
- Zero-Shot Poster Realization: After generating the layout tree, the LLM can be directly requested to fill in design assets within the same dialogue session, demonstrating the design knowledge embedded in LLMs.
Limitations & Future Work
- It relies on the SVG generation capability of the LLMs, and the generation of complex paths may be unstable.
- The design intent detection model needs to be trained separately for each poster application.
- The scale of the PStylish7 dataset is relatively small (152 + 100), leading to insufficient large-scale verification.
- Comparative evaluations against diffusion-based methods are not fully explored.
Related Work & Insights
- vs LayoutPrompter: Although LayoutPrompter also uses ICL, it extracts only coarse rectangular constraints, whereas PosterO provides richer semantics via layout trees.
- vs PosterLlama: PosterLlama utilizes SVG but requires fine-tuning the LLM, whereas PosterO uses ICL to avoid fine-tuning overhead and catastrophic forgetting.
- The concept of layout trees can be transferred to related domains such as UI design and document typesetting.
Rating
- Novelty: ⭐⭐⭐⭐ Elegant layout tree representation and intent-aligned ICL scheme.
- Experimental Thoroughness: ⭐⭐⭐⭐ Multi-benchmark evaluation including cross-domain and generalized settings.
- Writing Quality: ⭐⭐⭐⭐ Clear structure and rich illustrations.
- Value: ⭐⭐⭐⭐ High practicality, advancing layout generation towards generalized scenarios.