Large Language Models are Good Relational Learners

Conference: ACL 2025
arXiv: 2506.05725
Code: GitHub
Area: Relational Data Learning / LLM & Structured Data
Keywords: Relational Deep Learning, Graph Neural Networks, RAG, Graph Prompt Tuning, Relational Databases

TL;DR

The authors propose the Rel-LLM framework, which utilizes a GNN encoder to extract structured subgraph representations from relational databases and injects them as soft prompts into a frozen LLM. It achieves SOTA performance on relational deep learning (RDL) tasks on the RelBench benchmark and supports zero-shot prediction.

Background & Motivation

Background: LLMs perform exceptionally well in NLP, CV, information retrieval, and other fields, but still fall short in processing and reasoning over relational databases (RDBs). Approximately 73% of the world's data is stored in relational databases, where tables are interconnected via primary-foreign keys, forming complex network structures.

Limitations of Prior Work: Existing methods "flatten" relational databases into text documents to input into LLMs, which suffers from three major issues: (1) loss of relation structures between tables; (2) nested joins leading to entity redundancy; (3) serialization of large databases often exceeding the context length limits of LLMs.

Key Challenge: LLMs excel at textual reasoning but struggle with explicit relational structures, whereas GNNs excel at modeling graph structures but lack semantic understanding and generalization capabilities.

Goal: Enable LLMs to exploit the structured information in relational databases while preserving the relational semantics between tables.

Key Insight: Modeling relational databases as heterogeneous graphs, encoding local subgraphs using GNNs, and mapping graph embeddings onto the latent space of the LLM via a projection layer to serve as soft prompts.

Core Idea: Utilizing GNNs to capture relational structures combined with a RAG framework to inject them into the LLM, achieving structure-aware relational reasoning.

Method

Overall Architecture

Rel-LLM consists of four components: (1) temporal-aware subgraph sampling to ensure causal consistency; (2) a heterogeneous GNN encoder to extract structural feature representations of entities; (3) a projection layer + denormalized prompt construction to organize graph embeddings into structured prompts processable by the LLM; (4) a frozen LLM that receives graph prompts and text embeddings for joint reasoning.
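
The data flow through components (2) to (4) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' code: the dimensions, the single linear layer standing in for the MLP, the choice to prepend the graph token to the text embeddings, and the names `mean_pool`, `project`, and `build_llm_input` are all hypothetical.

```python
from typing import List

Vector = List[float]

def mean_pool(node_embeddings: List[Vector]) -> Vector:
    """Average node embeddings into one subgraph-level vector h_g."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(v[d] for v in node_embeddings) / n for d in range(dim)]

def project(h: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Map a vector from GNN space (d_g) to LLM hidden space (d_l).
    Assumption: one linear layer stands in for the MLP."""
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]

def build_llm_input(graph_prompt: Vector, text_embeddings: List[Vector]) -> List[Vector]:
    """Place the projected graph embedding in front of the text embeddings.
    Assumption: the real prompt layout may position graph tokens differently."""
    return [graph_prompt] + text_embeddings

# Toy sizes (hypothetical): d_g = 2, d_l = 3.
node_embeddings = [[1.0, 2.0], [3.0, 4.0]]          # output of the GNN encoder
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # d_l x d_g projection matrix
bias = [0.0, 0.0, 0.5]
text_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # frozen LLM token embeddings

h_g = mean_pool(node_embeddings)
soft_prompt = project(h_g, weight, bias)
llm_input = build_llm_input(soft_prompt, text_embeddings)
```

Only `weight` and `bias` (and the GNN producing `node_embeddings`) would be trained in this setup; the LLM that consumes `llm_input` stays frozen.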

Key Designs

  1. Relational Entity Graph (REG): Converts the relational database into a heterogeneous graph \(G = (\mathcal{V}, \mathcal{E}, \phi, \psi)\), where each row of data is a node, and primary-foreign key relationships are edges. Node and edge types are determined by table names and relations. Initial node embeddings are generated by a multimodal column encoder.
  2. Temporal-Aware Subgraph Sampling: Centered on the target entity and using the prediction time \(t^*\) as the cutoff, only neighboring nodes with timestamps earlier than \(t^*\) are sampled to avoid temporal information leakage.
  3. Heterogeneous GraphSAGE Encoder: Employs heterogeneous GraphSAGE with sum aggregation for \(L\)-layer message passing to obtain node embeddings \(\mathbf{h}_i^{(L)}\), which are then mean-pooled to obtain subgraph-level representations \(\mathbf{h}_g^{(L)}\).
  4. MLP Projection Layer: Projects graph embeddings from the GNN space \(\mathbb{R}^{d_g}\) to the LLM hidden space \(\mathbb{R}^{d_l}\) to achieve modality alignment.
  5. Denormalized Prompt Construction: Rooted at the target entity, recursively unfolding along primary-foreign key links (breadth-first, depth \(\zeta\), maximum \(n_{\text{nest}}\) entities per layer) to organize graph embeddings of associated entities into a nested JSON structure, reducing multi-hop reasoning requirements.
  6. Three Answer Generation Strategies: (1) Pure Text Generation—directly outputs readable text; (2) Token Distribution—outputs probability distributions for probabilistic tasks; (3) MLP Transformation—uses a lightweight network to project LLM hidden representations into the task space. Different tasks are suited to different strategies.
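
Temporal-aware subgraph sampling (design 2) can be sketched as a breadth-first expansion with a timestamp filter. This is a minimal sketch, not the paper's sampler: the adjacency-dict graph format, the function name `sample_temporal_subgraph`, the hop limit, and the rule that rows without a timestamp are always admissible are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Set

def sample_temporal_subgraph(
    adjacency: Dict[str, List[str]],
    timestamps: Dict[str, float],
    seed: str,
    t_star: float,
    num_hops: int,
) -> Set[str]:
    """Expand from `seed`, keeping only neighbors stamped strictly before t_star."""
    visited = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_hops:
            continue  # hop budget exhausted for this branch
        for neighbor in adjacency.get(node, []):
            if neighbor in visited:
                continue
            # Assumption: rows without a timestamp (static tables) are always allowed.
            if timestamps.get(neighbor, float("-inf")) >= t_star:
                continue  # would leak information from the future
            visited.add(neighbor)
            frontier.append((neighbor, depth + 1))
    return visited

# Toy relational entity graph: one user, three orders, two items.
adjacency = {
    "u1": ["o1", "o2", "o3"],
    "o1": ["u1", "i1"],
    "o2": ["u1"],
    "o3": ["u1", "i2"],
    "i1": ["o1"],
    "i2": ["o3"],
}
timestamps = {"o1": 1.0, "o2": 5.0, "o3": 9.0}  # items and the user are static

subgraph = sample_temporal_subgraph(adjacency, timestamps, seed="u1", t_star=6.0, num_hops=2)
```

Order `o3` is dropped because it occurs after the prediction time, and item `i2` disappears with it since it is reachable only through that order.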

Loss & Training

  • Masked Table Modeling: Randomly selects a portion of nodes to be masked, replacing original features with a learnable mask token, and then tasks the LLM with reconstructing the properties (column name-value pairs) of the masked entities. Column order is randomly permuted to enhance robustness.
  • The pre-training loss is a standard autoregressive NLL: \(\mathcal{L}_{\text{pretrain}} = -\frac{1}{|\mathcal{V}_{\text{mask}}|} \sum_{v_i \in \mathcal{V}_{\text{mask}}} \sum_t \log p_\theta\big(y_i^{(t)} \mid y_i^{(<t)}, \hat{\mathbf{h}}_{\text{mask}}\big)\)
  • Only the GNN encoder \(\phi_1\), projection layer \(\phi_2\), and mask token are optimized, while the LLM parameters \(\theta\) remain frozen.
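
The pre-training objective can be made concrete with a small numeric sketch. It assumes the frozen LLM has already returned the probability it assigns to each ground-truth target token; the helper names `serialize_columns` and `pretrain_loss`, the `name: value` target format, and the toy probabilities are hypothetical.

```python
import math
import random
from typing import Dict, List

def serialize_columns(row: Dict[str, str], rng: random.Random) -> str:
    """Build a reconstruction target with the columns in random order."""
    items = list(row.items())
    rng.shuffle(items)  # column-order permutation used as augmentation
    return ", ".join(f"{name}: {value}" for name, value in items)

def pretrain_loss(token_probs_per_node: List[List[float]]) -> float:
    """Token-level negative log-likelihood, summed per node and
    averaged over the masked nodes."""
    total = 0.0
    for token_probs in token_probs_per_node:
        total -= sum(math.log(p) for p in token_probs)
    return total / len(token_probs_per_node)

# Target text for one masked entity (toy row).
target = serialize_columns({"price": "9.99", "category": "books"}, random.Random(0))

# Probabilities assigned to the correct tokens for two masked nodes (toy values).
token_probs_per_node = [[0.5, 0.5], [0.25]]
loss = pretrain_loss(token_probs_per_node)
```

With these toy values each node contributes \(2\ln 2\), so the averaged loss is \(2\ln 2 \approx 1.386\). In training, the gradient of this quantity would flow only into the GNN encoder, the projection layer, and the mask token.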

Key Experimental Results

Experimental Setup

  • Benchmark: RelBench—contains 7 datasets, 30 prediction tasks (entity classification + entity regression)
  • Backbone LLM: Llama 3.2-1B (128K context)
  • Baselines: LightGBM, RDL (GNN + deep tabular model), ICL (LLM in-context learning), ICL+MLP

Main Results

Entity Classification (AUROC ↑):

| Dataset / Task (Test) | LightGBM | RDL | ICL+MLP | Rel-Zero | Rel-LLM |
| --- | --- | --- | --- | --- | --- |
| rel-amazon user-churn | 52.22 | 70.42 | 66.56 | 60.07 | 71.89 |
| rel-event user-repeat | 68.04 | 76.89 | 76.72 | 68.12 | 79.26 |
| rel-stack user-engagement | 63.39 | 90.59 | 87.09 | 69.46 | 91.21 |
| Overall Average | 63.66 | 75.83 | 76.83 | 63.42 | 77.82 |

  • Rel-LLM outperforms or matches SOTA on all datasets, achieving an average AUROC of 77.82.
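  • Zero-shot Rel-Zero trails the fine-tuned Rel-LLM. It beats LightGBM on the three tasks listed above (for example, 60.07 vs. 52.22 on rel-amazon user-churn), but its overall average (63.42) is roughly on par with LightGBM's (63.66) rather than clearly above it.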

Entity Regression (MAE ↓):

  • On tasks such as rel-hm item-sales, Rel-LLM achieves the lowest MAE.
  • Compared to ICL+MLP, Rel-LLM achieves a 5-15% improvement on most tasks.

Key Findings

  1. The GNN encoder effectively preserves relational structural information, avoiding information loss caused by text serialization.
  2. The graph prompt tuning approach does not require modification of LLM parameters, making training costs significantly lower than full fine-tuning.
  3. The masked table modeling during the pre-training phase endows the model with zero-shot transfer capabilities.
  4. Different tasks are suited to different answer generation strategies; classification tasks favor token distribution, while regression tasks favor MLP transformation.
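
To illustrate why a token-distribution readout suits classification, the sketch below turns next-token logits into class probabilities by applying a softmax restricted to the label tokens. This is one common way to implement such a readout; whether Rel-LLM restricts the softmax in exactly this way is an assumption, and the function name `class_probabilities`, the label strings, and the logit values are hypothetical.

```python
import math
from typing import Dict, List

def class_probabilities(next_token_logits: Dict[str, float], labels: List[str]) -> Dict[str, float]:
    """Softmax over the logits of the label tokens only."""
    max_logit = max(next_token_logits[label] for label in labels)  # for numerical stability
    exps = {label: math.exp(next_token_logits[label] - max_logit) for label in labels}
    normalizer = sum(exps.values())
    return {label: value / normalizer for label, value in exps.items()}

# Toy logits for the first answer token; "the" is an irrelevant vocabulary entry.
next_token_logits = {"Yes": 2.0, "No": 0.0, "the": 1.0}
probs = class_probabilities(next_token_logits, labels=["Yes", "No"])
```

The resulting `probs["Yes"]` is a continuous score, which is what a ranking metric such as AUROC needs; a pure text answer would only give a hard label.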

Highlights & Insights

  1. Structure-Preserving RAG: Unlike traditional RAG, which retrieves text fragments, Rel-LLM retrieves subgraphs and thereby preserves relational semantics.
  2. Efficient Fine-Tuning: Freezing the LLM and training only the GNN and the projection layer keeps the number of trainable parameters small.
  3. Denormalization to Nested JSON: "Translates" graph structures into a format LLMs can read; JSON was reported to work best for tabular data encoding (Singha et al., 2023).
  4. Temporal Consistency: Strictly avoids information leakage through temporal-aware sampling, making it suitable for time-series forecasting scenarios.
  5. Zero-Shot Capability: Can obtain reasonable predictions on new tasks after pre-training without requiring further fine-tuning.
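
The denormalization step mentioned in highlight 3 can be sketched as follows. It is written recursively for brevity rather than as an explicit breadth-first loop, and several details are assumptions: the `children` mapping (foreign-key links pointing away from the root), the per-parent cap `n_nest`, the key names, and the `<graph:...>` placeholder strings that mark where projected graph embeddings would be spliced in.

```python
import json
from typing import Any, Dict, List

def denormalize(
    entity_id: str,
    children: Dict[str, List[str]],
    depth: int,
    n_nest: int,
) -> Dict[str, Any]:
    """Unfold linked entities into a nested dict, up to `depth` levels."""
    node: Dict[str, Any] = {
        "id": entity_id,
        "embedding": f"<graph:{entity_id}>",  # placeholder for a soft-prompt token
    }
    if depth > 0:
        linked = children.get(entity_id, [])[:n_nest]  # cap entities per parent
        if linked:
            node["linked"] = [denormalize(c, children, depth - 1, n_nest) for c in linked]
    return node

# Toy links: a user with three orders, one of which references an item.
children = {
    "user_1": ["order_1", "order_2", "order_3"],
    "order_1": ["item_7"],
}

prompt = denormalize("user_1", children, depth=2, n_nest=2)
prompt_text = json.dumps(prompt, indent=2)
```

With `n_nest=2`, `order_3` is cut off, and `item_7` appears nested under `order_1`, so the LLM sees the two-hop neighbor directly inside its parent instead of having to resolve a key across separate tables.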

Limitations & Future Work

  1. It relies on small models like Llama 3.2-1B; whether it performs better on larger models remains unverified.
  2. Denormalization depth and the number of nested levels are hyperparameters that require manual adjustment based on database structures.
  3. It cannot be directly applied to unstructured data that lacks clear primary-foreign key relationships.
  4. It has only been validated on RelBench and lacks test cases from more practical application scenarios.

Related Work

  • Relational Tabular Learning: Benchmarks such as CTU, SJTUTable, and RelBench have driven deep learning research on relational data.
  • LLMs for Tabular Data: Existing methods serialize tables into text but face challenges of context length limits and loss of structural information.
  • Graph Prompt Learning: Injecting GNN embeddings as soft prompts for LLMs is a recent trend in graph-language multimodality.

Rating

⭐⭐⭐⭐ — Clever methodology design and comprehensive experiments. The study makes a meaningful exploration in the important yet overlooked direction of relational databases + LLMs. The combination of GNN + RAG + frozen LLM shows excellent scalability.

Additional Details

  • Randomly permuting column order in pre-training serves as an effective data augmentation strategy, preventing the model from only learning reconstruction under a specific column sequence.
  • The early experimental finding that JSON format outperforms Markdown and CSV for relational data encoding comes from Singha et al., 2023.
  • In some tasks, the performance of ICL+MLP is close to Rel-LLM, suggesting that the representation capabilities of LLMs can be partially unleashed even through simple text serialization.