DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning¶
Conference: NeurIPS 2025 arXiv: 2510.17489 Code: heyongxin233/DETree Area: Object Detection Keywords: AI text detection, human-AI collaborative text, hierarchical representation learning, contrastive learning, out-of-distribution generalization
TL;DR¶
This paper proposes DETree, a framework that constructs a Hierarchical Affinity Tree (HAT) to model the hierarchical relationships among diverse human-AI collaborative text generation processes, and designs a Tree-Structured Contrastive Loss (TSCL) to align the representation space. DETree achieves significant advantages in mixed-text detection and OOD generalization scenarios.
Background & Motivation¶
With the widespread adoption of LLMs, AI involvement in text generation has become increasingly diverse: humans write and AI polishes, AI generates and humans revise, multiple AI models collaborate, etc. Different contexts have varying tolerances for AI involvement (e.g., acceptable in advertising copy, strictly prohibited in academic writing). Therefore, detection methods must not only determine whether a text involves AI, but also identify the specific mode and degree of AI participation.
Existing methods have notable limitations:
- Binary classification methods: Distinguish only between purely human-written and purely AI-generated text, unable to handle the complexity of human-AI collaboration
- Coarse-grained estimation: Methods that regress the degree of AI involvement or classify content/style features against predefined prompts offer limited expressive capacity
- Weak generalization: Perform well on training distributions but suffer severe performance degradation in OOD scenarios
The authors' key observation is that texts produced by different generation processes naturally form hierarchically clustered structures in representation space. For example, "Llama3_polish_GPT-4o" is more similar to "Claude3.5_paraphrase_GPT-4o" than to "human_polish_Gemini1.5", and both are more similar to each other than to purely human-written text.
Core Problem¶
- How to structurally model the intrinsic relationships among diverse generation processes of human-AI collaborative texts?
- How to construct a large-scale benchmark dataset covering diverse human-AI collaboration patterns?
- How to maintain strong generalization in OOD scenarios (new models, new domains)?
Method¶
1. RealBench Dataset Construction¶
Raw samples from MAGE, M4, TuringBench, OUTFOX, and RAID are aggregated. Mixed-text construction strategies including paraphrasing, extension, polishing, and translation are introduced, and 11 perturbation-based attack types are integrated. The final dataset covers 1,204 text categories with approximately 16.4 million text samples.
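The category space grows compositionally: a label such as "Llama3_polish_GPT-4o" encodes (editing model, operation, base source). A minimal sketch of this expansion, using hypothetical model and operation lists far smaller than RealBench's actual grid:

```python
from itertools import product

# Hypothetical subsets of the sources/operations used in RealBench.
base_sources = ["human", "GPT-4o", "Llama3", "Claude3.5", "Gemini1.5"]
operations = ["paraphrase", "extend", "polish", "translate"]
editors = ["GPT-4o", "Llama3", "Claude3.5", "human"]

# Pure categories plus every (editor, operation, base) composition;
# the full benchmark expands such a grid to 1,204 categories.
categories = list(base_sources)
categories += [
    f"{e}_{op}_{b}"
    for e, op, b in product(editors, operations, base_sources)
    if e != b  # skip degenerate self-edits in this sketch
]
print(len(categories), categories[:8])
```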
2. Hierarchical Affinity Tree (HAT) Construction¶
- Step 1: Fine-tune the encoder using supervised contrastive learning, treating each category as an independent class, and compute the inter-class similarity matrix \(E \in \mathbb{R}^{N \times N}\)
- Step 2: Apply agglomerative hierarchical clustering on the similarity matrix to generate an initial binary tree structure
- Step 3: Introduce an editable top-down subtree reorganization algorithm. Three prior configurations are predefined (attaching mixed texts to the human branch, to the AI branch, or to an independent branch). For each subtree, all possible partitions are enumerated and the optimal one is selected via the Silhouette Score; the process recurses until a stopping criterion is met
- Computational complexity is \(\mathcal{O}(N^2 \log N)\); a minimal sketch of Steps 1–2 follows this list
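A minimal sketch of Steps 1–2, assuming per-category mean embeddings from the fine-tuned encoder; scipy's average-linkage clustering stands in for the paper's exact agglomerative procedure, and Step 3 would then enumerate partitions of each subtree and rescore them, e.g. with `sklearn.metrics.silhouette_score`:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
N, d = 12, 64                      # toy: 12 categories, 64-dim class means
Z = rng.normal(size=(N, d))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)

# Step 1: inter-class cosine similarity matrix E (N x N).
E = Z @ Z.T

# Step 2: agglomerative clustering on distance 1 - E -> initial binary tree.
D = 1.0 - E
np.fill_diagonal(D, 0.0)
root = to_tree(linkage(squareform(D, checks=False), method="average"))

def leaves(node):
    """Collect the leaf category ids under a tree node."""
    if node.is_leaf():
        return [node.id]
    return leaves(node.left) + leaves(node.right)

print(leaves(root.left), leaves(root.right))  # the root's two subtrees
```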
3. Tree-Structured Contrastive Loss (TSCL)¶
The core idea is that categories sharing a closer common ancestor in the HAT should be more similar in the representation space.
For a leaf node \(c\) at depth \(d_c\), let \(a_c^{(i)}\) denote the ancestor of \(c\) located \(i\) levels above it (so \(a_c^{(0)} = c\) and \(a_c^{(d_c)}\) is the root). The hierarchical partition set at level \(i\) collects the leaf categories that first merge with \(c\) at that ancestor:

\[
H_c^{(i)} = \mathrm{Leaves}\big(a_c^{(i)}\big) \setminus \mathrm{Leaves}\big(a_c^{(i-1)}\big),
\]

so smaller \(i\) means closer to \(c\) in the tree.

Based on the equivalence established in Theorem 3.1, the hierarchical similarity constraint is converted into an optimizable contrastive learning objective. At each level \(i\), the positive set \(P_i = H_c^{(i)}\) and negative set \(N_i = \bigcup_{j=i+1}^{d_c} H_c^{(j)}\) are defined, and each level contributes an InfoNCE-style term. Per anchor representation \(\mathbf{z}\), the full TSCL is

\[
\mathcal{L}_{\mathrm{TSCL}} = -\sum_{i=1}^{d_c-1} \frac{1}{|P_i|} \sum_{p \in P_i} \log \frac{\exp\!\big(\mathrm{sim}(\mathbf{z}, \mathbf{z}_p)/\tau\big)}{\sum_{q \in P_i \cup N_i} \exp\!\big(\mathrm{sim}(\mathbf{z}, \mathbf{z}_q)/\tau\big)},
\]

with \(\mathrm{sim}\) the cosine similarity and \(\tau\) the temperature, averaged over anchors in the batch.
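A minimal PyTorch sketch of this per-level objective; the shells `H` (category indices per level, nearest first) are assumed precomputed from the HAT, and normalized class representatives stand in for in-batch samples:

```python
import torch
import torch.nn.functional as F

def tscl_for_anchor(z, protos, level_sets, tau=0.07):
    """Tree-structured contrastive loss for one anchor embedding z.

    z:          (d,) L2-normalized anchor representation
    protos:     (N, d) L2-normalized class representatives
    level_sets: list over levels i; level_sets[i] holds the category
                indices in H_c^(i), ordered from closest to farthest.
    """
    loss = z.new_zeros(())
    for i in range(len(level_sets) - 1):
        pos = level_sets[i]                              # P_i = H_c^(i)
        neg = torch.cat(level_sets[i + 1:])              # N_i = farther shells
        logits = protos[torch.cat([pos, neg])] @ z / tau # sims / temperature
        log_p = F.log_softmax(logits, dim=0)[: len(pos)] # positives come first
        loss = loss - log_p.mean()                       # InfoNCE at level i
    return loss

# Toy usage: 8 classes, anchor from a class with 3 tree levels.
protos = F.normalize(torch.randn(8, 16), dim=1)
z = F.normalize(torch.randn(16), dim=0)
H = [torch.tensor([1]), torch.tensor([2, 3]), torch.tensor([4, 5, 6, 7])]
print(tscl_for_anchor(z, protos, H))
```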
4. Virtual Class Prototype (VCP)¶
Given the large number of categories (1,204), a single mini-batch cannot cover all classes. A learnable prototype vector \(\mathbf{v}_c \in \mathbb{R}^d\) is introduced for each category as a persistent anchor in contrastive learning, without introducing additional memory overhead.
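A sketch of the prototype table, assuming it is trained jointly with the encoder and plugged into the loss above as the class representatives; because the prototypes receive gradients from every batch, each of the 1,204 classes stays anchored even when absent from the current mini-batch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VirtualClassPrototypes(nn.Module):
    """One learnable anchor vector per category (1,204 in RealBench)."""

    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        self.protos = nn.Parameter(torch.randn(num_classes, dim) * 0.02)

    def forward(self) -> torch.Tensor:
        # Normalized prototypes act as persistent positives/negatives,
        # so a mini-batch never needs to cover all classes.
        return F.normalize(self.protos, dim=1)

vcp = VirtualClassPrototypes(num_classes=1204, dim=1024)
print(vcp().shape)  # torch.Size([1204, 1024])
```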
5. Inference and OOD Adaptation¶
- K-Means Database Compression: K-means clustering is applied per category to generate compact representatives, reducing retrieval cost while maintaining balanced category representation
- Retrieval-Based Few-Shot Adaptation: A small number of target-domain samples are used to rebuild the retrieval database and adjust decision boundaries to accommodate domain shift, as sketched below
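A hedged Faiss sketch of both steps on toy data; the sizes and the majority-vote readout are illustrative, not the paper's exact settings. Few-shot adaptation then amounts to rebuilding this database from 5–10 target-domain samples:

```python
import faiss
import numpy as np

rng = np.random.default_rng(0)
d, per_class, k_reps = 128, 1000, 8   # toy sizes; the paper uses Faiss-GPU

# Compress each category's embeddings to k_reps K-means centroids.
db_vectors, db_labels = [], []
for label in range(4):                # toy: 4 categories
    x = rng.normal(size=(per_class, d)).astype("float32")
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    km = faiss.Kmeans(d, k_reps, niter=20, seed=0)
    km.train(x)
    db_vectors.append(km.centroids)
    db_labels += [label] * k_reps

db = np.vstack(db_vectors)
db /= np.linalg.norm(db, axis=1, keepdims=True)
labels = np.array(db_labels)

# KNN retrieval over the compressed, class-balanced database.
index = faiss.IndexFlatIP(d)          # inner product = cosine on unit vectors
index.add(db)
query = rng.normal(size=(1, d)).astype("float32")
query /= np.linalg.norm(query)
_, nn_ids = index.search(query, 5)    # k = 5 nearest representatives
print(labels[nn_ids[0]])              # majority label -> predicted category
```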
Implementation Details¶
- Encoder: RoBERTa-large + LoRA fine-tuning (see the sketch after this list)
- Optimizer: AdamW with cosine annealing, initial learning rate 3e-5, 2000-step linear warmup
- Training: 10 epochs, 8×RTX 4090, batch size 64, maximum input length 512
- Inference: Faiss-GPU for accelerated K-means and KNN, representations taken from layers 17–19, k=5 or 50
- Temperature parameter \(\tau = 0.07\)
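A sketch of this encoder setup, assuming the Hugging Face `transformers` and `peft` libraries; the LoRA rank/alpha values are illustrative, while the layer averaging follows the layers 17–19 detail above:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

tok = AutoTokenizer.from_pretrained("roberta-large")
base = AutoModel.from_pretrained("roberta-large", output_hidden_states=True)

# LoRA on the attention projections; r/alpha are illustrative choices.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["query", "value"])
encoder = get_peft_model(base, lora)

batch = tok(["An example passage."], truncation=True, max_length=512,
            return_tensors="pt")
out = encoder(**batch)

# Average hidden states of layers 17-19, then mean-pool over tokens.
h = torch.stack(out.hidden_states[17:20]).mean(0)   # (B, T, 1024)
mask = batch["attention_mask"].unsqueeze(-1)
z = (h * mask).sum(1) / mask.sum(1)                 # (B, 1024)
z = torch.nn.functional.normalize(z, dim=1)
print(z.shape)
```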
Key Experimental Results¶
Supervised Detection¶
| Method | MAGE AvgRec | M4-mono AvgRec | TuringBench AvgRec | Avg. AvgRec |
|---|---|---|---|---|
| DeTeCTive | 96.15 | 98.44 | 99.74 | 96.94 |
| DETree (prior1) | 96.87 | 99.86 | 99.74 | 97.88 |
OOD Generalization (AUROC)¶
| Dataset | Binoculars | MAGE | DETree zero-shot | DETree 10-shot |
|---|---|---|---|---|
| MAGE-Unseen | 96.84 | 95.20 | 99.13 | 99.81 |
| MAGE-Paraphrase | 75.87 | 83.35 | 92.66 | 98.77 |
| DetectRL-MultiDomain | 83.95 | 86.67 | 98.94 | 99.88 |
| Beemo-GPT4o-Edited | 78.15 | 67.79 | 83.79 | 88.54 |
Mixed-Text Detection (HART, AUROC)¶
| Detector | Level-1 ALL | Level-2 ALL | Level-3 ALL |
|---|---|---|---|
| HART (Binoculars) | 0.838 | 0.848 | 0.883 |
| DETree (prior1) | 0.998 | 0.992 | 0.988 |
As a representation-space sanity check, 99.32% of triplets satisfy the hierarchical similarity constraint defined in Theorem 3.1, validating the effectiveness of TSCL.
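This statistic can be reproduced with a simple sampling check; a sketch under the assumption that normalized embeddings `Z` and pairwise tree distances `tree_dist` are available (both names are hypothetical):

```python
import numpy as np

def constraint_satisfaction(Z, tree_dist, n_triplets=10000, seed=0):
    """Fraction of triplets where the tree-closer pair is also closer
    in embedding space (the hierarchical similarity constraint)."""
    rng = np.random.default_rng(seed)
    N = len(Z)
    hits = total = 0
    for _ in range(n_triplets):
        a, b, c = rng.choice(N, size=3, replace=False)
        if tree_dist[a, b] == tree_dist[a, c]:
            continue                  # no ordering to test
        if tree_dist[a, b] > tree_dist[a, c]:
            b, c = c, b               # ensure b is tree-closer to a
        hits += Z[a] @ Z[b] > Z[a] @ Z[c]
        total += 1
    return hits / max(total, 1)
```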
Highlights & Insights¶
- Novel hierarchical modeling paradigm: Text detection is elevated from flat classification to tree-structured hierarchical representation learning; HAT autonomously captures relationships among text sources (e.g., clustering by model family or operation strategy)
- Solid theoretical foundation: Theorem 3.1 proves the equivalence of the hierarchical similarity constraint, providing a theoretical basis for the TSCL design
- Strong OOD generalization: Via retrieval-based few-shot adaptation, only 5–10 target-domain samples suffice to substantially improve cross-domain detection performance (+15.55 AUROC on MAGE-Paraphrase)
- Large-scale dataset: RealBench covers 1,204 categories and 16.4 million samples, constituting the most comprehensive human-AI collaborative text benchmark to date
- Strong robustness: Without adversarial training, DETree still outperforms adversarially trained RoBERTa-large in most attack scenarios
- Interesting findings: AI traces in mixed texts are more salient than human features; human editing does not fundamentally alter the underlying AI characteristics of the text
Limitations & Future Work¶
- Adversarial evasion via model fine-tuning has not been explored
- The current focus is on mixed texts involving up to three authors (two AIs or one AI and one human); scenarios with more collaborators remain to be studied
- When encountering entirely unseen rare domains, a small number of target-domain samples are still required to adjust decision boundaries; fully zero-shot generalization is not yet achievable
- The paper's classification under object detection is debatable; it substantively belongs to NLP/AI safety
Related Work & Insights¶
| Dimension | Traditional Methods | HART | DETree |
|---|---|---|---|
| Classification Granularity | Binary | Three-level risk | Hierarchical category tree |
| Modeling Approach | Flat features | Content/style decoupling | Tree-structured contrastive learning |
| OOD Generalization | Weak | Moderate | Strong (few-shot adaptation) |
| Data Scale | Tens of thousands | Hundreds of thousands | Tens of millions (RealBench) |
Distinction from Hierarchical Text Classification (HTC): In HTC, label hierarchies are predefined and fixed, whereas in mixed-text detection, labels are inherently ambiguous and dynamically evolving; HAT construction is a data-driven adaptive process.
- The hierarchical modeling paradigm generalizes to other tasks requiring fine-grained category relationship modeling, such as malicious code family detection and deepfake video provenance tracing
- The inference paradigm of K-Means database compression combined with KNN retrieval holds general value for large-scale classification scenarios
- Few-shot retrieval adaptation offers a viable new direction for training-based detectors in OOD settings
- RealBench's dataset construction strategy (compositional category expansion) is reusable for other research requiring large-scale multi-category datasets
Rating¶
- Novelty: 8/10 — HAT and TSCL are pioneering contributions in the AI text detection domain
- Experimental Thoroughness: 9/10 — Covers supervised detection, OOD generalization, mixed-text detection, robustness, and compression; ablation studies are comprehensive
- Writing Quality: 8/10 — Well-structured, with complete theoretical derivations and information-rich figures and tables
- Value: 8/10 — The method is well-designed and practically applicable; the dataset contribution is significant and directly relevant to real-world AI text auditing