GloCTM: Cross-Lingual Topic Modeling via a Global Context Space

Conference: AAAI 2026
arXiv: 2601.11872
Code: https://github.com/tienphat140205/GloCTM
Area: Multilingual Translation
Keywords: Cross-lingual, Topic Model, VAE, Multilingual Embeddings, CKA Alignment

TL;DR

This paper proposes GloCTM, a dual-path VAE architecture (a local language path plus a global context path) that enforces cross-lingual alignment at four levels: Polyglot Augmentation (cross-lingual neighbor-based input expansion), KL-divergence internal alignment, unified-decoder structural alignment, and CKA semantic alignment. It achieves state-of-the-art topic quality and cross-lingual alignment on three cross-lingual datasets.

Background & Motivation

Background: Cross-lingual topic models (CLTMs) aim to discover semantically aligned shared topics from multilingual documents. Early approaches relied on scarce parallel corpora; subsequent methods shifted to bilingual dictionaries (MCTA, NMTM, InfoCTM).

Limitations of Prior Work: (a) Limited dictionary coverage leads to poor topic alignment; (b) each language learns topic distributions (\(\theta\)) and topic-word distributions (\(\beta\)) in independent spaces, bridged only indirectly via auxiliary losses—this architectural decoupling is the root cause of non-robust alignment; (c) rich semantic signals from multilingual pretrained models are ignored; (d) directly aligning topic-word distributions as in NMTM tends to cause topic degeneration/collapse. For instance, InfoCTM exhibits "semantic drift" on English-Japanese data, where the same topic index corresponds to "video games" in English but "footwear" in Japanese.

Key Challenge: Existing methods treat cross-lingual alignment as an auxiliary external constraint, while the topic spaces of different languages remain fundamentally decoupled, precluding intrinsic alignment.

Goal: To structurally guarantee cross-lingual alignment at every layer of model design—input, encoding, decoding, and semantic space.

Key Insight: Inject cross-lingual information at the input stage via Polyglot Augmentation, rather than trying to align topic spaces after each language has been learned independently.

Core Idea: By enforcing cross-lingual alignment at four levels—input (Polyglot Augmentation), encoding (KL divergence), decoding (unified \(\beta\) matrix), and semantics (CKA)—alignment is transformed from an "external constraint" into an "intrinsic property."

Method

Overall Architecture

A dual-path VAE architecture: the local path processes the bag-of-words (BoW) vector \(\mathbf{x}_d^{(l)}\) for each language, while the global path processes the cross-lingually augmented global BoW vector \(\mathbf{g}_d^{(l)}\). The two paths infer \(\theta_{local}\) and \(\theta_{global}\) respectively and are aligned via KL divergence. The decoder employs a unified concatenated matrix \(\beta^{(global)} = [\beta^{(1)} | \beta^{(2)}]\).
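
Below is a minimal PyTorch-style sketch of this dual-path forward pass, written only to make the data flow concrete. All module names, layer sizes, and the Gaussian-posterior/closed-form-KL choices are illustrative assumptions, not the authors' implementation; the augmented input \(\mathbf{g}_d^{(l)}\) is assumed to be precomputed (a sketch of that step appears under Key Designs).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathVAE(nn.Module):
    """Illustrative dual-path VAE: one local encoder per language plus a
    shared global encoder over the concatenated joint vocabulary."""

    def __init__(self, vocab_sizes, num_topics=50, hidden=200):
        super().__init__()
        joint_vocab = sum(vocab_sizes)  # |V1| + |V2|
        self.local_encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(v, hidden), nn.Softplus(),
                          nn.Linear(hidden, 2 * num_topics))
            for v in vocab_sizes])
        self.global_encoder = nn.Sequential(
            nn.Linear(joint_vocab, hidden), nn.Softplus(),
            nn.Linear(hidden, 2 * num_topics))
        # Unified decoder: each topic row spans both vocabularies,
        # i.e. beta_global = [beta^(1) | beta^(2)].
        self.beta_global = nn.Parameter(0.02 * torch.randn(num_topics, joint_vocab))

    @staticmethod
    def reparameterize(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar

    def forward(self, x_local, g_global, lang):
        # Local path: language-specific BoW -> local posterior.
        _, mu_l, lv_l = self.reparameterize(self.local_encoders[lang](x_local))
        # Global path: augmented joint-vocabulary BoW -> theta_global.
        z_g, mu_g, lv_g = self.reparameterize(self.global_encoder(g_global))
        theta_g = F.softmax(z_g, dim=-1)
        # Reconstruct the joint BoW through the unified decoder.
        log_recon = F.log_softmax(theta_g @ self.beta_global, dim=-1)
        # Closed-form KL(q_local || q_global) for diagonal Gaussians:
        # the internal-consistency term L_KL.
        kl_align = 0.5 * ((lv_g - lv_l)
                          + (lv_l.exp() + (mu_l - mu_g) ** 2) / lv_g.exp()
                          - 1.0).sum(-1).mean()
        return log_recon, kl_align
```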

Key Designs

  1. Polyglot Augmentation (Cross-Lingual Word Expansion):

    • Function: Constructs cross-lingually comparable document representations at the input layer.
    • Mechanism: For each active word \(w \in W_d^{(l)}\) in a document, the top-\(k\) monolingual neighbors \(N_I(w)\) and cross-lingual neighbors \(N_C(w)\) are retrieved via multilingual embeddings; the retrieved neighbors are added to the document's counts, and the per-language vectors are concatenated into a global vector \(\mathbf{g}_d^{(l)} \in \mathbb{R}^{|V^{(1)}|+|V^{(2)}|}\) (a runnable sketch follows this list).
    • Design Motivation: In conventional BoW, vocabularies across languages are disjoint, preventing the global encoder from directly observing cross-lingual semantic similarity. After augmentation, documents on the same topic (e.g., "football") in different languages automatically share overlapping features (soccer, goal, stadium), making alignment an intrinsic property of the input rather than a learning challenge.
  2. KL Divergence Internal Consistency Constraint:

    • Function: Aligns the latent topic distributions of the local and global paths.
    • Mechanism: \(\mathcal{L}_{KL} = KL(q(z_d^{(l,\text{local})}|x_d^{(l)}) \| q(z_d^{(l,\text{global})}|g_d^{(l)}))\)
    • Design Motivation: Prevents the dual paths from learning divergent latent spaces, pulling language-specific representations toward the shared global space.
  3. Unified Decoder (Topic Synchronization):

    • Function: Structurally enforces cross-lingual topic alignment.
    • Mechanism: The global decoder's topic-word matrix is the horizontal concatenation of the two language matrices, \(\beta^{(global)} = [\beta^{(1)} | \beta^{(2)}]\)—each topic \(k\) is a single continuous vector spanning the joint vocabulary.
    • Design Motivation: If the two halves of the same row represent different semantics (e.g., "Food" in English, "Sports" in Chinese), the reconstruction loss will be high—forcing the decoder to ensure that the same topic row corresponds to the same concept in both languages.
  4. CKA Semantic Knowledge Distillation:

    • Function: Injects deep semantic knowledge from multilingual pretrained models into the topic space.
    • Mechanism: \(\mathcal{L}_{CKA} = 1 - \mathrm{CKA}(\Theta, E)\), where Centered Kernel Alignment compares the geometric structure of the \(N \times K\) topic-proportion matrix \(\Theta\) with that of the \(N \times M\) PLM embedding matrix \(E\) over the same \(N\) documents.
    • Design Motivation: The topic space and the embedding space differ greatly in dimensionality (\(K\) vs. \(M\)). CKA addresses this mismatch by comparing the structural similarity of Gram matrices rather than raw vectors.
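
As referenced in item 1 above, here is a minimal NumPy sketch of Polyglot Augmentation. It assumes word embeddings for both vocabularies already live in one aligned multilingual space; the function name, the value of `k`, and the unit count added per retrieved neighbor are illustrative assumptions.

```python
import numpy as np

def polyglot_augment(bow, emb_own, emb_other, k=5):
    """Expand one document's BoW with top-k monolingual (N_I) and
    cross-lingual (N_C) neighbors, then concatenate the per-language
    vectors into a joint-vocabulary global vector g.

    bow       : (|V_own|,) term counts for the document
    emb_own   : (|V_own|, d) aligned embeddings, document's language
    emb_other : (|V_other|, d) aligned embeddings, other language
    """
    def topk(vec, emb):
        # Cosine similarity of `vec` against every word in `emb`.
        sims = emb @ vec / (np.linalg.norm(emb, axis=1)
                            * np.linalg.norm(vec) + 1e-9)
        return np.argsort(-sims)[:k]

    aug_own = bow.astype(float).copy()
    aug_other = np.zeros(emb_other.shape[0])
    for w in np.nonzero(bow)[0]:                # active words W_d^(l)
        vec = emb_own[w]
        # Monolingual neighbors N_I(w); top-k includes w itself here,
        # which a real implementation might exclude.
        aug_own[topk(vec, emb_own)] += 1.0
        # Cross-lingual neighbors N_C(w) in the other vocabulary.
        aug_other[topk(vec, emb_other)] += 1.0
    # g_d in R^{|V1|+|V2|}; order the halves consistently with beta_global.
    return np.concatenate([aug_own, aug_other])
```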

Loss & Training

Overall objective: \(\min_\Phi \mathcal{L} = \mathcal{L}_{VAE}^{(global)} + \sum_{l} \mathcal{L}_{VAE}^{(l,local)} + \lambda_1 \mathcal{L}_{KL} + \lambda_2 \mathcal{L}_{CKA}\)
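
A sketch of how this objective could be assembled, using linear (dot-product) CKA; the kernel choice, the \(\lambda\) defaults, and all function names are assumptions rather than details taken from the paper.

```python
import torch

def linear_cka(theta, emb):
    """Linear CKA between the (N, K) topic-proportion matrix Theta and
    the (N, M) PLM embedding matrix E. It compares Gram-matrix structure
    over the same N documents, so K and M need not match."""
    x = theta - theta.mean(dim=0, keepdim=True)  # column-center features
    y = emb - emb.mean(dim=0, keepdim=True)
    cross = torch.linalg.matrix_norm(x.T @ y) ** 2   # ||X^T Y||_F^2
    return cross / (torch.linalg.matrix_norm(x.T @ x)
                    * torch.linalg.matrix_norm(y.T @ y) + 1e-9)

def total_loss(l_vae_global, l_vae_locals, l_kl, theta, emb,
               lam1=1.0, lam2=1.0):
    # L = L_VAE^(global) + sum_l L_VAE^(l,local) + lam1*L_KL + lam2*L_CKA
    l_cka = 1.0 - linear_cka(theta, emb)
    return l_vae_global + sum(l_vae_locals) + lam1 * l_kl + lam2 * l_cka
```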

Key Experimental Results

Main Results

Model     EC News TQ↑   Amazon Review TQ↑   Rakuten Amazon TQ↑
NMTM      0.023         0.029               0.007
InfoCTM   0.041         0.034               0.028
XTRA      0.070         0.050               0.027
GloCTM    0.070         0.056               0.037
  • GloCTM achieves the best or tied-best Topic Quality (TQ = CNPMI × TU) across all three datasets.
  • The largest margin is observed on Rakuten Amazon: TQ 0.037 vs. InfoCTM 0.028 (+32%).

Ablation Study

Configuration                            CNPMI   TU      EN-C Cls.   ZH-C Cls.
NMTM (baseline)                          0.045   0.643   0.592       0.575
w/o \(\mathcal{L}_{KL}\)                 0.058   0.949   0.708       0.640
w/ \(\mathcal{L}_{sim}\) replacing CKA   —       —       —           —
Full GloCTM                              Best    Best    Best        Best

Key Findings

  • All four alignment mechanisms are indispensable: removing KL divergence degrades cross-lingual classification performance; removing CKA reduces the semantic depth of topics.
  • Polyglot Augmentation is the foundation of alignment effectiveness—it reframes alignment from a "learning task" to an "input property."
  • The structural constraint of the unified decoder is the most thorough: inconsistent semantics across the two halves are directly penalized by the reconstruction loss.
  • A slight decrease in TU (topic diversity) is a side effect of GloCTM's semantic compactness—each topic row tends to concentrate on meaningful core vocabulary.

Highlights & Insights

  • The systematic four-level alignment design is highly compelling: from input injection → encoding alignment → decoding enforcement → semantic distillation, each level incorporates an explicit alignment mechanism—more fundamental than indirect alignment via auxiliary losses as in InfoCTM.
  • Polyglot Augmentation is the central innovation: by enabling cross-lingual documents to share features at the input level, the global encoder directly "observes" semantic similarity rather than compensating post hoc. This idea is transferable to any scenario requiring cross-modal alignment.
  • Applying CKA to topic modeling is novel, resolving the incompatibility between the \(K\)-dimensional topic space and the \(M\)-dimensional embedding space.

Limitations & Future Work

  • Validation is limited to bilingual settings; scalability to multilingual (3+) scenarios remains to be explored.
  • Polyglot Augmentation depends on the quality of multilingual word embeddings; performance may degrade for low-resource languages.
  • CKA distillation incurs computational overhead that grows with the number of documents (Gram matrix computation); sampling strategies may be required for large-scale corpora.
  • The number of topics \(K\) must be specified in advance; automatic selection has not been explored.

Comparison with Prior Methods

  • vs. InfoCTM: InfoCTM aligns indirectly via mutual-information maximization and contrastive learning, yet still operates in independent language spaces; GloCTM constructs a shared space from the input stage, achieving more thorough alignment.
  • vs. XTRA: XTRA combines BoW with multilingual embeddings and dual contrastive alignment, achieving performance close to GloCTM's; however, XTRA lacks the structural constraint of a unified decoder.

Rating

  • Novelty: ⭐⭐⭐⭐ The four-level alignment combined with Polyglot Augmentation is systematic and novel; the construction of the global context space offers valuable methodological insights.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Three multilingual datasets, ablation studies, and LLM-based evaluation covering both topic quality and cross-lingual alignment.
  • Writing Quality: ⭐⭐⭐⭐⭐ Architecture diagrams are clear; motivations are logically derived; the necessity of each alignment level is supported by experimental evidence.
  • Value: ⭐⭐⭐ Cross-lingual topic modeling is a relatively niche area, but the methodology for constructing global context spaces is generalizable to multilingual information retrieval and cross-cultural content analysis.

Supplementary Notes

  • The local-global alignment paradigm of the dual-path VAE is transferable to cross-modal topic discovery (e.g., image-text topic alignment); the global context space idea has considerable extension potential.