t-SNE Exaggerates Clusters, Provably¶

Conference: ICLR 2026
arXiv: 2510.07746
Code: https://github.com/njbergam/tsne-exaggerates-clusters
Area: Data Visualization / Theoretical Analysis
Keywords: t-SNE, cluster exaggeration, dimensionality reduction, misleading visualization, outliers

TL;DR¶

This paper rigorously proves two fundamental failure modes of t-SNE: (1) the strength of input clusters cannot be inferred from the output, and (2) extreme outliers cannot be faithfully represented. Even when the input has arbitrarily weak cluster structure, or contains extreme outliers, t-SNE may still produce a cleanly clustered visualization.

Background & Motivation¶

Background: t-SNE is a standard tool for exploratory data analysis, widely used in single-cell genomics, language model interpretability, and beyond.
Existing Theory: Prior work has proven that t-SNE produces cluster-preserving outputs for well-separated input clusters (true-positive guarantees).
Limitations of Prior Work: Theoretical analysis of false positives (clustered output from unstructured input) and false negatives (unstructured output from clustered input) has been absent.
Key Challenge: t-SNE outputs directly influence hypothesis generation, experimental design, and scientific conclusions, making its failure modes practically consequential.

Method¶

Formalization of t-SNE¶

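The input affinity matrix $P$ is constructed via a Gaussian kernel: $$P_{j|i}(X; \sigma_i) := \frac{\exp(-\|x_j - x_i\|^2 / (2\sigma_i^2))}{\sum_{k \neq i} \exp(-\|x_k - x_i\|^2 / (2\sigma_i^2))}$$ where each bandwidth $\sigma_i$ is chosen per point so that the conditional distribution $P_{\cdot|i}$ matches the user-specified perplexity $\rho$.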

The output affinity matrix $Q$ is based on the $t$-distribution: $$Q_{ij}(Y) := \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k,l; k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$$

Objective: minimize $\mathcal{L}_X(Y) := \text{KL}(P(X) \| Q(Y))$
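The definitions above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration, not the authors' code. The function names are hypothetical, the bandwidths `sigmas` are passed in directly rather than calibrated to a perplexity, and the symmetrization $P_{ij} = (P_{j|i} + P_{i|j}) / (2n)$ is the usual t-SNE convention, assumed here because the summary does not spell it out.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def conditional_affinities(X, sigmas):
    """Row-normalized Gaussian affinities P_{j|i}, one bandwidth per point."""
    n = len(X)
    P = []
    for i in range(n):
        w = [0.0 if j == i
             else math.exp(-sq_dist(X[i], X[j]) / (2 * sigmas[i] ** 2))
             for j in range(n)]
        total = sum(w)
        P.append([v / total for v in w])
    return P


def symmetrize(P_cond):
    """Assumed standard symmetrization: P_ij = (P_{j|i} + P_{i|j}) / (2n)."""
    n = len(P_cond)
    return [[(P_cond[i][j] + P_cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]


def output_affinities(Y):
    """Student-t affinities Q_ij, normalized over all ordered pairs k != l."""
    n = len(Y)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j]))
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def kl_divergence(P, Q):
    """KL(P || Q), skipping zero entries of P (the diagonal)."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)


# Toy input X and a candidate embedding Y (both made up for illustration).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
P = symmetrize(conditional_affinities(X, sigmas=[1.0] * len(X)))
Q = output_affinities(Y)
loss = kl_divergence(P, Q)
```

Here `loss` is the value of the objective for this particular `Y`; an actual t-SNE run would minimize it over `Y` by gradient descent, which this sketch does not attempt.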

Core Finding 1: Cluster Strength Is Not Identifiable¶

Theorem 3 (Different inputs, identical outputs): For any dataset $X$ with clusters $C_1, \dots, C_k$ and any $0 < \epsilon \leq 1$, there exists a dataset $X_\epsilon$ whose average silhouette score $\bar{\mathcal{S}}$ is scaled down by a factor of $\epsilon$: $$\bar{\mathcal{S}}(X_\epsilon; C_{m \in [k]}) = \epsilon \cdot \bar{\mathcal{S}}(X; C_{m \in [k]})$$ yet for any perplexity $\rho$: $$\text{t-SNE}_\rho(X) = \text{t-SNE}_\rho(X_\epsilon)$$

That is, an "impostor" dataset with arbitrarily weak cluster structure can produce exactly the same t-SNE output as a strongly clustered dataset.

Corollary 4: For any balanced two-class dataset, there exists a family of datasets with silhouette coefficients ranging from $\epsilon$ to 1 that share an identical set of t-SNE stationary points.
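A small numerical sketch of how cluster strength can be weakened without touching what t-SNE sees. It assumes the additive-shift construction discussed under Key Mechanism, realized here by appending $\sqrt{C/2}\, e_i$ to point $i$ (a hypothetical helper, `additive_shift`, not taken from the paper), and the standard Euclidean silhouette score. It illustrates the trend only and does not reproduce the exact factor $\epsilon$ of Theorem 3.

```python
import math


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def additive_shift(X, C):
    """Append sqrt(C/2) * e_i to point i; every squared pairwise distance grows by C."""
    n = len(X)
    s = math.sqrt(C / 2.0)
    return [list(x) + [s if k == i else 0.0 for k in range(n)]
            for i, x in enumerate(X)]


def mean_silhouette(X, labels):
    """Average silhouette score; assumes every cluster has at least two points."""
    n = len(X)
    scores = []
    for i in range(n):
        dists_by_cluster = {}
        for j in range(n):
            if j != i:
                d = math.sqrt(sq_dist(X[i], X[j]))
                dists_by_cluster.setdefault(labels[j], []).append(d)
        own = dists_by_cluster[labels[i]]
        a = sum(own) / len(own)  # mean distance to own cluster
        b = min(sum(d) / len(d)  # mean distance to nearest other cluster
                for lab, d in dists_by_cluster.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / n


# Two tight, well-separated toy clusters (made-up data).
X = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
labels = [0, 0, 1, 1]
shifts = [0.0, 1.0, 10.0, 100.0]
scores = [mean_silhouette(additive_shift(X, C), labels) for C in shifts]
for C, s in zip(shifts, scores):
    print(f"C = {C:6.1f}  mean silhouette = {s:.3f}")
```

The silhouette score falls steadily as `C` grows, even though the nearest-neighbor ordering of every point is unchanged by the shift.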

Core Finding 2: Tiny Perturbations Cause Drastic Changes¶

Theorem 5: For any $\epsilon > 0$, there exist datasets $X, X'$ such that all pairwise distance ratios lie within $[1-\epsilon, 1+\epsilon]$ (i.e., distances are nearly identical), yet the t-SNE outputs are completely different.

Lemma 6 (Surprising result): The set $\Delta_\epsilon$ of datasets that approximately form a regular simplex suffices to generate all possible t-SNE stationary point outputs.
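One way to see why near-simplex datasets are so expressive is to combine this lemma with the additive shift from the Key Mechanism section. The following is a sketch under that assumption. The symbol $D_{\max} := \max_{i \neq j} \|x_i - x_j\|^2$ is introduced here for illustration and is not the paper's notation. For any two pairs $i \neq j$ and $k \neq l$ and any shift $C > 0$:

$$1 - \frac{D_{\max}}{C} \;\le\; \frac{C}{D_{\max} + C} \;\le\; \frac{\|x_i - x_j\|^2 + C}{\|x_k - x_l\|^2 + C} \;\le\; \frac{D_{\max} + C}{C} \;=\; 1 + \frac{D_{\max}}{C}$$

Choosing $C \ge D_{\max} / \epsilon$ therefore puts every ratio of squared distances in $[1-\epsilon, 1+\epsilon]$. The ratios of the distances themselves then lie in $[\sqrt{1-\epsilon}, \sqrt{1+\epsilon}] \subseteq [1-\epsilon, 1+\epsilon]$, so the shifted dataset is nearly a regular simplex in this sense.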

Key Mechanism: Additive Invariance¶

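Beyond multiplicative scale invariance, t-SNE also exhibits additive shift invariance with respect to squared input distances. That is, if there is a constant $C$ such that $\|x'_i - x'_j\|^2 = \|x_i - x_j\|^2 + C$ for all $i \neq j$, then $\text{t-SNE}_\rho(X) = \text{t-SNE}_\rho(X')$. The reason is that the shift multiplies every Gaussian weight in row $i$ by the same factor $\exp(-C / (2\sigma_i^2))$, which cancels between the numerator and denominator of $P_{j|i}$, so the input affinities are unchanged. This property is the fundamental cause of the failure modes described above.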
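The invariance can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: `additive_shift` is a hypothetical helper that appends $\sqrt{C/2}\, e_i$ to point $i$, which adds exactly $C$ to every squared pairwise distance, and a single fixed bandwidth `sigma` stands in for the perplexity-calibrated $\sigma_i$.

```python
import math


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def conditional_affinities(X, sigma):
    """Row-normalized Gaussian affinities P_{j|i} with one shared bandwidth."""
    n = len(X)
    P = []
    for i in range(n):
        w = [0.0 if j == i
             else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        P.append([v / total for v in w])
    return P


def additive_shift(X, C):
    """Append sqrt(C/2) * e_i to point i, so ||x'_i - x'_j||^2 = ||x_i - x_j||^2 + C."""
    n = len(X)
    s = math.sqrt(C / 2.0)
    return [list(x) + [s if k == i else 0.0 for k in range(n)]
            for i, x in enumerate(X)]


# Toy data (made up): the shifted copy lives in 2 + 4 = 6 dimensions.
X = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
X_shifted = additive_shift(X, C=4.0)

P = conditional_affinities(X, sigma=1.0)
P_shifted = conditional_affinities(X_shifted, sigma=1.0)

# Largest entrywise difference between the two affinity matrices.
max_diff = max(abs(p - q)
               for row_p, row_q in zip(P, P_shifted)
               for p, q in zip(row_p, row_q))
print(f"max |P - P_shifted| = {max_diff:.2e}")
```

Because the affinities agree for every fixed bandwidth, a perplexity search would select the same $\sigma_i$ for both datasets, so the t-SNE objective is the same function of $Y$ in both cases.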

Core Finding 3: Outliers Are Suppressed¶

Theorem 9: For any t-SNE output $Y$, the outlierness $\alpha(Y) \leq 3.266 + o_n(1)$.

Regardless of how extreme the outliers are in the input, t-SNE cannot represent outlierness exceeding the bound of Theorem 9 (approximately 3.27) in the output. This is caused by the asymmetry between the input and output affinity matrices.
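The sketch below illustrates one well-known property of the standard t-SNE input affinities that is relevant here. It is not the paper's proof, and whether it is exactly the asymmetry the paper has in mind is an assumption. Each row of $P_{j|i}$ is normalized to sum to 1, so after the usual symmetrization $P_{ij} = (P_{j|i} + P_{i|j}) / (2n)$ every point keeps a total affinity of at least $1/(2n)$, however far away it is. The outlierness $\alpha$ is not defined in this summary, so it is not computed. The helper names and the toy data are hypothetical.

```python
import math


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def conditional_affinities(X, sigma):
    """Row-normalized Gaussian affinities P_{j|i}, computed stably."""
    n = len(X)
    P = []
    for i in range(n):
        d2 = [sq_dist(X[i], X[j]) for j in range(n)]
        d2_min = min(d for j, d in enumerate(d2) if j != i)
        # Subtracting d2_min leaves each ratio unchanged but avoids 0/0 underflow.
        w = [0.0 if j == i
             else math.exp(-(d2[j] - d2_min) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        P.append([v / total for v in w])
    return P


def symmetrized_row_mass(P_cond):
    """Total symmetrized affinity sum_j P_ij of each point i."""
    n = len(P_cond)
    return [sum((P_cond[i][j] + P_cond[j][i]) / (2 * n) for j in range(n))
            for i in range(n)]


def outlier_mass(distance):
    """Row mass of a single outlier placed at (distance, distance)."""
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [distance, distance]]
    return symmetrized_row_mass(conditional_affinities(X, sigma=1.0))[-1]


n = 5
masses = [outlier_mass(d) for d in (10.0, 100.0, 1000.0)]
for d, m in zip((10.0, 100.0, 1000.0), masses):
    print(f"outlier at distance scale {d:7.1f}: row mass = {m:.6f} (floor {1 / (2 * n):.6f})")
```

The outlier's input affinity mass stays pinned near $1/(2n)$ no matter how far it moves, so the input affinities carry essentially no information about how extreme it is.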

Single-Point Poisoning Attack¶

Adding a single "poisoning point" placed at the data mean suffices to destroy the entire cluster visualization structure. This effect is particularly severe in high-dimensional data, where the poisoning point becomes the nearest neighbor of most points, drastically altering the affinity matrix.
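The nearest-neighbor claim can be checked on synthetic data. The following is a scaled-down sketch under assumed settings (60 points, 500 dimensions, a hypothetical `offset` between cluster centers), not the paper's experiment. It measures only how often the mean point is the nearest neighbor and does not run t-SNE.

```python
import random


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def gaussian_mixture(n_per_cluster, dim, offset, rng):
    """Two unit-variance Gaussian clusters centred at +offset and -offset per coordinate."""
    points = []
    for sign in (1.0, -1.0):
        for _ in range(n_per_cluster):
            points.append([sign * offset + rng.gauss(0.0, 1.0) for _ in range(dim)])
    return points


def fraction_nearest_to_mean(X):
    """Fraction of points that are closer to the data mean than to any other data point."""
    n, dim = len(X), len(X[0])
    mean = [sum(x[k] for x in X) / n for k in range(dim)]
    hits = 0
    for i in range(n):
        nearest_data = min(sq_dist(X[i], X[j]) for j in range(n) if j != i)
        if sq_dist(X[i], mean) < nearest_data:
            hits += 1
    return hits / n


rng = random.Random(0)
# High-dimensional case: within-cluster squared distances concentrate near 2 * dim,
# while the squared distance to the mean is roughly dim + dim * offset**2.
X_high = gaussian_mixture(n_per_cluster=30, dim=500, offset=0.3, rng=rng)
# Low-dimensional case for contrast: neighbours inside a cluster are much closer.
X_low = gaussian_mixture(n_per_cluster=30, dim=2, offset=3.0, rng=rng)

frac_high = fraction_nearest_to_mean(X_high)
frac_low = fraction_nearest_to_mean(X_low)
print(f"high-dim: {frac_high:.2f} of points have the mean as nearest neighbour")
print(f"low-dim:  {frac_low:.2f} of points have the mean as nearest neighbour")
```

In the high-dimensional setting the mean sits closer to almost every point than any genuine neighbor does, so a point placed there would dominate each row of the affinity matrix.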

Experimental Validation¶

Impostor Dataset Experiment¶

Metric	Original PBMC3k	Impostor Dataset
t-SNE visualization	Clear clusters	Nearly identical clusters
Silhouette coefficient	High (original)	Extremely low
Nearest-neighbor ranking	Normal	Preserved unchanged

Poisoning Attack Experiment¶

Gaussian mixture (400 points, 2000 dimensions): adding 1 poisoning point makes the cluster structure disappear completely.
BBC News dataset: injecting 10% poisoning points halves the silhouette coefficient.
By contrast, injecting 50% outliers has almost no effect on cluster structure.

Outlier Experiment¶

Dataset	Outlierness $\alpha$ in t-SNE output	Outliers in PCA output
Financial fraud data	~0.2	Separation preserved
Gaussian + outliers	~0.1	Faithfully recovered

Highlights & Insights¶

First theoretical analysis of t-SNE failure modes: Prior work on these failure modes offered only empirical observations; this paper provides rigorous proofs.
Discovery of additive invariance: Reveals the fundamental cause of t-SNE's misleading behavior.
Practical implications:
The strength of input clusters cannot be inferred from t-SNE visualizations.
t-SNE is unsuitable for outlier detection.
t-SNE is particularly unstable on high-dimensional data (which tends to approximate a regular simplex).
PCA as a complement: PCA significantly outperforms t-SNE in outlier detection and stability.

Limitations & Future Work¶

Theoretical results are based on stationary point analysis; actual t-SNE outputs depend on the optimization trajectory and may avoid certain stationary points.
Contributions are primarily mathematical; concrete algorithmic improvements are limited.
The paper focuses mainly on t-SNE; analysis of methods such as UMAP is only preliminary.

Related Work¶

t-SNE theory: Arora et al. 2018 (cluster-preservation guarantees); Cai & Ma 2022 (analysis of optimization phases)
Critiques of t-SNE: Chari & Pachter 2023 (t-SNE as an unreliable exploratory analysis tool)
General dimensionality reduction theory: Snoeck et al. 2026 (any constant-dimensional embedding necessarily incurs distortion)

Rating¶

Novelty: ⭐⭐⭐⭐⭐ — First rigorous theoretical analysis of t-SNE failure modes
Technical Depth: ⭐⭐⭐⭐⭐ — Elegant proofs; the discovery of additive invariance is profound
Experimental Thoroughness: ⭐⭐⭐⭐ — Theory and experiments are tightly integrated
Writing Quality: ⭐⭐⭐⭐ — Important cautionary findings for researchers using t-SNE in practice