t-SNE Exaggerates Clusters, Provably¶
- Conference: ICLR 2026
- arXiv: 2510.07746
- Code: https://github.com/njbergam/tsne-exaggerates-clusters
- Area: Data Visualization / Theoretical Analysis
- Keywords: t-SNE, cluster exaggeration, dimensionality reduction, misleading visualization, outliers
TL;DR¶
This paper provides rigorous theoretical proofs of two fundamental failure modes of t-SNE: (1) the strength of input clusters cannot be inferred from the output, so t-SNE may produce a perfectly clustered visualization even when the input has no cluster structure at all; and (2) extreme outliers cannot be faithfully represented in the output, no matter how far they lie from the rest of the data.
Background & Motivation¶
- Background: t-SNE is a standard tool for exploratory data analysis, widely used in single-cell genomics, language model interpretability, and beyond.
- Existing Theory: Prior work has proven that t-SNE produces cluster-preserving outputs for well-separated input clusters (true-positive guarantees).
- Limitations of Prior Work: Theoretical analysis of false positives (clustered output from unstructured input) and false negatives (unstructured output from clustered input) has been absent.
- Key Challenge: t-SNE outputs directly influence hypothesis generation, experimental design, and scientific conclusions, making its failure modes practically consequential.
Method¶
Formalization of t-SNE¶
The input affinity matrix \(P\) is constructed via a Gaussian kernel: $$P_{j|i}(X; \sigma_i) := \frac{\exp(-\|x_j - x_i\|^2 / (2\sigma_i^2))}{\sum_{k \neq i} \exp(-\|x_k - x_i\|^2 / (2\sigma_i^2))}$$
The output affinity matrix \(Q\) is based on the \(t\)-distribution: $$Q_{ij}(Y) := \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$$
Objective: minimize \(\mathcal{L}_X(Y) := \text{KL}(P(X) \| Q(Y))\)
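The definitions above can be turned into a minimal numerical sketch. This assumes fixed bandwidths \(\sigma_i\) (rather than the usual perplexity calibration) and the standard symmetrization \(P_{ij} = (P_{j|i} + P_{i|j})/(2n)\), which the formulas here leave implicit:

```python
import numpy as np

def squared_dists(Z):
    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq = (Z ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0)

def p_conditional(X, sigmas):
    # Row-normalized Gaussian affinities P_{j|i}; self-affinity excluded.
    W = np.exp(-squared_dists(X) / (2 * sigmas[:, None] ** 2))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def q_joint(Y):
    # Joint t-distribution affinities Q_{ij}, normalized over all pairs.
    W = 1.0 / (1.0 + squared_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def tsne_loss(X, Y, sigmas):
    n = len(X)
    Pc = p_conditional(X, sigmas)
    P = (Pc + Pc.T) / (2 * n)        # symmetrized joint input affinities
    Q = q_joint(Y)
    mask = ~np.eye(n, dtype=bool)
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))  # KL(P || Q)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # toy input
Y = rng.normal(size=(20, 2))   # toy embedding
loss = tsne_loss(X, Y, sigmas=np.ones(20))
```

Since both \(P\) and \(Q\) are probability distributions over ordered pairs, the loss is always nonnegative; t-SNE optimizes \(Y\) by gradient descent on this quantity.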
Core Finding 1: Cluster Strength Is Not Identifiable¶
Theorem 3 (Different inputs, identical outputs): For any \(0 < \epsilon \leq 1\), there exists a dataset \(X_\epsilon\) such that: $$\bar{\mathcal{S}}(X_\epsilon; C_{m \in [k]}) = \epsilon \cdot \bar{\mathcal{S}}(X; C_{m \in [k]})$$ yet for any perplexity \(\rho\): $$\text{t-SNE}_\rho(X) = \text{t-SNE}_\rho(X_\epsilon)$$
That is, an "impostor" dataset with arbitrarily weak cluster structure can produce exactly the same t-SNE output as a strongly clustered dataset.
Corollary 4: For any balanced two-class dataset, there exists a family of datasets with silhouette coefficients ranging from \(\epsilon\) to 1 that share an identical set of t-SNE stationary points.
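For reference, the silhouette coefficient measures, per point, how much closer it is to its own cluster than to the nearest other cluster; the mean over all points is the score the corollary refers to. A minimal sketch on hypothetical toy data (not the paper's construction):

```python
import numpy as np

def silhouette(X, labels):
    # Mean silhouette: s_i = (b_i - a_i) / max(a_i, b_i), where a_i is the
    # mean distance to other points in i's own cluster and b_i the smallest
    # mean distance to any other cluster.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = (labels == labels[i])
        same[i] = False                      # exclude the point itself
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)),   # tight cluster at the origin
               rng.normal(5, 0.1, (50, 2))])  # tight cluster far away
labels = np.repeat([0, 1], 50)
score = silhouette(X, labels)   # close to 1 for well-separated clusters
```

The theorem's point is that this score can be driven arbitrarily close to 0 without changing the set of t-SNE stationary points.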
Core Finding 2: Tiny Perturbations Cause Drastic Changes¶
Theorem 5: For any \(\epsilon > 0\), there exist datasets \(X, X'\) such that all pairwise distance ratios lie within \([1-\epsilon, 1+\epsilon]\) (i.e., distances are nearly identical), yet the t-SNE outputs are completely different.
Lemma 6 (Surprising result): The set \(\Delta_\epsilon\) of datasets that approximately form a regular simplex suffices to generate all possible t-SNE stationary point outputs.
Key Mechanism: Additive Invariance¶
Beyond multiplicative scale invariance, t-SNE also exhibits additive shift invariance with respect to squared input distances. That is, if \(\|x'_i - x'_j\|^2 = \|x_i - x_j\|^2 + C\), then \(\text{t-SNE}_\rho(X) = \text{t-SNE}_\rho(X')\). This property is the fundamental cause of the failure modes described above.
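This invariance is easy to verify numerically: adding a constant \(C\) to every squared distance multiplies each unnormalized Gaussian weight by \(\exp(-C/(2\sigma_i^2))\), which cancels in the row normalization. A sketch with a fixed shared bandwidth (with perplexity calibration the same \(\sigma_i\) are recovered, since each row of \(P\) is unchanged for every \(\sigma\)):

```python
import numpy as np

def p_from_sq_dists(D2, sigma):
    # Conditional affinities P_{j|i} computed directly from a matrix of
    # squared distances (no points needed), with a shared bandwidth sigma.
    W = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
sq = (X ** 2).sum(axis=1)
D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared distances

C = 100.0                                      # additive shift
D2_shifted = D2 + C
np.fill_diagonal(D2_shifted, 0.0)              # self-distances stay zero

P = p_from_sq_dists(D2, sigma=1.0)
P_shifted = p_from_sq_dists(D2_shifted, sigma=1.0)
print(np.allclose(P, P_shifted))               # True: P is unchanged
```

Since t-SNE sees the input only through \(P\), the two (very different) distance structures are indistinguishable to it.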
Core Finding 3: Outliers Are Suppressed¶
Theorem 9: For any t-SNE output \(Y\), the outlierness \(\alpha(Y) \leq 3.266 + o_n(1)\).
Regardless of how extreme the outliers are in the input, t-SNE cannot represent outlierness exceeding roughly 3.3 in the output. This limitation stems from the asymmetry between the Gaussian input kernel and the heavy-tailed output kernel.
Single-Point Poisoning Attack¶
Adding a single "poisoning point" placed at the data mean suffices to destroy the entire cluster visualization structure. This effect is particularly severe in high-dimensional data, where the poisoning point becomes the nearest neighbor of most points, drastically altering the affinity matrix.
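The nearest-neighbor effect is easy to reproduce with a hypothetical toy setup mirroring the paper's Gaussian-mixture experiment: in high dimension, most of each point's squared distance to a same-cluster neighbor is noise (about \(2d\sigma^2\)), while its squared distance to the mean point is only about \(d\sigma^2\), so the mean sits closer than any genuine neighbor:

```python
import numpy as np

rng = np.random.default_rng(0)
n_per, d = 200, 2000
# Two isotropic Gaussian clusters separated along the first coordinate.
A = rng.normal(size=(n_per, d)); A[:, 0] += 5.0
B = rng.normal(size=(n_per, d)); B[:, 0] -= 5.0
X = np.vstack([A, B])                     # 400 x 2000

poison = X.mean(axis=0)                   # single point at the data mean
Xp = np.vstack([X, poison])

# Squared distances from each original point to every point (incl. poison).
sq_x = (X ** 2).sum(axis=1)
sq_p = (Xp ** 2).sum(axis=1)
D2 = sq_x[:, None] + sq_p[None, :] - 2 * X @ Xp.T
D2[np.arange(len(X)), np.arange(len(X))] = np.inf   # exclude self

# Fraction of points whose nearest neighbor is the poisoning point.
frac = (D2.argmin(axis=1) == len(X)).mean()         # close to 1.0
```

With the affinity matrix rewired toward a single point in this way, the Gaussian kernel concentrates mass on the poison and the visualized cluster structure collapses.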
Experimental Validation¶
Impostor Dataset Experiment¶
| Metric | Original PBMC3k | Impostor Dataset |
|---|---|---|
| t-SNE visualization | Clear clusters | Nearly identical clusters |
| Silhouette coefficient | High (original) | Extremely low |
| Nearest-neighbor ranking | Normal | Preserved unchanged |
Poisoning Attack Experiment¶
- 400 points × 2000-dimensional Gaussian mixture → add 1 poisoning point → cluster structure completely disappears
- BBC News dataset: inject 10% poisoning points → silhouette coefficient halved
- By contrast: injecting 50% outliers has almost no effect on cluster structure
Outlier Experiment¶
| Dataset | Outlierness \(\alpha\) in t-SNE output | PCA output |
|---|---|---|
| Financial fraud data | ~0.2 (outliers suppressed) | Separation preserved |
| Gaussian + outliers | ~0.1 (outliers suppressed) | Outliers faithfully recovered |
Highlights & Insights¶
- First theoretical analysis of t-SNE failure modes: Prior work offered only empirical observations; this paper provides rigorous proofs.
- Discovery of additive invariance: Reveals the fundamental cause of t-SNE's misleading behavior.
- Practical implications:
- The strength of input clusters cannot be inferred from t-SNE visualizations.
- t-SNE is unsuitable for outlier detection.
- t-SNE is particularly unstable on high-dimensional data (which tends to approximate a regular simplex).
- PCA as a complement: PCA significantly outperforms t-SNE in outlier detection and stability.
Limitations & Future Work¶
- Theoretical results are based on stationary point analysis; actual t-SNE outputs depend on the optimization trajectory and may avoid certain stationary points.
- Contributions are primarily mathematical; concrete algorithmic improvements are limited.
- The paper focuses mainly on t-SNE; analysis of methods such as UMAP is only preliminary.
Related Work & Insights¶
- t-SNE theory: Arora et al. 2018 (cluster-preservation guarantees); Cai & Ma 2022 (analysis of optimization phases)
- Critiques of t-SNE: Chari & Pachter 2023 (t-SNE as an unreliable exploratory analysis tool)
- General dimensionality reduction theory: Snoeck et al. 2026 (any constant-dimensional embedding necessarily incurs distortion)
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ — First rigorous theoretical analysis of t-SNE failure modes
- Technical Depth: ⭐⭐⭐⭐⭐ — Elegant proofs; the discovery of additive invariance is profound
- Experimental Thoroughness: ⭐⭐⭐⭐ — Theory and experiments are tightly integrated
- Writing Quality: ⭐⭐⭐⭐ — Clear exposition of important cautionary findings for researchers using t-SNE in practice