Cover Learning for Large-Scale Topology Representation¶

Conference: ICML2025
arXiv: 2503.09767
Code: github.com/luisscoccola/shapediscover
Area: NLP Understanding
Keywords: Cover Learning, Topological Data Analysis, Simplicial Complex, Nerve Construction, Fuzzy Cover

TL;DR¶

Proposes Cover Learning as a unified unsupervised learning problem. From an optimization perspective, three loss functions (measure, geometry, topology) are designed to learn topologically faithful covers of datasets. The resulting simplicial complexes are more compact than standard geometric complexes in topological inference and can represent higher-dimensional information than Mapper graphs in large-scale topological visualization.

Background & Motivation¶

Background: The core goal of Topological Data Analysis (TDA) is to infer and quantify the topological structure of underlying spaces from data. Current TDA mainly consists of two methodologies: (1) topological inference based on geometric complexes (such as Vietoris-Rips complexes), which can recover high-dimensional topology but the number of simplices grows exponentially with the size of data; (2) large-scale topological visualization based on Mapper graphs, which is flexible and easy to use, but the output is a 1D graph and cannot capture topological information beyond dimension 1. Both share a common core in topology known as the Nerve construction—given a cover of a space, a simplicial complex is constructed.

Limitations of Prior Work: Geometric complexes contain \(\Theta(|X|^{d+1})\) \(d\)-dimensional simplices, incurring extremely high computational costs on large datasets. The output of Mapper graphs is intrinsically one-dimensional (the nerve is at most a graph), losing all topological information above dimension 1. Furthermore, Mapper has three hard-to-tune hyperparameters (filter function, interval cover, and clustering algorithm), and improper selection severely degrades the results.

Key Challenge: Existing methods either recover high-dimensional topology with an exploding complex size (geometric complexes) or have a controllable size but can only represent low-dimensional information (Mapper). There is a lack of a unified method that simultaneously satisfies both compactness and faithful high-dimensional topology.

Goal: How to directly learn a cover of a dataset (rather than relying on preset filter functions or global distance scales) such that the simplicial complex obtained through Nerve construction is both compact and topologically faithful?

Key Insight: The authors observe that various methods in TDA (e.g., Mapper, Ball Mapper) are essentially cover learning algorithms, but they have never been unified conceptually. This paper independently proposes Cover Learning as an optimization problem, smooths the search space using continuous fuzzy covers, and derives computable loss functions for gradient optimization.

Core Idea: Treats cover learning as an optimization problem, directly learning a topologically faithful cover via continuous parameterization of fuzzy covers and gradient descent of three loss functions (measure, geometry, and topology).

Method¶

Overall Architecture¶

The pipeline of the ShapeDiscover algorithm: the input is a point cloud \(X \subseteq \mathbb{R}^N\), and the output is a fuzzy cover and its Nerve simplicial complex.

Input: Point cloud data \(X\)
Graph Construction: Construct a weighted graph \(G\) using the weighted neighborhood graph from UMAP
Initialization: Perform spectral clustering on \(G\) to obtain an initial cover
Parameterization: Continuous-ize the cover into a fuzzy cover \(g: X \to \Gamma^{k-1}\), parameterized via softmax + \(\ell^\infty\) normalization
Optimization: Perform Adam gradient descent on the four losses \(\hat{\mathsf{M}} + \text{reg} \cdot \hat{\mathsf{G}} + \hat{\mathsf{T}} + \text{reg} \cdot \hat{\mathsf{R}}\)
Output: Threshold the fuzzy cover to obtain a cover, and construct the Nerve to get the simplicial complex

Key Designs¶

Continuous Parameterization of Fuzzy Covers
- Function: Embeds the discrete cover space into a continuous function space to enable optimization
- Mechanism: Defines \(\Gamma^{k-1} = \{p \in [0,1]^k : \max_i p_i = 1\}\) as the target space, where a fuzzy cover is a function \(g: X \to \Gamma^{k-1}\). For any threshold \(\lambda \in [0,1]\), the superlevel set \(\{g > \lambda\}\) yields a true cover. By randomly sampling the threshold, a fuzzy cover induces a probability measure \(\mu_g\) over covers, allowing the expected loss to be formulated as an explicit integral with respect to \(g\).
- Design Motivation: Since the space of covers is discrete and lacks a smooth structure, direct gradient descent is infeasible. Fuzzy covers smooth the search space, acting as a crucial bridge from discrete to continuous optimization.
Three Computable Loss Functions (Theorem 3.2)
- Function: Translates the three formal goals of Cover Learning (small measure, regular boundaries, and topological faithfulness) into computable and differentiable losses
- Mechanism: The main theorem proves that on manifolds: \(\mathsf{M}(g) = \sum_i \|g_i\|_1^2\) (measure term = sum of squared \(L^1\) norms of each component), \(\mathsf{G}(g) = \sum_i \|\nabla g_i\|_1^2\) (geometry term = sum of squared \(L^1\) norms of gradients), and \(\mathsf{T}(g) \leq \sum_{J \subseteq I} \|\min_{j \in J} g_j\|_{\mathsf{H}}^2\) (topology term \(\leq\) upper bound of persistent homology length). The corresponding discrete estimators are defined on a weighted graph. The topology term is optimized using 0D persistent homology (supporting automatic differentiation).
- Design Motivation: Directly optimizes the conditions of the Nerve theorem—the measure term ensures that cover elements are not too large, the geometry term ensures smooth boundaries, and the topology term ensures that the Nerve recovers the correct homology. This is the core theoretical contribution of "translating" the Nerve theorem into loss functions.
ShapeDiscover Algorithm Implementation
- Function: Instantiates the theoretical framework into a complete cover learning algorithm
- Mechanism: Initializes with spectral clustering \(\to\) maps parameter matrix \(\theta \in \mathbb{R}^{n \times k}\) to a fuzzy cover via \(\pi_p \circ \text{softmax}\) \(\to\) trains for 500 epochs using the Adam optimizer with topological big steps \(\to\) obtains the final cover with the default threshold \(\lambda = 0.5\). The only parameters are the cover size n_cov and the number of neighbors n_neigh=15.
- Design Motivation: Spectral clustering initialization provides a geometrically meaningful starting point, vastly accelerating convergence and improving robustness. Applying an approximation with \(p=5\) instead of \(p=\infty\) avoids non-differentiable points. Parameters are minimal (only n_cov needs to be chosen by the user), which is far superior to Mapper's three coupled hyperparameters.

Loss & Training¶

The total loss is a weighted sum of four terms: \(\mathcal{L}(\theta) = \hat{\mathsf{M}} + \text{reg} \cdot \hat{\mathsf{G}} + \hat{\mathsf{T}}_0 + \text{reg} \cdot \hat{\mathsf{R}}\), where reg=10. \(\hat{\mathsf{R}}\) is an \(L^2\) regularization term, which guarantees the existence of solutions and combats the curse of dimensionality. The topology term is optimized via the topological big steps method of Nigmetov & Morozov (2024), resolving the slow convergence caused by sparse gradients in traditional topology. The model is trained for 500 epochs using the Adam optimizer with a learning rate of 0.1.

Key Experimental Results¶

Main Results¶

On four datasets—synthetic 2-sphere, 3-sphere, human body surface (\(\approx 2\)-sphere), and dynamical system video (\(\approx 2\)-torus)—the minimum number of vertices and total number of simplices required to achieve a given homology recovery ratio are compared.

Dataset	Method	Min Vertices	Total Simplices	Homology Recovery Ratio
2-sphere	ShapeDiscover	5	Min.	High
2-sphere	Subsample+VR	15+	Much more	Equal
2-sphere	Witness(v=1)	10	Medium	Medium
Neuroscience Torus	ShapeDiscover	15	~15-vertex complex	0.2
Neuroscience Torus	Subsample+VR (5000 pts)	5000	Extremely large	0 (Failed)

Ablation Study¶

Configuration	Homology Recovery Ratio	Explanation
Full (Default Params)	Highest	Complete model
w/o Measure Loss \(\hat{\mathsf{M}}\)	Significantly dropped	Measure term is indispensable
w/o Regularization \(\hat{\mathsf{R}}\)	Significantly dropped	Regularization is indispensable
w/o Clustering Initialization	Significantly dropped on real data	Initialization is crucial for real data
w/o Geometry Loss \(\hat{\mathsf{G}}\)	Basically unchanged	Geometry term has little impact (covered by regularization)
w/o Topology Loss \(\hat{\mathsf{T}}\)	Dropped during random init	Topology term is crucial during random initialization

Key Findings¶

ShapeDiscover recovers correct topology with very few vertices: On neuroscience data, it recovers the torus topology using only 15 vertices, whereas the VR complex fails even with 5,000 points (homology recovery ratio = 0).
Measure loss and regularization are core components: Removing either leads to a sharp decline in quality, indicating that the constraints "cover elements must not be too large" and "covers must be smooth" are critical.
Geometry loss can be replaced by regularization: Theoretically, \(\hat{\mathsf{G}} \leq \hat{\mathsf{R}}\) (via Jensen's inequality), and experiments also verify that the geometry loss is negligible.

Highlights & Insights¶

Translating the Nerve theorem into optimizable loss functions: This is the most central theoretical contribution of this paper. The Nerve theorem states that "the Nerve of a good cover recovers the correct topology," but no prior work has systematically encoded the conditions of a "good cover" into differentiable losses. Theorem 3.2 does precisely this, opening up a new path for topologically constrained optimization.
Fuzzy covers as a discrete-continuous bridge: Covers are discrete objects, but fuzzy covers are linked to real covers via superlevel set thresholding, allowing continuous optimization results to naturally translate into discrete outputs. This paradigm of "continuous relaxation + thresholding" is highly elegant and transferable to other combinatorial optimization problems.
Cover Learning under a unified view: For the first time, methods such as Mapper and Ball Mapper are unified as special cases of cover learning, and the clustering problem is pointed out to be a zero-dimensional special case of cover learning. This conceptual unification provides a clear framework for understanding and improving TDA methods.

Limitations & Future Work¶

Spurious intersection problem: The learned cover may contain spurious intersections between sets, leading to spurious high-dimensional simplices in the Nerve, which affects visualization and topological recovery. The authors mention in their conclusion the need to develop "robust cover learning."
Lack of topological recovery guarantees: Standard topological inference methods (like the VR complex) have theoretical consistency guarantees, whereas optimization-based methods do not. Although the Nerve theorem provides a verifiable sufficient condition, there is no guarantee that optimization will find a solution that satisfies the condition.
Efficiency bottleneck of topological optimization: Persistent homology optimization requires accessing the entire dataset in each epoch, preventing the use of mini-batches, which is the main bottleneck for computational time. Finding mini-batch-friendly topological regularization methods is an important open problem.
Exclusive use of 0-dimensional topology loss: In experiments, \(\hat{\mathsf{T}}_0\) only considers connected components (0-dimensional homology) without optimizing for higher-dimensional topological constraints. Higher-dimensional persistent homology is more expensive computationally but might be necessary for more complex topological structures.

vs 1D Mapper (Singh et al., 2007): Mapper requires selecting three highly coupled hyperparameters—filter function, interval cover, and clustering algorithm—and its output is limited to a one-dimensional graph. ShapeDiscover bypasses filter functions, requires fewer parameters (only n_cov), and can learn high-dimensional simplicial complexes.
vs Ball Mapper (Dłotko, 2019): Ball Mapper relies on a global distance scale \(\varepsilon\), fails to adapt to density variations, and suffers from the curse of dimensionality. The authors prove that Ball Mapper is equivalent to a Witness complex with parameter \(v=0\). ShapeDiscover automatically adapts to local geometry through optimization.
vs Differentiable Mapper (Oulhaj et al., 2024): The most closely related work, which also uses optimization to search for filter functions, but still follows the Mapper pipeline (relying on filter functions) and yields a 1D output. ShapeDiscover completely bypasses filter functions to directly learn the cover.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ First to propose cover learning as an independent optimization problem; translating the Nerve theorem into loss functions is a highly original theoretical contribution.
Experimental Thoroughness: ⭐⭐⭐⭐ Includes both synthetic and real data with a complete ablation analysis, but lacks quantitative evaluation on large-scale high-dimensional data.
Writing Quality: ⭐⭐⭐⭐⭐ Rigorous and clear theoretical derivations, with a complete logical chain from abstract goals to actual algorithms, supplemented by highly detailed appendices.
Value: ⭐⭐⭐⭐ Provides a new paradigm for TDA and unifies various methods, but has a relatively narrow audience (primarily the computational topology and TDA communities).