Parameterized Hardness of Zonotope Containment and Neural Network Verification¶

Conference: ICLR 2026
OpenReview: https://openreview.net/forum?id=y8N45EEW05
Code: None (Theoretical paper)
Area: Learning Theory / Computational Complexity / Neural Network Verification
Keywords: Parameterized Complexity, W[1]-hard, ReLU Network Verification, Zonotope Containment, Lipschitz Constant, Exponential Time Hypothesis

TL;DR¶

This paper proves that a series of verification-related problems, including the positivity/surjectivity of 2-layer ReLU networks, zonotope non-containment, and Lipschitz constant computation, are all W[1]-hard when the input dimension $d$ is taken as the parameter. This rules out fixed-parameter tractability under the Exponential Time Hypothesis (ETH) and demonstrates that naive "linear region enumeration" algorithms are essentially optimal in their dependence on $d$.

Background & Motivation¶

Background: Neural networks with ReLU activations are increasingly used in safety-critical scenarios (autonomous driving, control). Consequently, "verification"—determining whether the network output necessarily falls within a target set $Y$ given an input set $X$—has become a core task. Several classic problems (network verification, Lipschitz constant estimation, zonotope containment) have long been proven to be (co)NP-hard.

Limitations of Prior Work: NP-hardness only indicates that problems are difficult in the worst case. It does not answer a more refined question: since input data in practice is often assumed to lie on a low-dimensional manifold, do these problems become solvable (i.e., Fixed-Parameter Tractable, FPT) when the input dimension $d$ is small? This is the focus of parameterized complexity. It is known that checking "injectivity" for 2-layer ReLU networks can be done in $(d+1)^d \cdot n^{O(1)}$ time (FPT w.r.t. $d$), which gave hope that low-dimensional cases might be tractable.

Key Challenge: However, the parameterized complexity of network "positivity/surjectivity" and Lipschitz constants was left as an open problem by Froese et al. at COLT '25—whether they are FPT like injectivity or essentially explode exponentially with $d$ remained unknown. If the latter is true, the intuition that the "low-dimensional hypothesis can save the day" would be invalid.

Goal: To clarify the parameterized complexity of this family of problems with respect to the parameter $d$, answer the open questions by Froese et al., and provide lower bounds on running time under ETH.

Key Insight: All these problems can be solved in $n^{O(d)} \cdot \mathrm{poly}(N)$ time using "brute-force" methods (e.g., enumerating zonotope vertices or network linear regions), placing them in the complexity class XP. The authors' task is to provide a reduction where the input dimension grows linearly with the clique size, thereby proving they are W[1]-hard.

Core Idea: Construct a 2-layer ReLU network from a MULTICOLORED CLIQUE instance such that the input dimension equals the number of colors $k$ (plus 1), making the existence of a $k$-clique equivalent to the network achieving a positive output. Then, using "homogenization" to remove biases and the "ReLU network $\leftrightarrow$ zonotope" duality, propagate this hardness result across the entire family of problems.

Method¶

Overall Architecture¶

This is a pure computational complexity paper. Rather than a "method pipeline," its "method" is a parameterized reduction chain: starting from the W[1]-hard MULTICOLORED CLIQUE, a single reduction path connects the entire family of verification problems. The overall logic is:

graph LR
    A[MULTICOLORED CLIQUE] --> B[2-Layer ReLU Positivity]
    B --> C[Homogenization: Unbiased Positivity]
    C --> D[Duality: Zonotope Non-containment]
    C --> E[Lipschitz Constant Hardness]
    D --> F[Hardness of Maximum/Approximation]

The input is a MULTICOLORED CLIQUE instance (graph $G=(V=V_1 \dot\cup \cdots \dot\cup V_k, E)$). The output is a 2-layer ReLU network $f:\mathbb{R}^{k} \to \mathbb{R}$ whose maximum value satisfies:

\[\max_{x \in \mathbb{R}^k} f(x) = k + \binom{k}{2} \iff G \text{ contains a } k\text{-clique},\]

otherwise $\max f \le k + \binom{k}{2} - 1$. Since the input dimension $d = k$ (or $k+1$ after homogenization) grows only linearly with the clique size, the W[1]-hardness and ETH lower bounds of CLIQUE are transferred to the network problems.

Key Designs¶

1. Dimension-Controlled MULTICOLORED CLIQUE Reduction: Encoding Cliques via Spike and Penalty Functions

Standard NP-hardness reductions allow the input dimension to grow indefinitely. To prove W[1]-hardness, the parameter of the reduced instance (here $d$) must be bounded by a function of the source parameter (clique size $k$). Each input coordinate $x_c$ is assigned to a color $c$, making $d=k$. Unique labels $\omega_{c,i} \in \mathbb{N}$ are assigned to nodes, and Sidon sets ensure edge labels $\omega_{r,i,l,j} := \omega_{r,i} + \omega_{l,i}$ are unique. Two sets of piecewise linear functions are constructed: spike functions $s_{r,l}$ for edges (peaking at edge labels) and penalty functions $p_c$ for nodes (peaking at node labels). Summing them creates a function where $f$ approaches the upper bound only if all penalties and spikes are positive simultaneously, which corresponds exactly to a $k$-clique.

2. Homogenization: Converting Biased Hardness to Unbiased Networks

The initial reduction relies on biases. However, downstream results (especially zonotope duality) only hold for positive-homogeneous, unbiased networks. The authors introduce homogenization (Definition 4.1): an extra input $y$ is added, biases $b$ in the first layer are replaced by $y \cdot b$, and output biases are replaced by $|y| \cdot b$. This yields an unbiased network that recovers the original function at the $y=1$ slice. This proves that "surjectivity" of 2-layer ReLU networks is also W[1]-hard, as surjectivity for positive-homogeneous functions is equivalent to the existence of a positive output point (given the existence of negative points).

3. ReLU Network and Zonotope Duality: Transferring Hardness to Zonotope Non-containment

There is a duality between positive-homogeneous convex CPWL functions $\mathbb{R}^d \to \mathbb{R}$ and polytopes in $\mathbb{R}^d$. An unbiased 2-layer network $g(x)$ can be decomposed into two zonotopes $Z_+$ and $Z_-$. The condition $Z_+ \subseteq Z_-$ holds if and only if $g \le 0$ everywhere. Thus, 2-layer ReLU positivity $\iff$ zonotope non-containment. This proves that ZONOTOPE NON-CONTAINMENT is W[1]-hard w.r.t. $d$, implying that $O(n^{d-1} \cdot \mathrm{poly}(N))$ algorithms are essentially optimal.

4. Lipschitz Hardness and Positive Results: Scaling and ICNNs

The authors prove $L_p(f) = \max_{\|x\|_p \le 1} |f(x)|$ for positive-homogeneous functions. By scaling coordinates, the maximum value hardness is transferred to the Lipschitz constant. Thus, 2-layer ReLU $L_p$-Lipschitz computation is both NP-hard and W[1]-hard for all $p \in (0, \infty]$. For Input Convex Neural Networks (ICNNs), the authors provide a "positive" exit: the $L_1$-Lipschitz constant is computable in polynomial time, and the $L_\infty$-Lipschitz constant is FPT in $O(2^d \, \mathrm{poly}(N))$. They also provide a randomized FPT-approximation for zonotope norm-maximization using subspace embeddings.

Key Experimental Results¶

Main Results¶

Problem (Parameter $d = $ input dim)	Conclusion (Ours)	Brute-force Complexity	ETH Lower Bound
2-layer ReLU Positivity (=Surjectivity)	W[1]-hard (even unbiased)	$O(n^{d} \cdot \mathrm{poly}(N))$	No $\rho(d)N^{o(d)}$
Approx. 2-layer ReLU Max (any factor)	W[1]-hard	$O(n^{d} \cdot \mathrm{poly}(N))$	No $\rho(d)N^{o(d)}$
Zero-check for 3-layer ReLU	W[1]-hard	$O(n^{2d} \cdot \mathrm{poly}(N))$	No $\rho(d)N^{o(d)}$
Zonotope Non-Containment	W[1]-hard	$O(n^{d-1} \cdot \mathrm{poly}(N))$	No $\rho(d)N^{o(d)}$
2-layer $L_p$-Lipschitz, $p \in (0, \infty]$	NP-hard + W[1]-hard	$O(n^{d} \cdot \mathrm{poly}(N))$	No $\rho(d)N^{o(d)}$
Approx. 3-layer $L_p$-Lipschitz	W[1]-hard	$O(n^{2d} \cdot \mathrm{poly}(N))$	No $\rho(d)N^{o(d)}$

Key Findings¶

Linear Dimension Control is Essential: The proof requires $d$ to be strictly linear w.r.t. $k$ to establish W[1]-hardness rather than just NP-hardness.
Homogenization is the Key Toggle: Removing biases is a technical requirement for zonotope duality and Lipschitz scaling; without it, hardness cannot be propagated.
Hardness Persists with Sparse Weights: The construction uses only a constant number of neurons with polynomially bounded weights, warning that weight sparsity alone cannot bypass the hardness.
2-layer vs. 3-layer Gap: Checking if a network is identically zero is polynomial for 2 layers but W[1]-hard for 3 layers.

Highlights & Insights¶

The spike/penalty construction allows a perfect geometric match between the combinatorial goal (clique) and the network's maximum value.
Homogenization serves as a clever tool to lift affine constructions into the homogeneous/dual world for free.
The duality links machine learning verification, computational geometry, and robotics under a single W[1]-hardness result.
The tractability boundaries provide practical guidance: since general cases are hard, one must use restricted architectures like ICNNs or heuristics that limit the number of linear regions.

Limitations & Future Work¶

The results are worst-case: practical networks might not trigger these instances, and the constants in the exponent might be improvable.
Solvability relies on strong structural assumptions (e.g., ICNNs); it is unclear which natural architectures satisfy these.
Open questions remain: whether zonotope norm-maximization for $p \in (1, \infty)$ is FPT, the complexity of $L_0$ for 2-layer ReLU, and whether 2-layer positivity is in W[1].

vs. Froese et al. (COLT '25): They established NP-hardness for positivity and an FPT algorithm for injectivity; this paper answers their open question by proving positivity is W[1]-hard.
vs. Virmaux & Scaman (2018): They proved NP-hardness for 2-layer $L_2$-Lipschitz; this paper generalizes this to all $p$ and strengthens it to W[1]-hardness.
vs. Shenmaier (2018): This paper generalizes zonotope norm-maximization results and proves W[1]-hardness for general cases.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ (Solves COLT '25 open problems and provides the strongest hardness results in the field.)
Experimental Thoroughness: ⭐⭐⭐⭐⭐ (Pure theory paper; the reduction chain is complete and exhaustive.)
Writing Quality: ⭐⭐⭐⭐ (Clear structure, though highly dense and requiring complexity background.)
Value: ⭐⭐⭐⭐⭐ (Impacts verification, geometry, and robotics; provides theoretical justification for heuristics.)

Problem (Parameter $d = $ input dim)	Conclusion (Ours)	Brute-force Complexity	ETH Lower Bound
2-layer ReLU Positivity (=Surjectivity)	W[1]-hard (even unbiased)	\(O(n^{d} \cdot \mathrm{poly}(N))\)	No \(\rho(d)N^{o(d)}\)
Approx. 2-layer ReLU Max (any factor)	W[1]-hard	\(O(n^{d} \cdot \mathrm{poly}(N))\)	No \(\rho(d)N^{o(d)}\)
Zero-check for 3-layer ReLU	W[1]-hard	\(O(n^{2d} \cdot \mathrm{poly}(N))\)	No \(\rho(d)N^{o(d)}\)
Zonotope Non-Containment	W[1]-hard	\(O(n^{d-1} \cdot \mathrm{poly}(N))\)	No \(\rho(d)N^{o(d)}\)
2-layer \(L_p\)-Lipschitz, \(p \in (0, \infty]\)	NP-hard + W[1]-hard	\(O(n^{d} \cdot \mathrm{poly}(N))\)	No \(\rho(d)N^{o(d)}\)
Approx. 3-layer \(L_p\)-Lipschitz	W[1]-hard	\(O(n^{2d} \cdot \mathrm{poly}(N))\)	No \(\rho(d)N^{o(d)}\)