# Breaking the Tuning Barrier: Zero-Hyperparameter Multi-Corner Yield Analysis via Learned Priors
**Conference:** CVPR 2026 · **arXiv:** 2603.13092 · **Code:** None · **Area:** EDA / Circuit Yield Analysis · **Keywords:** yield analysis, foundation model, in-context learning, TabPFN, SRAM, zero hyperparameter
## TL;DR
This paper proposes replacing handcrafted priors, such as Gaussian process (GP) kernels and the Gaussian assumptions of importance sampling (IS), with the learned prior of the tabular foundation model TabPFN, enabling zero-hyperparameter multi-PVT-corner yield analysis. On industrial-grade SRAM benchmarks, the method achieves state-of-the-art accuracy (MRE as low as 0.11%) while reducing verification cost by more than 10×.
## Background & Motivation
Background: Modern integrated circuits must be validated across 25+ Process-Voltage-Temperature (PVT) corners, each requiring more than \(10^4\) Monte Carlo SPICE simulations. The total cost therefore scales as \(O(K \times N)\) for \(K\) corners and \(N\) samples per corner (e.g., \(25 \times 10^4 = 2.5 \times 10^5\) runs), which translates into weeks of computation time.
Limitations of Prior Work: Acceleration methods have advanced along two tracks, both hitting fundamental walls: (1) Importance Sampling (IS) methods such as MNIS achieve full automation but are constrained by a "model capacity barrier"—their Gaussian assumptions cannot capture nonlinear failure regions, imposing a hard accuracy ceiling; (2) Surrogate model methods such as GP, deep kernels, and normalizing flows overcome this capacity limitation but introduce a "tuning barrier"—each circuit requires hours of hyperparameter optimization (kernel selection, architecture search), and ±20% hyperparameter perturbation causes error to swing from 19% to 111%, which is unacceptable in industrial practice.
Key Challenge: A fundamental tension between expressiveness and automation—simple models enable automation but limit accuracy, while complex models improve accuracy but demand extensive tuning.
Goal: Eliminate per-circuit hyperparameter tuning entirely while retaining high model expressiveness for nonlinear failure boundary modeling.
Key Insight: Replace engineered priors with meta-learned priors. After pre-training on millions of regression tasks, TabPFN adapts to new circuits via in-context learning (a single forward pass)—no gradient descent, no hyperparameter optimization, no retraining required.
Core Idea: TabPFN's learned prior + joint multi-corner modeling + active learning = zero-tuning, high-accuracy multi-corner yield analysis.
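To make "a single forward pass" concrete, here is a minimal sketch using the open-source `tabpfn` package's scikit-learn-style regressor on synthetic data. This illustrates the general calling pattern only, not the paper's actual code, and uncertainty-output details vary across TabPFN versions.

```python
import numpy as np
from tabpfn import TabPFNRegressor  # open-source TabPFN package (sklearn-style API)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))  # stand-in for sparse process params + corner encoding
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(50, 16))

# No kernel choice, no architecture search, no learning rate: adaptation to the
# new "circuit" happens inside the forward pass, conditioned on the training set.
model = TabPFNRegressor()
model.fit(X_train, y_train)   # stores the context; no gradient descent
mu = model.predict(X_test)    # posterior-predictive mean in one forward pass
```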
## Method
### Overall Architecture
The pipeline consists of two stages: (1) sparse feature selection—compressing high-dimensional circuit parameters (e.g., 1152D for a 32×2 SRAM) to approximately 48D; (2) a zero-hyperparameter inference loop—TabPFN performs in-context learning to build a global surrogate, uncertainty-driven active learning guides SPICE simulations, and the process iterates until yield estimates converge.
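A compact sketch of this stage-2 loop follows. All names here are hypothetical stand-ins, not the paper's code: `run_spice` for the SPICE oracle, `surrogate` for a TabPFN-style model exposing `fit` and a `(mean, std)` `predict`, and `acquisition` for the rule defined under Key Designs below.

```python
import numpy as np

def multi_corner_yield(pool, corners, spec, surrogate, acquisition, run_spice,
                       n_seed=8, batch=16, tol=1e-3, max_iter=40):
    """Illustrative zero-tuning inference loop. pool: (M, d) candidate process
    vectors after feature selection; corners: (K, p) normalized V/T encodings;
    spec: (K,) thresholds, assuming 'pass' means exceeding the spec."""
    Z, y = [], []
    for c in corners:  # small random seed set per corner
        for x in pool[np.random.choice(len(pool), n_seed, replace=False)]:
            Z.append(np.concatenate([x, c]))
            y.append(run_spice(x, c))

    prev = None
    for _ in range(max_iter):
        surrogate.fit(np.array(Z), np.array(y))  # in-context learning: no gradient steps
        score, best_k = acquisition(surrogate, pool, corners, spec)
        for i in np.argsort(score)[-batch:]:     # simulate the most informative points
            c = corners[best_k[i]]               # (batch diversity penalty omitted)
            Z.append(np.concatenate([pool[i], c]))
            y.append(run_spice(pool[i], c))

        # Stop when the predicted pass fraction at a reference corner stabilizes.
        mu, _ = surrogate.predict(np.hstack([pool, np.tile(corners[0], (len(pool), 1))]))
        cur = float(np.mean(mu > spec[0]))
        if prev is not None and abs(cur - prev) < tol:
            break
        prev = cur
    return cur
```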
### Key Designs
- From Engineered Priors to Learned Priors (TabPFN):
  - Function: Performs Bayesian posterior prediction in a single forward pass with no hyperparameter optimization.
  - Mechanism: Traditional GPs require per-circuit optimization of kernel hyperparameters \(\theta^* = \arg\max_\theta \log \mathcal{N}(\mathbf{y}|\mathbf{0}, K_\theta + \sigma^2 I)\) (non-convex optimization over \(O(D)\) parameters). TabPFN is pre-trained via the meta-learning objective \(\Theta^* = \arg\min_\Theta \mathbb{E}_{f \sim p_{\text{meta}}} [\mathbb{E}_{D_{\text{train}} \sim f} [\mathbb{E}_{(\mathbf{z}^*, y^*) \sim f} [-\log p_\Theta(y^*|\mathbf{z}^*, D_{\text{train}})]]]\), which is equivalent to minimizing the KL divergence between the learned approximation and the true posterior predictive distribution. At inference, \((\mu^*, (\sigma^*)^2) = \mathcal{G}_{\Theta^*}(\mathbf{z}^*, D_{\text{circuit}})\), where the self-attention mechanism acts as a learned kernel \(k_{\text{learned}}(\mathbf{z}^*, \mathbf{z}_i; D) \propto \exp(\mathbf{Q}(\mathbf{z}^*)^T \mathbf{K}(\mathbf{z}_i) / \sqrt{d_k})\).
  - Design Motivation: Amortize per-circuit hyperparameter tuning cost into a one-time large-scale pre-training, thereby eliminating the tuning barrier.
- Cross-Corner Knowledge Transfer:
  - Function: Exploit physical correlations among PVT corners to improve prediction accuracy for sparsely sampled corners.
  - Mechanism: Sparse process parameters \(\mathbf{x}_\mathcal{S}\) are concatenated with corner encodings \(c\) (normalized voltage and temperature) to form joint inputs \(\mathbf{z} = [\mathbf{x}_\mathcal{S}; c] \in \mathbb{R}^{|\mathcal{S}|+p}\), constructing a global surrogate \(\hat{f}(\mathbf{x}_\mathcal{S}, c)\). Attention weights \(\alpha_{ij} = \text{softmax}(\mathbf{Q}(\mathbf{z}^*)^T \mathbf{K}(\mathbf{z}_i) / \sqrt{d_k})\) automatically up-weight training samples from corners correlated with the query corner. Since the softmax weights sum to one, the effective sample size \(n_{\text{eff}}(\mathbf{x}^*, c_2) = 1 / \sum_i \alpha_i^2\) can substantially exceed \(n_2\), the number of samples drawn from the target corner alone.
  - Design Motivation: Modeling each corner independently requires \(K\) separate models and discards shared physical information. Joint modeling allows well-sampled corners to "lend" information to sparse ones; ablations show up to 72% error reduction.
- Uncertainty-Guided Active Learning:
  - Function: Concentrate expensive SPICE simulations in the regions of highest information gain for yield estimation.
  - Mechanism: The acquisition function combines predictive uncertainty and proximity to specification boundaries: \(\alpha_k(\mathbf{x}) = \sigma(\mathbf{x}, c_k) \cdot \phi((\hat{f}(\mathbf{x}, c_k) - \text{Spec}_k) / \sigma(\mathbf{x}, c_k))\). Here \(\sigma\) captures epistemic uncertainty (reducible with more data), and the standard normal density \(\phi(\cdot)\) concentrates sampling near the pass/fail decision boundary. Across corners, the joint acquisition takes the maximum, \(\alpha(\mathbf{x}) = \max_k \alpha_k(\mathbf{x})\), with a diversity penalty applied during batch sampling (see the sketch after this list).
  - Design Motivation: The Bayesian nature of TabPFN yields calibrated uncertainty estimates as a by-product of prediction, so active learning requires no additional modeling machinery.
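The acquisition rule and the joint-input construction together fit in a few lines. The sketch below assumes the same hypothetical `surrogate.predict -> (mu, sigma)` interface as the loop above; it is an illustration of the stated formula, not the authors' implementation.

```python
import numpy as np

def acquisition(surrogate, pool, corners, spec):
    """alpha_k(x) = sigma(x, c_k) * phi((mu(x, c_k) - Spec_k) / sigma(x, c_k)),
    maximized over corners k; phi is the standard normal pdf."""
    phi = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)
    score = np.full(len(pool), -np.inf)
    best_k = np.zeros(len(pool), dtype=int)
    for k, c in enumerate(corners):
        Z = np.hstack([pool, np.tile(c, (len(pool), 1))])  # joint input z = [x_S; c]
        mu, sigma = surrogate.predict(Z)
        sigma = np.maximum(sigma, 1e-12)                   # numerical guard
        a = sigma * phi((mu - spec[k]) / sigma)            # large when uncertain AND near spec
        better = a > score
        score[better], best_k[better] = a[better], k
    return score, best_k                                   # alpha(x) = max_k alpha_k(x)
```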
### Feature Selection / Dimensionality Reduction
TabPFN is currently limited to inputs of at most 500 dimensions. For the 1152D 32×2 SRAM, a GBDT (default LightGBM configuration, zero tuning) produces a feature-importance ranking, followed by a greedy search for the optimal subset \(\mathcal{S}^* = \arg\max_k R^2(\mathcal{S}_k)\), typically reducing 1152D to roughly 48D in under a minute.
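A hedged sketch of this step using LightGBM's default regressor for the ranking. The Ridge proxy model used inside the greedy R² search is my assumption; the paper only specifies maximizing \(R^2(\mathcal{S}_k)\) over prefix subsets.

```python
import numpy as np
from lightgbm import LGBMRegressor           # default config: zero tuning
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def select_features(X, y, max_dim=500):
    """Rank features by untuned-GBDT importance, then greedily keep the prefix
    S_k that maximizes cross-validated R^2 (Ridge scorer is an assumption)."""
    rank = np.argsort(LGBMRegressor().fit(X, y).feature_importances_)[::-1]
    best_r2, best_subset = -np.inf, rank[:1]
    for k in range(1, min(max_dim, X.shape[1]) + 1):  # a coarser step also works
        r2 = cross_val_score(Ridge(), X[:, rank[:k]], y, cv=3, scoring="r2").mean()
        if r2 > best_r2:
            best_r2, best_subset = r2, rank[:k]
    return best_subset  # e.g. ~48 of 1152 dims for the 32x2 SRAM
```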
## Key Experimental Results
### Main Results
Multi-corner yield prediction (5 PVT corners, MRE %, OpenYield industrial SRAM benchmark):
| Circuit | BI-BD | BI-BC | OPT | Ours |
|---|---|---|---|---|
| 4×2 (144D) | 0.15 | 0.45 | 0.47 | 0.11 |
| 8×2 (288D) | 0.29 | 2.46 | 20.4 | 0.22 |
| 16×2 (576D) | 3.39 | 0.56 | 30.3 | 0.29 |
| 32×2 (1152D) | 0.79 | 1.64 | 12.2 | 1.10 |
Single-corner analysis (8×2 SRAM, FF corner):
| Method | MRE (%) | #Sim | Speedup |
|---|---|---|---|
| MC (baseline) | — | 4100 | — |
| MNIS | 12.63 | 3100 | 1.3× |
| ACS | 28.47 | 2700 | 1.5× |
| HSCS | 19.01 | 3360 | 1.2× |
| OPT | 8.88 | 3000 | 1.4× |
| Ours | 8.88 | 170 | 24.1× |
### Ablation Study
Cross-corner knowledge transfer ablation (16×2 SRAM, MRE %):
| Corner | Target Only | +1 Corner | +2 Corners | +3 Corners | +4 Corners | MRE Reduction |
|---|---|---|---|---|---|---|
| TT | 21.79 | 13.85 | 10.86 | 9.05 | 6.04 | −72% |
| SF | 100.00 | 100.00 | 71.43 | 57.14 | 42.86 | −57% |
| FS | 2.21 | 2.00 | 0.88 | 0.87 | 0.71 | −68% |
| SS | 4.09 | 4.09 | 3.79 | 3.62 | 3.61 | −12% |
Surrogate model comparison (8×2 SRAM, fewer than 1000 samples):
| Method | MAE @ ~100 samples | Tuning Required |
|---|---|---|
| GP (tuned) | ~30% | Yes |
| Deep-GP (tuned) | ~35% | Yes |
| MLP (tuned) | ~45% | Yes |
| SVM (tuned) | ~40% | Yes |
| TabPFN | ~5% | None |
### Key Findings
- At 100 samples, TabPFN achieves ~5% MAE while all tuned baselines range from 30–45%.
- Cross-corner knowledge transfer is especially effective on difficult corners (TT, SF, FS), reducing error by up to 72%.
- Single-corner analysis achieves 24.1× speedup; at matched accuracy, the proposed method requires 17× fewer simulations than OPT.
- Traditional IS methods achieve only 1.2–1.5× speedup on industrial SRAM with peripheral circuits and parasitics, whereas learned priors are substantially more robust.
- Hyperparameter sensitivity experiments (Table 1) reveal that SOTA methods exhibit 6–10× error fluctuation under ±20% perturbation, quantifying the severity of the tuning barrier.
## Highlights & Insights
- The formalization and quantification of the "tuning barrier" (Table 1) is highly compelling and directly addresses an industrial pain point.
- The theoretical connection between self-attention and learned kernels is clearly established, with a natural transition from GP kernels to attention weights.
- The ablation study on cross-corner information sharing (Table 6) intuitively demonstrates the value of joint modeling.
- The entire pipeline is end-to-end zero-tuning, making it well-suited for industrial integration.
## Limitations & Future Work
- TabPFN is currently limited to 500 dimensions; feature selection serves as a workaround for higher-dimensional circuits.
- Evaluation is conducted only on SRAM circuits; performance on analog or mixed-signal designs has not been tested.
- Only 5 corners (TT/FF/SS/FS/SF) are considered, whereas industrial practice commonly requires 25+ corners.
- Although feature selection requires no tuning, it implicitly assumes that GBDT-based feature importance rankings are reliable, which may not hold for all circuit types.
## Related Work & Insights
- MNIS is the industry-standard IS method; this paper preserves its automation advantage while breaking through its capacity limitation.
- TabPFN is a foundation model for tabular data, here applied to EDA for the first time.
- The learned-prior paradigm is generalizable to other engineering simulation domains that require repeated surrogate modeling, such as electromagnetic simulation and thermal analysis.
## Rating
- Novelty: ⭐⭐⭐⭐ Applying in-context learning from foundation models to EDA yield analysis is a genuinely novel cross-domain contribution.
- Experimental Thoroughness: ⭐⭐⭐⭐ Multi-scale SRAM benchmarks, single- and multi-corner evaluations, surrogate model comparisons, and ablation studies are all comprehensive.
- Writing Quality: ⭐⭐⭐⭐ The "dual-barrier" narrative is clear and coherent; tables and figures are persuasive.
- Value: ⭐⭐⭐⭐ The zero-tuning property has direct deployment value for industrial EDA, with strong potential to enable practical adoption of AI-driven yield analysis.