
Breaking the Tuning Barrier: Zero-Hyperparameters Yield Multi-Corner Analysis Via Learned Priors

Conference: CVPR 2026 arXiv: 2603.13092 Code: None Area: EDA / Circuit Yield Analysis Keywords: yield analysis, foundation model, in-context learning, TabPFN, SRAM, zero hyperparameter

TL;DR

This paper proposes replacing handcrafted priors (GP kernels, IS Gaussian assumptions) with the learned prior of the foundation model TabPFN, enabling zero-hyperparameter multi-PVT-corner yield analysis. On industrial-grade SRAM benchmarks, the method achieves state-of-the-art accuracy (MRE as low as 0.11%) while reducing verification cost by more than 10×.

Background & Motivation

Background: Modern integrated circuits must be validated across 25+ Process-Voltage-Temperature (PVT) corners, each requiring more than \(10^4\) Monte Carlo simulations. The total cost of \(O(K \times N)\) leads to weeks of computation time.

Limitations of Prior Work: Acceleration methods have advanced along two tracks, both hitting fundamental walls: (1) Importance Sampling (IS) methods such as MNIS achieve full automation but are constrained by a "model capacity barrier"—their Gaussian assumptions cannot capture nonlinear failure regions, imposing a hard accuracy ceiling; (2) Surrogate model methods such as GP, deep kernels, and normalizing flows overcome this capacity limitation but introduce a "tuning barrier"—each circuit requires hours of hyperparameter optimization (kernel selection, architecture search), and ±20% hyperparameter perturbation causes error to swing from 19% to 111%, which is unacceptable in industrial practice.

Key Challenge: A fundamental tension between expressiveness and automation—simple models enable automation but limit accuracy, while complex models improve accuracy but demand extensive tuning.

Goal: Eliminate per-circuit hyperparameter tuning entirely while retaining high model expressiveness for nonlinear failure boundary modeling.

Key Insight: Replace engineered priors with meta-learned priors. After pre-training on millions of regression tasks, TabPFN adapts to new circuits via in-context learning (a single forward pass)—no gradient descent, no hyperparameter optimization, no retraining required.

Core Idea: TabPFN's learned prior + joint multi-corner modeling + active learning = zero-tuning, high-accuracy multi-corner yield analysis.

Method

Overall Architecture

The pipeline consists of two stages: (1) sparse feature selection—compressing high-dimensional circuit parameters (e.g., 1152D for a 32×2 SRAM) to approximately 48D; (2) a zero-hyperparameter inference loop—TabPFN performs in-context learning to build a global surrogate, uncertainty-driven active learning guides SPICE simulations, and the process iterates until yield estimates converge.
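The iterate-until-convergence loop described above can be sketched as follows. This is a minimal toy, not the paper's implementation: `spice_sim` is a hypothetical stand-in for a SPICE run, and the distance-weighted surrogate stands in for TabPFN's in-context prediction; only the loop structure (fit surrogate → acquire near the boundary → simulate → re-estimate yield) mirrors the described pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (not from the paper): a cheap "SPICE" oracle and a
# trivial surrogate; the real pipeline uses TabPFN's single forward pass.
def spice_sim(x):                       # toy metric: grows as x drifts from 0
    return np.linalg.norm(x) + 0.01 * rng.standard_normal()

def fit_surrogate(X, y):
    # distance-weighted mean/std predictor standing in for TabPFN inference
    def predict(q):
        w = np.exp(-0.5 * np.sum((X - q) ** 2, axis=1))
        w /= w.sum()
        mu = w @ y
        var = w @ (y - mu) ** 2 + 1e-6
        return mu, np.sqrt(var)
    return predict

SPEC = 2.0                              # hypothetical pass/fail threshold
X = rng.standard_normal((16, 3))        # initial samples (3 sparse features)
y = np.array([spice_sim(x) for x in X])

for _ in range(5):                      # active-learning iterations
    predict = fit_surrogate(X, y)
    cand = rng.standard_normal((256, 3))
    scores = []
    for q in cand:
        mu, sd = predict(q)
        z = (mu - SPEC) / sd
        scores.append(sd * np.exp(-0.5 * z * z))  # uncertainty x boundary proximity
    q = cand[int(np.argmax(scores))]    # query the most informative point
    X = np.vstack([X, q])
    y = np.append(y, spice_sim(q))

# crude yield estimate on the final surrogate (fraction of passing MC samples)
predict = fit_surrogate(X, y)
mc = rng.standard_normal((2000, 3))
yield_est = float(np.mean([predict(x)[0] < SPEC for x in mc]))
print(f"estimated yield: {yield_est:.3f}")
```

The acquisition score here follows the shape of the paper's \(\sigma \cdot \phi((\hat{f} - \text{Spec})/\sigma)\) criterion, dropping the constant \(1/\sqrt{2\pi}\) since only the argmax matters.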

Key Designs

  1. From Engineered Priors to Learned Priors (TabPFN):

    • Function: Performs Bayesian posterior prediction in a single forward pass with no hyperparameter optimization.
    • Mechanism: Traditional GPs require per-circuit optimization of kernel hyperparameters \(\theta^* = \arg\max_\theta \log \mathcal{N}(\mathbf{y}|\mathbf{0}, K_\theta + \sigma^2 I)\) (non-convex optimization over \(O(D)\) parameters). TabPFN is pre-trained via the meta-learning objective \(\Theta^* = \arg\min_\Theta \mathbb{E}_{f \sim p_{\text{meta}}} [\mathbb{E}_{D_{\text{train}} \sim f} [\mathbb{E}_{(\mathbf{z}^*, y^*) \sim f} [-\log p_\Theta(y^*|\mathbf{z}^*, D_{\text{train}})]]]\), which is equivalent to minimizing the KL divergence between the learned approximation and the true posterior predictive distribution. At inference, \((\mu^*, (\sigma^*)^2) = \mathcal{G}_{\Theta^*}(\mathbf{z}^*, D_{\text{circuit}})\), where the self-attention mechanism acts as a learned kernel \(k_{\text{learned}}(\mathbf{z}^*, \mathbf{z}_i; D) \propto \exp(\mathbf{Q}(\mathbf{z}^*)^T \mathbf{K}(\mathbf{z}_i) / \sqrt{d_k})\).
    • Design Motivation: Amortize per-circuit hyperparameter tuning cost into a one-time large-scale pre-training, thereby eliminating the tuning barrier.
  2. Cross-Corner Knowledge Transfer:

    • Function: Exploit physical correlations among PVT corners to improve prediction accuracy for sparsely sampled corners.
    • Mechanism: Sparse process parameters \(\mathbf{x}_\mathcal{S}\) are concatenated with corner encodings \(c\) (normalized voltage and temperature) to form joint inputs \(\mathbf{z} = [\mathbf{x}_\mathcal{S}; c] \in \mathbb{R}^{|\mathcal{S}|+p}\), constructing a global surrogate \(\hat{f}(\mathbf{x}_\mathcal{S}, c)\). Attention weights \(\alpha_{ij} = \text{softmax}(\mathbf{Q}(\mathbf{z}^*)^T \mathbf{K}(\mathbf{z}_i) / \sqrt{d_k})\) automatically up-weight training samples from corners correlated with the query corner. The effective sample size \(n_{\text{eff}}(\mathbf{x}^*, c_2) = 1 / \sum_i \alpha_i^2 \geq n_2\) can therefore substantially exceed the number \(n_2\) of samples drawn from the target corner alone.
    • Design Motivation: Modeling each corner independently requires \(K\) separate models and discards shared physical information. Joint modeling allows well-sampled corners to "lend" information to sparse ones; ablations show up to 72% error reduction.
  3. Uncertainty-Guided Active Learning:

    • Function: Concentrate expensive SPICE simulations in regions of highest information gain for yield estimation.
    • Mechanism: The acquisition function combines predictive uncertainty and proximity to specification boundaries: \(\alpha_k(\mathbf{x}) = \sigma(\mathbf{x}, c_k) \cdot \phi((\hat{f}(\mathbf{x}, c_k) - \text{Spec}_k) / \sigma(\mathbf{x}, c_k))\). Here \(\sigma\) captures epistemic uncertainty (reducible with more data), and \(\phi(\cdot)\) concentrates sampling near the pass/fail decision boundary. Multi-corner joint optimization is \(\alpha(\mathbf{x}) = \max_k \alpha_k(\mathbf{x})\), with a diversity penalty applied during batch sampling.
    • Design Motivation: The Bayesian nature of TabPFN provides calibrated uncertainty estimates "for free," directly usable for active learning at no additional cost.
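The joint-input construction and attention-as-kernel weighting from designs 1 and 2 can be illustrated numerically. In this minimal sketch, random projections stand in for TabPFN's pre-trained Q/K maps (an illustrative assumption; the real maps come from pre-training), and the effective sample size uses the standard definition \(1/\sum_i \alpha_i^2\) for normalized weights.

```python
import numpy as np

rng = np.random.default_rng(1)
d_k = 8

# Hypothetical setup: 40 samples from a well-explored corner c1, only 4 from
# the sparse target corner c2; corner encodings are (normalized V, T) pairs.
def make_corner(n, corner_code):
    x = rng.standard_normal((n, 6))              # sparse process features x_S
    c = np.tile(corner_code, (n, 1))
    return np.hstack([x, c])                     # joint input z = [x_S; c]

Z_train = np.vstack([make_corner(40, [0.0, 0.0]),   # corner c1
                     make_corner(4,  [0.2, 0.1])])  # sparse corner c2
z_query = make_corner(1, [0.2, 0.1])[0]             # query at corner c2

# Random projections standing in for TabPFN's learned Q/K maps (assumption:
# illustrative only, not the model's actual parameters).
Wq = rng.standard_normal((Z_train.shape[1], d_k)) / np.sqrt(d_k)
Wk = rng.standard_normal((Z_train.shape[1], d_k)) / np.sqrt(d_k)

# softmax(Q(z*)^T K(z_i) / sqrt(d_k)) over the mixed-corner context
logits = (z_query @ Wq) @ (Z_train @ Wk).T / np.sqrt(d_k)
alpha = np.exp(logits - logits.max())
alpha /= alpha.sum()                             # attention weights, sum to 1

n_eff = 1.0 / np.sum(alpha ** 2)                 # effective sample size
print(f"n_eff = {n_eff:.1f} (target corner alone has 4 samples)")
```

Because the attention weights spread over context points from both corners, the effective sample size at the sparse corner typically exceeds the 4 samples it contributed directly — the mechanism behind the cross-corner "lending" of information.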

Feature Selection / Dimensionality Reduction

TabPFN is currently limited to inputs of at most 500 dimensions. For the 1152D 32×2 SRAM, a GBDT (default LightGBM configuration, zero tuning) is used to obtain feature importance rankings, followed by greedy search for the optimal subset \(\mathcal{S}^* = \arg\max_k R^2(\mathcal{S}_k)\), typically reducing 1152D to 48D in sub-minute time.
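The two-stage selection can be sketched on toy data. A hedged sketch: scikit-learn's gradient boosting is used as a dependency-light stand-in for LightGBM (the paper's choice), and the synthetic data, dimensions, and feature structure are assumptions for illustration; the greedy prefix search over the importance ranking follows \(\mathcal{S}^* = \arg\max_k R^2(\mathcal{S}_k)\).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

# Toy stand-in for high-dimensional circuit parameters: 30 features, of which
# only the first 4 drive the metric (assumption for illustration; the real
# inputs are e.g. 1152D SRAM process parameters).
X = rng.standard_normal((300, 30))
y = X[:, 0] + 0.5 * X[:, 1] - X[:, 2] * X[:, 3] + 0.05 * rng.standard_normal(300)

# Stage 1: default-configuration GBDT importance ranking (zero tuning).
gbdt = GradientBoostingRegressor(random_state=0).fit(X, y)
ranking = np.argsort(gbdt.feature_importances_)[::-1]

# Stage 2: greedy prefix search S* = argmax_k R^2(S_k) over the ranking,
# scoring each candidate subset by cross-validated R^2.
best_k, best_r2 = 1, -np.inf
for k in range(1, 11):
    cols = ranking[:k]
    r2 = cross_val_score(GradientBoostingRegressor(random_state=0),
                         X[:, cols], y, cv=3, scoring="r2").mean()
    if r2 > best_r2:
        best_k, best_r2 = k, r2

selected = sorted(ranking[:best_k].tolist())
print(f"selected {best_k} features: {selected}, CV R^2 = {best_r2:.2f}")
```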

Key Experimental Results

Main Results

Multi-corner yield prediction (5 PVT corners, MRE %, OpenYield industrial SRAM benchmark):

| Circuit | BI-BD | BI-BC | OPT | Ours |
|---|---|---|---|---|
| 4×2 (144D) | 0.15 | 0.45 | 0.47 | 0.11 |
| 8×2 (288D) | 0.29 | 2.46 | 20.4 | 0.22 |
| 16×2 (576D) | 3.39 | 0.56 | 30.3 | 0.29 |
| 32×2 (1152D) | 0.79 | 1.64 | 12.2 | 1.10 |

Single-corner analysis (8×2 SRAM, FF corner):

| Method | MRE (%) | #Sim | Speedup |
|---|---|---|---|
| MC (baseline) | — | 4100 | 1× |
| MNIS | 12.63 | 3100 | 1.3× |
| ACS | 28.47 | 2700 | 1.5× |
| HSCS | 19.01 | 3360 | 1.2× |
| OPT | 8.88 | 3000 | 1.4× |
| Ours | 8.88 | 170 | 24.1× |

Ablation Study

Cross-corner knowledge transfer ablation (16×2 SRAM):

| Corner | Target Only | +1 Corner | +2 | +3 | +4 | MRE Reduction |
|---|---|---|---|---|---|---|
| TT | 21.79 | 13.85 | 10.86 | 9.05 | 6.04 | −72% |
| SF | 100.00 | 100.00 | 71.43 | 57.14 | 42.86 | −57% |
| FS | 2.21 | 2.00 | 0.88 | 0.87 | 0.71 | −68% |
| SS | 4.09 | 4.09 | 3.79 | 3.62 | 3.61 | −12% |

Surrogate model comparison (8×2 SRAM, fewer than 1000 samples):

| Method | MAE at ~100 samples | Tuning Required |
|---|---|---|
| GP (tuned) | ~30% | Yes |
| Deep-GP (tuned) | ~35% | Yes |
| MLP (tuned) | ~45% | Yes |
| SVM (tuned) | ~40% | Yes |
| TabPFN | ~5% | None |

Key Findings

  • At 100 samples, TabPFN achieves ~5% MAE while all tuned baselines range from 30–45%.
  • Cross-corner knowledge transfer is especially effective on difficult corners (TT, SF, FS), reducing error by up to 72%.
  • Single-corner analysis achieves 24.1× speedup; at matched accuracy, the proposed method requires 17× fewer simulations than OPT.
  • Traditional IS methods achieve only 1.2–1.5× speedup on industrial SRAM with peripheral circuits and parasitics, whereas learned priors are substantially more robust.
  • Hyperparameter sensitivity experiments (Table 1) reveal that SOTA methods exhibit 6–10× error fluctuation under ±20% perturbation, quantifying the severity of the tuning barrier.

Highlights & Insights

  • The formalization and quantification of the "tuning barrier" (Table 1) is highly compelling and directly addresses an industrial pain point.
  • The theoretical connection between self-attention and learned kernels is clearly established, with a natural transition from GP kernels to attention weights.
  • The ablation study on cross-corner information sharing (Table 6) intuitively demonstrates the value of joint modeling.
  • The entire pipeline is end-to-end zero-tuning, making it well-suited for industrial integration.

Limitations & Future Work

  • TabPFN is currently limited to 500 dimensions; feature selection serves as a workaround for higher-dimensional circuits.
  • Evaluation is conducted only on SRAM circuits; performance on analog or mixed-signal designs has not been tested.
  • Only 5 corners (TT/FF/SS/FS/SF) are considered, whereas industrial practice commonly requires 25+ corners.
  • Although feature selection requires no tuning, it implicitly assumes that GBDT-based feature importance rankings are reliable, which may not hold for all circuit types.
  • MNIS is the industry-standard IS method; this paper preserves its automation advantage while breaking through its capacity limitation.
  • TabPFN is a foundation model for tabular data, here applied to EDA for the first time.
  • The learned-prior paradigm is generalizable to other engineering simulation domains that require repeated surrogate modeling, such as electromagnetic simulation and thermal analysis.

Rating

  • Novelty: ⭐⭐⭐⭐ Applying in-context learning from foundation models to EDA yield analysis is a genuinely novel cross-domain contribution.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Multi-scale SRAM benchmarks, single- and multi-corner evaluations, surrogate model comparisons, and ablation studies are all comprehensive.
  • Writing Quality: ⭐⭐⭐⭐ The "dual-barrier" narrative is clear and coherent; tables and figures are persuasive.
  • Value: ⭐⭐⭐⭐ The zero-tuning property has direct deployment value for industrial EDA, with strong potential to enable practical adoption of AI-driven yield analysis.