Quantifying the Role of OpenFold Components in Protein Structure Prediction¶
Conference: NeurIPS 2025 (Workshop)
arXiv: 2511.14781
Code: None (based on the OpenFold open-source implementation)
Area: Protein Structure Prediction / Interpretability
Keywords: OpenFold, AlphaFold2, Evoformer, Component Ablation, Protein Length
TL;DR¶
This paper proposes a systematic methodology for evaluating the contribution of individual Evoformer components in OpenFold/AlphaFold2 to protein structure prediction accuracy. The study finds that MSA column attention and MLP Transition layers are the most critical components, and that the importance of multiple components is significantly correlated with protein sequence length.
Background & Motivation¶
AlphaFold2 and OpenFold have revolutionized the field of protein structure prediction, yet their internal mechanisms remain poorly understood. The core architecture, Evoformer, comprises diverse components—attention layers, Transition MLPs, triangular update operations, among others—but the relative contribution of each component to prediction accuracy has not been established.
Existing ablation studies have primarily focused on auxiliary losses, training strategies, or coarse-grained architectural changes (e.g., "removing all triangular operations"), lacking systematic evaluation of individual Evoformer components. This paper addresses that gap by conducting per-component skip/zeroing experiments to reveal which components are universally critical, which are dispensable, and how importance varies with protein properties.
Given that subsequent models such as AlphaFold3 and Boltz retain the same Transformer-plus-triangular-operations architecture, the findings of this work carry broad applicability.
Method¶
Overall Architecture¶
OpenFold's protein structure prediction proceeds in three stages:
1. Preprocessing: generates the MSA (multiple sequence alignment) representation and the Pair (residue-pair) representation.
2. Evoformer processing: iteratively refines both representations through 48 Evoformer blocks.
3. Structure Module: outputs the 3D structure from the refined representations.
Each Evoformer block contains two pathways (a code sketch follows):
- MSA pathway: MSA row attention → MSA column attention → MSA Transition (MLP)
- Pair pathway: outer product mean (connecting MSA → Pair) → triangular multiplicative update → triangular attention → Pair Transition (MLP)
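For orientation, here is a minimal sketch of the data flow through one Evoformer block. Masking, dropout, gating, and chunking are omitted, and the attribute names (`msa_row_attn`, `tri_mul_update`, etc.) are illustrative stand-ins rather than the actual OpenFold identifiers.

```python
import torch

def evoformer_block_sketch(msa, pair, blk):
    """Simplified data flow of one Evoformer block (masking, dropout, gating,
    and chunking omitted). `blk` is assumed to expose seven sub-modules with
    the illustrative names used below."""
    # MSA pathway
    msa = msa + blk.msa_row_attn(msa, pair)   # row-wise attention, biased by the pair representation
    msa = msa + blk.msa_col_attn(msa)         # column-wise attention across aligned sequences
    msa = msa + blk.msa_transition(msa)       # per-position MLP
    # MSA -> Pair communication
    pair = pair + blk.outer_product_mean(msa)
    # Pair pathway
    pair = pair + blk.tri_mul_update(pair)    # triangular multiplicative update
    pair = pair + blk.tri_attn(pair)          # triangular self-attention
    pair = pair + blk.pair_transition(pair)   # per-pair MLP
    return msa, pair
```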
Key Designs¶
Three categories of ablation experiments (an implementation sketch follows the list below)
- Skip attention modules: Bypass the operations of specific attention layers across all 48 Evoformer blocks.
- Skip non-attention modules / zero representations: Skip MLP layers or zero out the final representations before the structure module.
- Length correlation analysis: Fit a linear regression of ΔTM-score against protein sequence length and compute Spearman correlation.
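Below is a minimal sketch of how the first two ablation categories could be implemented on top of a loaded model. It builds on the simplified block sketch above: `model.evoformer.blocks` and attribute names such as `"msa_col_attn"` are hypothetical handles, not the actual OpenFold identifiers.

```python
import torch
import torch.nn as nn

class SkipResidual(nn.Module):
    """Drop-in replacement that outputs zeros, so a residual update
    `x = x + module(x, ...)` reduces to `x` and the component is skipped.
    Only valid where the replaced module's output shape matches its first
    input (attention, transitions, triangular updates); the outer product
    mean maps MSA to Pair and would need a Pair-shaped zero instead."""

    def forward(self, x, *args, **kwargs):
        return torch.zeros_like(x)

def skip_component_in_all_blocks(model, attr_name):
    """Replace one named sub-module in every Evoformer block (all 48 of them)
    with SkipResidual. `model.evoformer.blocks` and `attr_name` (for example
    "msa_col_attn" or "pair_transition") are illustrative handles."""
    for block in model.evoformer.blocks:
        setattr(block, attr_name, SkipResidual())
    return model
```

Zeroing a final representation can be handled analogously by replacing the tensor passed to the structure module with `torch.zeros_like(...)`; the "only X attention" variants skip the other two attention components instead of one.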
Data filtering strategy
A three-month CAMEO subset is used (proteins with sequence length < 700). Targets with missing structure files or baseline TM-score < 0.7 are excluded, yielding a final set of 154 proteins.
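As a concrete illustration, a minimal sketch of this filtering step; the dictionary keys (`sequence`, `structure_path`, `baseline_tm`) are illustrative and not taken from the paper or CAMEO.

```python
def filter_targets(targets, max_len=700, min_baseline_tm=0.7):
    """Keep CAMEO targets that have a reference structure, are shorter than
    `max_len` residues, and reach a baseline TM-score of at least
    `min_baseline_tm`. `targets` is assumed to be a list of dicts with the
    illustrative keys used below."""
    kept = []
    for t in targets:
        if t.get("structure_path") is None:       # missing reference structure
            continue
        if len(t["sequence"]) >= max_len:         # length cutoff (< 700 residues)
            continue
        if t["baseline_tm"] < min_baseline_tm:    # unreliable baseline prediction
            continue
        kept.append(t)
    return kept  # the paper reports 154 proteins after filtering
```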
Loss & Training¶
OpenFold model_1_ptm weights and the original AlphaFold2 JAX weights are used; no additional training is performed, so all experiments are inference-only ablations of pretrained models. Inference is run with zero recycles, and unrelaxed structure predictions are evaluated. Each protein is run three times and the results are averaged.
Key Experimental Results¶
Main Results¶
Attention component ablation (Figure 2a)
| Ablation | Median ΔTM | Impact |
|---|---|---|
| Skip MSA column attention | Largest deviation | Most critical |
| Skip MSA row attention | Minor impact | Small effect on most proteins |
| Skip triangular attention | Negligible impact | Safe to skip for most proteins |
| Only MSA column attention | 0.089 | This single component preserves most performance |
| Only MSA row attention | Large drop | Insufficient alone for structure prediction |
| Only triangular attention | Large drop | Insufficient alone for structure prediction |
Non-attention component ablation (Figure 2b)
| Ablation | Median ΔTM | Impact |
|---|---|---|
| Skip Pair Transition | 0.765 | Highly critical |
| Skip MSA Transition | 0.829 | Most critical |
| Zero MSA representation | Minimal | Small effect on most proteins |
| Zero Pair representation | Large drop | Highly critical |
| Skip triangular multiplicative update | High variance | Protein-dependent |
Ablation Study¶
Correlation analysis between component importance and protein length (Table 1; a computation sketch follows the table)
| Ablation | \(R^2\) | Spearman \(\rho\) | \(p\)-value | Trend |
|---|---|---|---|---|
| Skip MSA column attention | 0.13 | 0.40 | 1.9e-7 | Longer proteins more dependent |
| Only MSA column attention | 0.02 | -0.13 | 0.11 | No significant correlation |
| Skip MSA row attention | 0.01 | -0.07 | 0.42 | No significant correlation |
| Skip triangular attention | 0.02 | -0.19 | 0.018 | Shorter proteins more dependent |
| Skip MSA Transition | 0.09 | 0.34 | 1.2e-5 | Longer proteins more dependent |
| Zero MSA representation | 0.21 | 0.46 | 1.3e-9 | Longer proteins more dependent |
| Skip triangular multiplicative update | 0.06 | 0.08 | 0.31 | No significant correlation |
| Skip Pair Transition | 0.26 | 0.56 | 3.8e-14 | Longer proteins more dependent |
| Zero Pair representation | 0.11 | 0.38 | 1.1e-6 | Longer proteins more dependent |
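For reference, a minimal sketch of how the Table 1 statistics could be reproduced per ablation, assuming per-protein sequence lengths and ΔTM values are available; ΔTM is taken here as baseline TM-score minus ablated TM-score, which is an assumed sign convention.

```python
import numpy as np
from scipy import stats

def length_correlation(lengths, delta_tm):
    """Fit delta_tm ~ length by ordinary least squares and report R^2,
    Spearman rho, and its p-value (mirroring the columns of Table 1)."""
    lengths = np.asarray(lengths, dtype=float)
    delta_tm = np.asarray(delta_tm, dtype=float)
    lin = stats.linregress(lengths, delta_tm)        # slope, intercept, rvalue, pvalue, stderr
    rho, p_value = stats.spearmanr(lengths, delta_tm)
    return {"R2": lin.rvalue ** 2, "spearman_rho": rho, "p_value": p_value}
```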
Key Findings¶
- MSA column attention is the most critical attention component: Retaining it alone recovers the majority of baseline performance (median ΔTM of only 0.089), indicating that OpenFold heavily relies on evolutionary sequence information.
- MLP Transition layers are indispensable: Skipping MSA/Pair Transitions causes the largest performance drops (0.765–0.829), consistent with findings in the Transformer interpretability literature that MLP layers encode critical semantic information.
- Pair representations are more important than MSA representations: Zeroing Pair representations causes a substantial drop, whereas zeroing MSA representations has minimal effect on most proteins.
- Among triangular operations, multiplicative updates are more important than triangular attention: Triangular attention has negligible impact on most proteins, while multiplicative updates exhibit high variance across proteins.
- Length dependence: Longer proteins rely more heavily on MSA-related features, while shorter proteins rely more on triangular attention—indicating that different proteins depend on different Evoformer components.
Highlights & Insights¶
- This work presents the first systematic per-component ablation of the Evoformer at a granularity far exceeding that of prior studies.
- The finding that MSA column attention alone preserves most of the baseline performance highlights the central role of evolutionary sequence information in structure prediction.
- The length-dependence findings provide a new perspective for understanding prediction mechanisms across different protein types.
- The heterogeneous contributions of triangular operations (multiplicative updates are critical but highly variable across proteins, while triangular attention is nearly negligible) challenge the simplistic view that triangular operations are uniformly important.
- The work transfers methodological tools from Transformer interpretability research to the protein structure prediction domain.
Limitations & Future Work¶
- Only 154 proteins from a CAMEO subset are used, limiting statistical power.
- The influence of fold type on component importance is not analyzed, which may be a key factor explaining the observed heterogeneity.
- Component ablations are applied globally (all 48 blocks skipped simultaneously), leaving per-block or per-layer importance differences unexplored.
- The use of zero recycles simplifies the experimental setup, and the contribution of the recycling mechanism itself is not fully examined.
- The learning dynamics of individual components during training are not analyzed.
Related Work & Insights¶
- AlphaFold2 interpretability: ExplainableFold (via residue deletion/substitution), SHAP-based analyses, etc. The present work is complementary in its focus on architectural components.
- Protein language model interpretability: Sparse autoencoder analyses of ESM-2, studies correlating attention maps with protein properties.
- Transformer interpretability: The finding that MLP layers encode critical semantic information aligns with the observed criticality of Transition layers in this work.
- This paper provides guidance for architectural optimization of subsequent models such as AlphaFold3 and Boltz.
Rating¶
- Novelty: ★★★☆☆ (Methodology is relatively straightforward, but the research question is important and underexplored)
- Experimental Design: ★★★★☆ (Systematic and comprehensive, covering attention, non-attention, representation, and length dimensions)
- Practicality: ★★★★☆ (Directly informative for the optimization and simplification of protein structure prediction models)
- Clarity: ★★★★★ (Well-structured paper with intuitive figures and tables)