# Latent Representation Learning in Heavy-Ion Collisions with MaskPoint Transformer
- Conference: NeurIPS 2025 (Workshop: Machine Learning and the Physical Sciences)
- arXiv: 2510.06691
- Code: https://github.com/Giovanni-Sforza/MaskPoint-AMPT
- Area: Physics
- Keywords: heavy-ion collisions, masked autoencoder, Transformer, self-supervised pre-training, quark-gluon plasma
## TL;DR
This work brings a masked point-cloud Transformer autoencoder to heavy-ion collision analysis. Through a two-stage paradigm of self-supervised pre-training followed by supervised fine-tuning, the model learns nonlinear latent representations substantially stronger than PointNet's (reducing the PC1 distribution overlap from 2.42% to 0.27%), providing a general feature-learning framework for studying QGP properties.
## Background & Motivation
Background: Relativistic heavy-ion collisions are the sole experimental means of studying QCD phase transitions and quark-gluon plasma (QGP) properties. Traditional analyses rely on hand-crafted observables (particle spectra, anisotropic flow, etc.), but these scalar quantities fail to fully exploit the information contained in high-dimensional final-state data.
Limitations of Prior Work:
- Traditional observables are hand-selected and may miss physically important yet subtle structures in the data.
- Deep learning methods such as PointNet have seen preliminary application to collision data, but the representations they learn are essentially linear copies of individual physical observables (e.g., \(\sigma_\eta\)).
- Self-supervised pre-training has not yet been applied systematically in high-energy nuclear physics.
Key Challenge: Final-state particle data constitute high-dimensional unordered point clouds. Capturing global inter-particle correlations is essential, yet PointNet's global pooling discards fine-grained interaction information between particles.
Goal:
- Introduce a Transformer autoencoder to learn richer representations of collision events.
- Verify whether self-supervised pre-training can capture nonlinear physical structures beyond individual observables.
Key Insight: The three-momenta \((p_x, p_y, p_z)\) of final-state particles are treated as a 3D point cloud, leveraging mature masked point cloud modeling techniques from computer vision.
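As a concrete picture of this point-cloud view, here is a minimal NumPy sketch that applies the kinematic cuts used in the paper's dataset (\(|\eta|<2.4\), \(p_T>0.4\) GeV/c, 128 particles per event); the highest-\(p_T\) truncation rule is an assumption:

```python
import numpy as np

def event_to_point_cloud(momenta, n_points=128):
    """Turn one event's final-state three-momenta into a fixed-size 3D point cloud.

    `momenta` is an (N, 3) array of (px, py, pz). The kinematic cuts
    (|eta| < 2.4, pT > 0.4 GeV/c) and the 128-particle event size follow the
    paper's data description; keeping the highest-pT particles is an assumption.
    """
    p = np.asarray(momenta, dtype=float)
    pt = np.hypot(p[:, 0], p[:, 1])                       # transverse momentum
    pmag = np.maximum(np.linalg.norm(p, axis=1), 1e-12)   # |p|, guarded against 0
    eta = np.arctanh(np.clip(p[:, 2] / pmag, -1 + 1e-12, 1 - 1e-12))  # pseudorapidity
    keep = (np.abs(eta) < 2.4) & (pt > 0.4)
    p, pt = p[keep], pt[keep]
    order = np.argsort(-pt)[:n_points]                    # highest-pT particles first
    return p[order]                                       # (<=n_points, 3) unordered set
```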
Core Idea: A self-supervised pre-trained Transformer autoencoder learns nonlinear physical features from heavy-ion collision point clouds, significantly outperforming the linear representations of PointNet.
## Method
### Overall Architecture
Two-stage paradigm:
- Stage 1, self-supervised pre-training: mask 25% of the point cloud → a Transformer encoder extracts a 96-dimensional feature vector \(\mathbf{f}\) → a Transformer decoder discriminates real particles from fake particles.
- Stage 2, supervised fine-tuning: freeze the encoder → an MLP classifier performs collision system identification (Pb+Pb vs. p+Pb), sketched below.
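A minimal stage-2 sketch in PyTorch, assuming a stand-in encoder (the paper's PointNet + Transformer stack is reduced here to a plain Transformer with mean pooling); all layer sizes other than the 96-d latent are assumptions:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for the paper's PointNet + 6-layer Transformer encoder:
    maps a (B, N, 3) point cloud to a 96-d event feature f."""
    def __init__(self, dim=96):
        super().__init__()
        self.embed = nn.Linear(3, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=6)

    def forward(self, x):                      # x: (B, N, 3)
        h = self.body(self.embed(x))           # (B, N, 96) per-particle tokens
        return h.mean(dim=1)                   # (B, 96) pooled event feature

encoder = TinyEncoder()                        # stage 1 would pre-train this
head = nn.Sequential(nn.Linear(96, 64), nn.ReLU(), nn.Linear(64, 2))

# Stage 2: freeze the encoder and train only the classification head.
for p in encoder.parameters():
    p.requires_grad = False

opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
clouds = torch.randn(8, 128, 3)                # dummy batch of events
labels = torch.randint(0, 2, (8,))             # Pb+Pb = 0, p+Pb = 1 (assumed coding)
loss = nn.functional.cross_entropy(head(encoder(clouds)), labels)
opt.zero_grad(); loss.backward(); opt.step()
```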
### Key Designs
- Masked Discriminative Pre-training
  - Function: learn the intrinsic physical structure of collision events without labels.
  - Mechanism: Farthest Point Sampling (FPS) selects 25% of the point cloud to mask; the remaining 96 particles pass through PointNet for local feature extraction → a 6-layer Transformer encodes global correlations → a 96-dimensional feature \(\mathbf{f}\) is produced. The decoder uses cross-attention to fuse \(\mathbf{f}\) with either the truly masked points or randomly sampled fake points, and an MLP performs binary classification (real vs. fake) trained with cross-entropy loss; see the sketch after this list.
  - Design Motivation: the discriminative objective (rather than reconstruction) compels the encoder to learn high-quality physical features by distinguishing collision-generated real particles from randomly sampled fake ones.
- Transformer Encoder Architecture
  - Function: capture long-range correlations between particles.
  - Mechanism: PointNet first extracts local features for each patch; a 6-layer Transformer (self-attention) then models global particle–particle interactions.
  - Design Motivation: PointNet's global max-pooling discards inter-particle relational information, whereas Transformer self-attention preserves it.
- PCA + SHAP Interpretability Analysis
  - Function: investigate what physical information the learned features encode.
  - Mechanism: PCA reduces the 96-dimensional features to principal components, whose linear correlations with traditional physical observables (\(\sigma_\eta\), \(\langle p_T \rangle\), etc.) are computed; Random Forest + SHAP then reveals nonlinear associations.
  - Key Findings: PointNet's PC1 correlates linearly with \(\sigma_\eta\), indicating it merely "fits" a known observable. The autoencoder's PC1 shows near-zero linear correlation with \(\sigma_\eta\), yet SHAP identifies \(\sigma_\eta\) as its most important contributor, which is direct evidence of nonlinear encoding.
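The masked discriminative objective above can be pictured with a minimal PyTorch sketch. Everything beyond the 96-dimensional latent, the 25% mask ratio, and the real-vs-fake cross-entropy objective is an assumption: random masking stands in for FPS, and the PointNet patch-embedding step is reduced to a linear layer.

```python
import torch
import torch.nn as nn

def mask_points(cloud, ratio=0.25):
    """Split an event into visible and masked particles. The paper selects the
    masked set via Farthest Point Sampling; a random split keeps this sketch short."""
    n = cloud.shape[1]
    perm = torch.randperm(n)
    n_mask = int(n * ratio)
    return cloud[:, perm[n_mask:]], cloud[:, perm[:n_mask]]   # visible, masked

class Discriminator(nn.Module):
    """Decoder sketch: cross-attend query points to the encoded visible set,
    then score each query as a real masked particle or a sampled fake."""
    def __init__(self, dim=96):
        super().__init__()
        self.embed = nn.Linear(3, dim)
        self.cross = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, queries, memory):        # queries: (B, M, 3)
        q = self.embed(queries)
        h, _ = self.cross(q, memory, memory)   # fuse queries with encoder tokens
        return self.mlp(h).squeeze(-1)         # (B, M) real/fake logits

# Encoder stand-in: token embedding + 6-layer self-attention (PointNet omitted).
embed = nn.Linear(3, 96)
enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=96, nhead=4, batch_first=True), num_layers=6)
disc = Discriminator()

cloud = torch.randn(8, 128, 3)                 # dummy batch of AMPT-like events
visible, masked = mask_points(cloud)           # (8, 96, 3), (8, 32, 3)
memory = enc(embed(visible))                   # (8, 96, 96) encoded visible tokens
fakes = torch.randn_like(masked)               # randomly sampled fake particles

logits = torch.cat([disc(masked, memory), disc(fakes, memory)], dim=1)
labels = torch.cat([torch.ones(8, 32), torch.zeros(8, 32)], dim=1)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
```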
### Loss & Training
- Data: AMPT-simulated Pb+Pb and p+Pb collision events; 128 particles per event (\(|\eta|<2.4\), \(p_T > 0.4\) GeV/c).
- Pre-training and fine-tuning each run for 300 epochs with the AdamW optimizer and cosine learning-rate decay (a minimal configuration sketch follows this list).
- Masking ratio: 25%.
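A minimal configuration sketch matching the stated setup (300 epochs, AdamW, cosine decay); the learning rate and weight decay values are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(96, 2)                       # placeholder for encoder or head
epochs = 300                                   # paper: 300 epochs per stage
# lr and weight_decay are assumptions; the paper specifies AdamW + cosine decay.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)

for epoch in range(epochs):
    # ... one pass over the AMPT event loader goes here ...
    sched.step()
```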
## Key Experimental Results
### Main Results: Collision System Classification
| Method | PC1 Distribution Overlap | Classification Accuracy (Full Multiplicity Range) | Notes |
|---|---|---|---|
| \(\sigma_\eta\) (physical observable) | 2.71% | — | Theoretical optimum for a single variable |
| PointNet | 2.42% | Lower | Approaches the theoretical limit of \(\sigma_\eta\) |
| MaskPoint Transformer | 0.27% | Significantly higher | Breaks the single-variable limit |
### Ablation Study
| Configuration | Classification Accuracy | Notes |
|---|---|---|
| With PointNet preprocessing | Higher | Local feature extraction is beneficial |
| Without PointNet preprocessing | Lower | Direct Transformer on raw point clouds performs poorly |
| Masking ratio 25% | Optimal | Best performance in experiments |
| Masking ratio 50% / 75% | Degraded | Too few visible particles for the encoder to learn sufficient information |
### Key Findings
- Breaking the single-variable limit: PointNet's PC1 overlap (2.42%) is close to \(\sigma_\eta\) (2.71%), indicating it essentially learns a linear representation of \(\sigma_\eta\). The autoencoder's PC1 overlap is only 0.27%—stronger than any single known observable—suggesting it captures novel physical information.
- Unsupervised PCA space naturally separates collision systems: Figure 2 shows that even after unsupervised pre-training alone, the PC1–PC2 space already clearly distinguishes Pb+Pb from p+Pb, indicating the encoder spontaneously learns intrinsic differences between the two collision systems.
- Direct evidence of nonlinear encoding: near-zero linear correlation with \(\sigma_\eta\) combined with the highest SHAP importance confirms that the information is encoded as a nonlinear combination rather than a simple copy; a minimal version of this check is sketched below.
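A minimal sketch of this check with stand-in arrays; in the paper the features are the 96-dimensional encoder outputs and the observables include \(\sigma_\eta\) and \(\langle p_T \rangle\):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
import shap

# Stand-in data; real inputs would be per-event encoder features and observables.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 96))          # 96-d latent features per event
obs = rng.normal(size=(1000, 2))                # columns: sigma_eta, <pT>

pc1 = PCA(n_components=2).fit_transform(features)[:, 0]

# Linear probe: near-zero Pearson r says PC1 is not a linear copy of sigma_eta.
r, _ = pearsonr(pc1, obs[:, 0])

# Nonlinear probe: regress PC1 on the observables, then rank them with SHAP.
rf = RandomForestRegressor(n_estimators=100).fit(obs, pc1)
shap_values = shap.TreeExplainer(rf).shap_values(obs)
importance = np.abs(shap_values).mean(axis=0)   # per-observable mean |SHAP|
```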
## Highlights & Insights
- "Low linear correlation + high SHAP importance" interpretability paradigm: This analytical approach elegantly distinguishes whether a model is merely performing linear fitting. High linear correlation implies the model is copying the observable; low linear correlation with high SHAP importance indicates a deeper nonlinear structure has been learned. This paradigm is broadly transferable to other AI4Science tasks.
- Demonstration of self-supervised pre-training value in particle physics: The work demonstrates that "pre-train for representations, then fine-tune for tasks" significantly improves performance in high-energy nuclear physics, providing a basis for building foundation models in particle physics.
- Cross-domain transfer of the point cloud perspective: The direct transfer of masked point cloud modeling from CV/3D domains to physics confirms the cross-domain applicability of these methods.
## Limitations & Future Work
- Simulation data only: Events are generated with AMPT; validation on real experimental data has not been performed.
- Limited input features: Only three-momenta \((p_x, p_y, p_z)\) are used; four-momenta, charge, spin, and other particle attributes are not exploited.
- Simple downstream task: Collision system identification (Pb+Pb vs. p+Pb) is not a genuine practical challenge and serves only as a proxy for evaluating representation quality.
- Absence of physical priors: Conservation laws, Lorentz symmetry, and other physical constraints are not incorporated.
- Future directions:
  - Validate on real RHIC/LHC data.
  - Incorporate Lorentz-equivariant network architectures.
  - Apply to more physically meaningful downstream tasks, such as chiral magnetic effect (CME) detection and nuclear deformation studies.
  - Integrate particle species information (\(\pi, K, p\), etc.).
## Related Work & Insights
- vs. PointNet: PointNet can only learn linear approximations of physical observables; the Transformer learns nonlinear combinations, representing a qualitative leap in representational capacity.
- vs. OmniJet-α / Particle Transformer: These are foundation models for LHC jet physics; this work transplants analogous ideas into the heavy-ion collision domain.
- vs. traditional observable-based analysis: Traditional methods rely on domain experts to design a small set of observables, potentially missing important information. Self-supervised methods can discover new "data-driven observables."
## Rating
- Novelty: ⭐⭐⭐⭐ First application of self-supervised Transformers to heavy-ion collision feature learning; interpretability analysis paradigm is elegant.
- Experimental Thoroughness: ⭐⭐⭐ Workshop-paper scale; only one downstream task; simulation data only.
- Writing Quality: ⭐⭐⭐⭐ Well-balanced presentation of physical background and ML methodology; interpretability analysis is clearly articulated.
- Value: ⭐⭐⭐⭐ Lays groundwork for AI foundation models in high-energy nuclear physics; methodology is transferable.