# Locality-Sensitive Hashing-Based Efficient Point Transformer for Charged Particle Reconstruction
Conference: NeurIPS 2025 · arXiv: 2510.07594 · Code: Available · Area: 3D Vision / Particle Physics · Keywords: Point Transformer, Locality-Sensitive Hashing, Particle Tracking, End-to-End Learning
## TL;DR

By combining locality-sensitive hashing (LSH) with a Point Transformer, the paper proposes HEPTv2 for end-to-end particle track reconstruction, eliminating the DBSCAN clustering post-processing bottleneck and achieving a 28.9× speedup while maintaining competitive tracking efficiency.
## Background & Motivation

- Background: Particle track reconstruction in LHC high-energy physics experiments is among the most computationally intensive tasks; traditional Kalman filters degrade under high pile-up conditions.
- Limitations of Prior Work: Although GNNs deliver strong performance, they suffer from three major issues: quadratic \(O(n^2)\) graph-construction cost, hardware-inefficient irregular neighborhood aggregation, and random memory access patterns that harm cache utilization. HEPT introduces LSH to reach linear complexity, but its additional DBSCAN clustering step consumes 90% of the total runtime.
- Key Challenge: Balancing fast encoding against the demands of the complete task (which requires explicit track assignment), and expressiveness against hardware friendliness.
- Core Idea: Extend HEPT to HEPTv2 by incorporating a lightweight query-based Transformer decoder that directly predicts track assignments.
## Method

### Overall Architecture

A three-stage pipeline (a minimal sketch follows): (1) metric learning (LSH encoding), which hashes detector hits into a 1D sequence; (2) instance decoding, in which a query-based decoder refines track hypotheses; (3) assignment and post-processing, which associates each hit with its most probable track.
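
A minimal PyTorch sketch of how the three stages might fit together. All module names, shapes, and hyperparameters here are illustrative assumptions, not the paper's actual API: a plain MLP stands in for the LSH attention encoder (see the E2LSH sketch further below), and `nn.TransformerDecoderLayer` stands in for the query-based decoder blocks.

```python
import torch
import torch.nn as nn

class HEPTv2Sketch(nn.Module):
    """Toy three-stage pipeline: encode hits, decode track queries, assign hits."""

    def __init__(self, d_in=3, d_model=64, num_queries=3000, num_layers=3):
        super().__init__()
        # Stage 1 stand-in: the real encoder uses LSH-bucketed block-diagonal attention.
        self.encoder = nn.Sequential(
            nn.Linear(d_in, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        self.queries = nn.Embedding(num_queries, d_model)   # learnable track queries
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)  # self-attn -> cross-attn -> FFN
        self.hit_clf = nn.Linear(d_model, 1)   # binary "does this hit belong to a track?"
        self.score = nn.Linear(d_model, 1)     # per-query confidence

    def forward(self, hits):                    # hits: (1, n, d_in)
        h = self.encoder(hits)                  # (1, n, d_model) hit embeddings
        q = self.decoder(self.queries.weight.unsqueeze(0), h)  # (1, m, d_model)
        mask_logits = q @ h.transpose(1, 2)     # (1, m, n) dense query-hit mask logits
        assignment = mask_logits.argmax(dim=1)  # stage 3: most probable track per hit
        return assignment, self.score(q), self.hit_clf(h)

model = HEPTv2Sketch()
hits = torch.randn(1, 500, 3)                   # 500 toy hits with (x, y, z) coordinates
assignment, query_scores, hit_logits = model(hits)
```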
### Key Designs

- LSH Encoder
  - Function: The E2LSH scheme maps nearby hits to the same 1D bucket, enabling block-diagonal attention (see the E2LSH sketch after this list).
  - Mechanism: The OR construction uses \(m_1\) independent hash tables; the AND construction concatenates \(m_2\) hash functions per table, \(h_j(x) = \lfloor (a_j \cdot x + b_j) / r \rfloor\).
  - Design Motivation: Regular memory access patterns are GPU-friendly; self-attention within fixed-size buckets costs \(O(1)\) per bucket, giving linear overall complexity.
- End-to-End Track Assignment Decoder
  - Function: A fixed set of 3,000 learnable track queries predicts track assignments via self-attention and cross-attention.
  - Mechanism: A binary hit classifier determines whether a hit belongs to any track; the query-based decoder (self-attention → cross-attention → feed-forward layer, as in the pipeline sketch above) outputs per-query confidence scores and dense mask logits.
  - Design Motivation: Eliminates DBSCAN post-processing, adding only ~17% computational overhead (4 ms) versus the 1,401 ms DBSCAN requires.
- Joint Loss Function
  - Function: Five loss terms are jointly optimized (a combined sketch follows this list).
  - Mechanism: \(\mathcal{L} = \lambda_{\text{nce}}\mathcal{L}_{\text{NCE}} + \lambda_{\text{clf}}\mathcal{L}_{\text{CLF}} + \lambda_{\text{ce}}\mathcal{L}_{\text{CE}} + \lambda_{\text{mask}}\mathcal{L}_{\text{BCE}} + \lambda_{\text{dice}}\mathcal{L}_{\text{Dice}}\)
  - Design Motivation: The InfoNCE contrastive loss pulls together hits from the same particle, complemented by classification and mask losses, covering the full pipeline from embedding to assignment.
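
A NumPy sketch of the E2LSH bucketing described above, assuming projections \(a_j \sim \mathcal{N}(0, I)\) and offsets \(b_j \sim U[0, r)\); in HEPT-style models the resulting keys are used to sort hits into fixed-size blocks for block-diagonal attention. Function and parameter names are illustrative.

```python
import numpy as np

def e2lsh_buckets(x, m1=3, m2=2, r=1.0, seed=0):
    """Assign each point in x (n, d) one bucket key per hash table.

    AND construction: concatenate m2 values h_j(x) = floor((a_j . x + b_j) / r)
    into a single key; OR construction: repeat over m1 independent tables.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    keys = np.empty((m1, n), dtype=np.int64)
    for t in range(m1):                        # OR: m1 independent hash tables
        a = rng.normal(size=(m2, d))           # a_j ~ N(0, I), one row per hash fn
        b = rng.uniform(0.0, r, size=(m2, 1))  # b_j ~ U[0, r)
        h = np.floor((a @ x.T + b) / r).astype(np.int64)  # (m2, n) hash values
        # AND: fuse the m2 integer values into one bucket key per point
        keys[t] = [hash(tuple(col)) for col in h.T]
    return keys

hits = np.random.randn(1000, 3)   # toy 3D hit coordinates
keys = e2lsh_buckets(hits)        # nearby hits tend to collide into the same bucket
```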
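
A sketch of how the five loss terms might be combined. The weights, the InfoNCE term (assumed computed upstream on the hit embeddings), and the matching between queries and ground-truth tracks are all assumptions here; the paper's actual weighting and matching scheme may differ.

```python
import torch
import torch.nn.functional as F

def dice_loss(mask_logits, mask_targets):
    """Soft Dice loss over per-query hit masks, both shaped (m, n)."""
    p = mask_logits.sigmoid()
    num = 2 * (p * mask_targets).sum(-1)
    den = p.sum(-1) + mask_targets.sum(-1) + 1e-6
    return (1 - num / den).mean()

def joint_loss(emb_nce, hit_logits, hit_labels, q_logits, q_labels,
               mask_logits, mask_targets, w=(1.0, 1.0, 1.0, 5.0, 5.0)):
    """Weighted sum of the five terms; the weights w are illustrative."""
    l_nce = emb_nce                                  # InfoNCE term, computed upstream
    l_clf = F.binary_cross_entropy_with_logits(      # "is this hit on a track?"
        hit_logits, hit_labels)
    l_ce = F.cross_entropy(q_logits, q_labels)       # per-query class / confidence
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_targets)
    l_dice = dice_loss(mask_logits, mask_targets)
    terms = (l_nce, l_clf, l_ce, l_mask, l_dice)
    return sum(wi * li for wi, li in zip(w, terms))
```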
### Loss & Training

Curriculum learning: the model is first trained on clean, trackable hits, with hard samples and low-momentum hits introduced gradually (a toy schedule is sketched below).
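
A toy version of such a schedule, assuming the curriculum is driven by a transverse-momentum (pT) threshold that decays over training; the paper's actual criterion and schedule are not specified here, so every value below is illustrative.

```python
def pt_threshold(epoch, start=2.0, end=0.5, warmup=20):
    """Hypothetical curriculum: train only on hits from particles above a pT
    threshold (GeV) that decays linearly from `start` to `end` over `warmup`
    epochs, so low-momentum (harder) hits enter training gradually."""
    frac = min(epoch / warmup, 1.0)
    return start + (end - start) * frac
```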
## Key Experimental Results

### Main Results (TrackML Dataset)

| Method | Tracking Efficiency | Fake Rate | Inference Time (ms) | Speedup vs. Exa.TrkX |
|---|---|---|---|---|
| Exa.TrkX (GNN SOTA) | 0.994 | 0.002 | ~800 | 1× (baseline) |
| HEPT + DBSCAN | 0.923 | 0.070 | 1425 | 0.56× |
| HEPTv2 | 0.993 | 0.113 | 27.7 | 28.9× |
### Ablation Study

| Configuration | Inference Time | Note |
|---|---|---|
| HEPT encoder only | 23.7 ms | No track assignment |
| + Decoder (full HEPTv2) | 27.7 ms | Only +17% overhead |
| HEPT + DBSCAN | 1425 ms | ≈51× slower than HEPTv2 |
### Key Findings

- The elevated fake rate (0.002 → 0.113) is acceptable because offline reconstruction is less sensitive to fake tracks than online triggering.
- Across different momentum ranges and pseudorapidity regions, HEPTv2 differs from Exa.TrkX by only 0.2%.
- The encoder has only 850K parameters and the decoder adds 250K, totaling 1.1M: extremely lightweight.
## Highlights & Insights

- Truly end-to-end tracking: This is the first application of an LSH Transformer to a complete physics tracking pipeline without external clustering, and the approach offers broadly applicable lessons for detection and segmentation tasks that rely on post-processing.
- Hardware-friendly: A latency of 28 ms/event is acceptable for online trigger environments (10 kHz readout rate), suggesting practical deployability.
- Reasonable trade-off: Accepting a modest increase in fake rate in exchange for a 30× speedup is well-justified given the practical requirements of physics experiments.
## Limitations & Future Work
- The fake rate gap relative to GNNs remains the primary weakness (0.113 vs. 0.002); more sophisticated mask refinement may be required.
- The current work is limited to pixel detectors; the full HL-LHC system includes strip detectors (approximately 6× more hits).
- The fixed 3,000 queries may be redundant for simple events and insufficient for highly complex ones.
## Related Work & Insights

- vs. HEPT: HEPT produces embeddings only and relies on DBSCAN (which accounts for 90% of total runtime); HEPTv2 eliminates this bottleneck with an end-to-end decoder.
- vs. Mask3D: The decoder design draws on Mask3D's extension of Mask2Former-style query decoding to 3D point clouds.
## Rating
- Novelty: ⭐⭐⭐⭐ A natural extension of HEPT; the primary contribution lies in the application.
- Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive on the pixel detector; strip-detector/HL-LHC validation remains to be done.
- Writing Quality: ⭐⭐⭐⭐ Clear and well-structured.
- Value: ⭐⭐⭐⭐⭐ A critical application in high-energy physics; the 30× speedup is highly significant.