Locality-Sensitive Hashing-Based Efficient Point Transformer for Charged Particle Reconstruction

Conference: NeurIPS 2025 · arXiv: 2510.07594 · Code: Available · Area: 3D Vision / Particle Physics · Keywords: Point Transformer, Locality-Sensitive Hashing, Particle Tracking, End-to-End Learning

TL;DR

By combining locality-sensitive hashing (LSH) with a Point Transformer, the paper proposes HEPTv2 for end-to-end particle track reconstruction, eliminating the DBScan clustering post-processing bottleneck and achieving a 28.9× speedup while maintaining competitive tracking efficiency.

Background & Motivation

Background: Particle track reconstruction in LHC high-energy physics experiments is among the most computationally intensive tasks; traditional Kalman Filters degrade under high pile-up conditions.

Limitations of Prior Work: Although GNNs deliver strong performance, they suffer from three major issues: \(O(n^2)\) graph construction cost, hardware-inefficient irregular neighborhood aggregation, and random memory access patterns that harm cache utilization. HEPT introduces LSH to reach linear complexity, but its additional DBScan clustering step consumes 90% of the total runtime.

Key Challenge: reconciling fast encoding with the complete tracking task (which also requires track assignment), and model expressiveness with hardware friendliness.

Core Idea: Extend HEPT to HEPTv2 by incorporating a lightweight query-based Transformer decoder that directly predicts track assignments.

Method

Overall Architecture

A three-stage pipeline: (1) Metric learning (LSH encoding) — hashing detector hits into a 1D sequence; (2) Instance decoding — a query-based decoder refines track hypotheses; (3) Assignment and post-processing — associating hits to the most probable tracks.
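
To make the data flow concrete, here is a minimal sketch of the three stages chained together; `lsh_encoder`, `track_decoder`, and the confidence threshold are hypothetical stand-ins for illustration, not the authors' API.

```python
def reconstruct_tracks(hits, lsh_encoder, track_decoder, score_threshold=0.5):
    """hits: (n, d) torch tensor of detector-hit features for one event."""
    # Stage 1: metric learning -- the LSH-bucketed encoder embeds every hit.
    hit_emb = lsh_encoder(hits)                    # (n, c)
    # Stage 2: instance decoding -- learnable queries attend to the hits and
    # emit a confidence per track hypothesis plus per-hit mask logits.
    scores, mask_logits = track_decoder(hit_emb)   # (q,), (q, n)
    # Stage 3: assignment -- keep confident queries, then give each hit to
    # the surviving track hypothesis with the largest mask logit.
    keep = scores.sigmoid() > score_threshold
    return mask_logits[keep].argmax(dim=0)         # (n,) track id per hit
```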

Key Designs

  1. LSH Encoder
     • Function: The E2LSH scheme maps nearby hits to the same 1D bucket, enabling block-diagonal attention.
     • Mechanism: The OR construction uses \(m_1\) independent hash tables; the AND construction concatenates \(m_2\) hash functions per table, each of the form \(h_j(x) = \lfloor(a_j \cdot x + b_j)/r\rfloor\). A bucketing sketch follows this item.
     • Design Motivation: Regular memory access patterns are GPU-friendly, and self-attention restricted to fixed-size buckets costs \(O(1)\) per bucket, keeping overall complexity linear in the number of hits.
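
A minimal NumPy sketch of the E2LSH bucketing, under assumed shapes and parameter values; the paper's actual \(m_1\), \(m_2\), \(r\), and any bucket-size balancing are not reproduced here.

```python
import numpy as np

def e2lsh_buckets(x, m1=3, m2=2, r=1.0, seed=0):
    """x: (n, d) hit embeddings -> (m1, n) integer bucket ids, one row per table.

    OR construction: m1 independent tables, so a pair missed by one table can
    still collide in another. AND construction: each table concatenates m2
    hashes h_j(x) = floor((a_j . x + b_j) / r), so a collision requires all
    m2 hashes to agree.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((m1, m2, x.shape[1]))   # Gaussian projections
    b = rng.uniform(0.0, r, size=(m1, m2))          # random offsets in [0, r)
    h = np.floor((np.einsum("tkd,nd->tkn", a, x) + b[:, :, None]) / r)
    # Fuse each table's m2 integer hashes into a single bucket id per hit.
    buckets = np.empty((m1, x.shape[0]), dtype=np.int64)
    for t in range(m1):
        _, buckets[t] = np.unique(h[t].T, axis=0, return_inverse=True)
    return buckets

# Sorting hits by bucket id yields the 1D sequence on which attention is
# applied block-diagonally, one bucket at a time.
```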

  2. End-to-End Track Assignment Decoder
     • Function: A fixed set of 3,000 learnable track queries predicts track assignments via self-attention and cross-attention.
     • Mechanism: A binary hit classifier first determines whether a hit belongs to any track; the query-based decoder (self-attention → cross-attention → feed-forward layer) then outputs per-query confidence scores and dense mask logits. A decoder sketch follows this item.
     • Design Motivation: Eliminates DBScan post-processing entirely, adding only 4 ms (a 17% overhead over the encoder) in place of the roughly 1,401 ms consumed by DBScan.
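
A minimal PyTorch sketch of one decoder layer in the style described above. The layer order (self-attention → cross-attention → FFN) and the 3,000 queries follow the text; the dimensions, head count, omitted normalization, and dot-product mask head are simplifying assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TrackDecoder(nn.Module):
    """One query-based decoder layer: self-attn -> cross-attn -> FFN."""

    def __init__(self, dim=64, num_queries=3000, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.score_head = nn.Linear(dim, 1)  # per-query track confidence

    def forward(self, hit_emb):
        """hit_emb: (n, c) encoder output -> scores (q,), mask logits (q, n)."""
        q = self.queries.unsqueeze(0)           # (1, q, c)
        kv = hit_emb.unsqueeze(0)               # (1, n, c)
        q = q + self.self_attn(q, q, q)[0]      # queries exchange information
        q = q + self.cross_attn(q, kv, kv)[0]   # queries attend to the hits
        q = q + self.ffn(q)
        scores = self.score_head(q).squeeze(0).squeeze(-1)  # (q,)
        # Dense mask logits: similarity between each query and each hit.
        mask_logits = q.squeeze(0) @ hit_emb.T              # (q, n)
        return scores, mask_logits
```

At inference, queries whose confidence clears a threshold become track hypotheses, and each hit is assigned to the hypothesis with the largest mask logit, as in the pipeline sketch above.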

  3. Joint Loss Function
     • Function: Five loss terms are jointly optimized.
     • Mechanism: \(\mathcal{L} = \lambda_{nce}\mathcal{L}_{NCE} + \lambda_{clf}\mathcal{L}_{CLF} + \lambda_{ce}\mathcal{L}_{CE} + \lambda_{mask}\mathcal{L}_{BCE} + \lambda_{dice}\mathcal{L}_{Dice}\). A combination sketch follows this item.
     • Design Motivation: The InfoNCE contrastive loss pulls hits from the same particle together in the embedding space, complemented by classification and mask losses, so the supervision covers the full pipeline from embedding to assignment.
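
A sketch of combining the five terms. The weights \(\lambda\), the query-to-track matching (Hungarian-style in Mask2Former-like models), and the exact form of each head are assumptions here; the InfoNCE term is taken as precomputed on the encoder embeddings.

```python
import torch
import torch.nn.functional as F

def dice_loss(mask_logits, target_masks, eps=1.0):
    """Soft Dice over per-track hit masks; both arguments are (q, n) floats."""
    p = mask_logits.sigmoid()
    inter = (p * target_masks).sum(-1)
    denom = p.sum(-1) + target_masks.sum(-1)
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()

def joint_loss(l_nce, hit_logits, hit_labels, score_logits, score_labels,
               mask_logits, target_masks, lam=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """l_nce: InfoNCE contrastive loss, precomputed on the encoder embeddings."""
    l_clf = F.binary_cross_entropy_with_logits(hit_logits, hit_labels)    # hit filter
    l_ce = F.cross_entropy(score_logits, score_labels)                    # query class
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, target_masks)
    l_dice = dice_loss(mask_logits, target_masks)
    return (lam[0] * l_nce + lam[1] * l_clf + lam[2] * l_ce
            + lam[3] * l_mask + lam[4] * l_dice)
```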

Loss & Training

Curriculum learning: the model is first trained on clean, trackable hits, with hard samples and low-momentum hits gradually introduced.
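
A hedged sketch of such a schedule, assuming the curriculum is driven by a transverse-momentum cut relaxed over training; the actual schedule and threshold values are not given in this summary.

```python
def curriculum_mask(hit_pt, epoch, total_epochs, pt_start=2.0, pt_end=0.0):
    """Keep hits whose transverse momentum (GeV) clears a decaying cut.

    Early epochs see only clean, high-momentum hits; the cut then decays
    linearly so hard, low-momentum hits enter later. (Hypothetical values.)
    """
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    pt_cut = pt_start + frac * (pt_end - pt_start)
    return hit_pt >= pt_cut
```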

Key Experimental Results

Main Results (TrackML Dataset)

| Method | Tracking Efficiency | Fake Rate | Inference Time (ms) | Relative Speedup |
| --- | --- | --- | --- | --- |
| Exa.TrkX (GNN SOTA) | 0.994 | 0.002 | ~800 | Baseline |
| HEPT + DBScan | 0.923 | 0.070 | 1425 | 0.56× |
| HEPTv2 | 0.993 | 0.113 | 27.7 | 28.9× |

Ablation Study

| Configuration | Inference Time | Note |
| --- | --- | --- |
| HEPT encoder | 23.7 ms | No track assignment |
| + Decoder | 27.7 ms | Only +17% overhead |
| HEPT + DBScan | 1425 ms | ~50× slower than HEPTv2 |

Key Findings

  • The elevated fake rate (0.113 vs. Exa.TrkX's 0.002) is deemed acceptable, since offline reconstruction is less sensitive to fake tracks than online triggering.
  • Across momentum ranges and pseudorapidity regions, HEPTv2's tracking efficiency differs from Exa.TrkX's by only 0.2%.
  • The encoder has only 850K parameters; the decoder adds 250K, totaling 1.1M — extremely lightweight.

Highlights & Insights

  • Truly end-to-end tracking: This is the first application of an LSH Transformer to a complete physics tracking pipeline without external clustering. The approach is broadly instructive for detection and segmentation tasks that still rely on heavy post-processing.
  • Hardware-friendly: A latency of 28 ms/event is acceptable for online trigger environments (10 kHz readout rate), suggesting practical deployability.
  • Reasonable trade-off: Accepting a modest increase in fake rate in exchange for a 30× speedup is well-justified given the practical requirements of physics experiments.

Limitations & Future Work

  • The fake rate gap relative to GNNs remains the primary weakness (0.113 vs. 0.002); more sophisticated mask refinement may be required.
  • The current work is limited to pixel detectors; the full HL-LHC system includes strip detectors (approximately 6× more hits).
  • The fixed 3,000 queries may be redundant for simple events and insufficient for highly complex ones.

Comparison with Prior Work

  • vs. HEPT: HEPT produces embeddings only and relies on DBScan for clustering (which accounts for 90% of its runtime); HEPTv2 eliminates this bottleneck end-to-end.
  • vs. Mask3D: The decoder design draws on Mask3D, which extends Mask2Former-style query decoding to 3D point clouds.

Rating

  • Novelty: ⭐⭐⭐⭐ A natural extension of HEPT; the primary contribution lies in the application.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive on the pixel detector; strip-detector/HL-LHC-scale validation remains to be done.
  • Writing Quality: ⭐⭐⭐⭐ Clear and well-structured.
  • Value: ⭐⭐⭐⭐⭐ A critical application in high-energy physics; the 30× speedup is highly significant.