Single Pixel Image Classification using an Ultrafast Digital Light Projector¶

Conference: CVPR 2026
arXiv: 2603.12036
Code: None
Area: Computational Imaging / Single-Pixel Imaging
Keywords: single pixel imaging, Hadamard patterns, microLED-on-CMOS, extreme learning machine, compressed sensing

TL;DR¶

An ultrafast microLED-on-CMOS digital light projector (330 kfps in global-shutter mode) is used for single-pixel imaging. \(12\times12\) Hadamard patterns are projected onto MNIST digits, and a single-pixel photodetector acquires a time series of aggregated light intensities. Image reconstruction is bypassed entirely; an ELM or DNN classifies the time series directly. The system achieves greater than 90% multi-class accuracy and an AUC above 99% for binary classification (anomaly detection) at 1.2 kfps.

Background & Motivation¶

Background: Single-pixel imaging (SPI) replaces array sensors with structured illumination and a single-point detector, enabling simple hardware that operates across arbitrary spectral bands (infrared, THz, etc.). Conventional pattern generators based on DMDs are limited to approximately \(10^4\) fps by mechanical mirror tilting; recent microLED arrays switch far faster, with the projector used here reaching 330 kfps (roughly 30× the DMD figure).

Limitations of Prior Work:

Most single-pixel image classification (SPIC) studies rely on purely numerical simulation and lack validation on real optical systems.
The conventional SPI pipeline of reconstructing an image before classification introduces unnecessary latency, and reconstruction itself is a computational bottleneck.
The mechanical switching speed of DMDs constrains real-time applications (practical image generation rates \(\lesssim 10^2\) Hz).

Key Challenge: SPI information acquisition is inherently a spatio-temporal transform (2D space → 1D time series). Whether the reconstruction step is truly necessary remains an open question.

Goal: Experimentally validate ultrafast SPIC in a real free-space optical system, completely bypassing image reconstruction.

Key Insight: Leverage the ultrafast switching capability of microLEDs to project Hadamard patterns and directly classify the photodetector time series.

Core Idea: Exploit a microLED ultrafast projector to achieve sub-millisecond Hadamard encoding, then directly classify the single-pixel detector time series without image reconstruction.

Method¶

Overall Architecture¶

A DMD displays binarized MNIST images → the microLED projector sequentially projects 288 Hadamard patterns (144 basis patterns of a \(12\times12\) basis, each paired with its complement) → a single-pixel photodetector (SiPM) acquires the differential light intensity of each complementary pair → a real-time oscilloscope records the time series (286-dimensional feature vector) → an ELM or DNN directly classifies the series → the digit class (0–9) is output.
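The encoding step of this pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the 144 basis patterns are the rows of an order-144 Hadamard matrix built as the Kronecker product of two order-12 Paley matrices, that the object is sampled on the same \(12\times12\) grid, and that detection is ideal and noise-free. The function names (`paley_hadamard`, `kron`, `complementary_patterns`, `measure`) are hypothetical, and the preprocessing that yields the 286-dimensional feature vector is not reproduced.

```python
import random

def paley_hadamard(q=11):
    """Paley type-I Hadamard matrix of order q + 1 (q prime, q % 4 == 3)."""
    residues = {(x * x) % q for x in range(1, q)}

    def chi(a):  # quadratic character modulo q
        a %= q
        return 0 if a == 0 else (1 if a in residues else -1)

    n = q + 1
    H = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or i == 0:
                H[i][j] = 1
            elif j == 0:
                H[i][j] = -1
            else:
                H[i][j] = chi(i - j)
    return H

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def complementary_patterns(H):
    """Split each +/-1 row into a binary on/off pattern and its complement."""
    pos = [[(h + 1) // 2 for h in row] for row in H]
    neg = [[(1 - h) // 2 for h in row] for row in H]
    return pos, neg

def measure(pattern, image):
    """Idealized single-pixel reading: total light from the lit pixels."""
    return sum(p * x for p, x in zip(pattern, image))

H12 = paley_hadamard(11)
H144 = kron(H12, H12)                    # 144 rows, each a flattened 12x12 pattern
pos, neg = complementary_patterns(H144)  # 144 + 144 = 288 projected patterns

rng = random.Random(0)
image = [rng.randint(0, 1) for _ in range(144)]  # stand-in binarized object (assumption)

# Differential signal of each complementary pair, one value per basis pattern
differential = [measure(p, image) - measure(n, image) for p, n in zip(pos, neg)]
```

Subtracting the two readings of a complementary pair recovers the projection of the object onto the signed \(\pm1\) pattern, which is why binary on/off hardware can emulate a signed Hadamard basis.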

Key Designs¶

microLED-on-CMOS Ultrafast Light Projector
- \(128\times128\) active-matrix microLED array, \(30\times30\ \mu\text{m}^2\) pixels, \(50\ \mu\text{m}\) pitch.
- Supports binary mode and 5-bit grayscale; global shutter mode switches at 330 kfps.
- Maps \(12\times12\) Hadamard patterns onto the microLED array to illuminate the DMD.
- Key advantage: approximately 30× faster than DMD mechanical tilting; projecting the complete 288-pattern set requires only approximately 0.87 ms.
- The system bottleneck shifts from pattern generation to DMD object switching (32.5 kHz).
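The timing figures above follow from simple arithmetic, sketched here for reference. The calculation assumes the projector is the only limiting factor (no detector, DMD, or processing overhead), so it gives an upper bound; the helper names are hypothetical.

```python
def pattern_set_time(n_patterns, projector_fps):
    """Seconds needed to project n_patterns at a fixed frame rate."""
    return n_patterns / projector_fps

def classification_rate(n_patterns, projector_fps):
    """Projector-limited number of encoded objects per second."""
    return projector_fps / n_patterns

PROJECTOR_FPS = 330_000   # global-shutter rate quoted in the text
FULL_SET = 288            # 144 basis patterns plus their complements

t_full = pattern_set_time(FULL_SET, PROJECTOR_FPS)

# Projector-limited rates when only a fraction of the pattern set is used
rates = {
    fraction: classification_rate(int(FULL_SET * fraction), PROJECTOR_FPS)
    for fraction in (1.0, 0.5, 0.25)
}
```

With these inputs `t_full` comes out at about 0.87 ms and the full-set rate at about 1.15 kHz, in the same range as the roughly 1.2 kHz quoted for the experiment; halving or quartering the pattern count scales the rate by 2× or 4×.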
Hadamard Pattern Compression and Ordering Strategy
- Had12 comprises 288 patterns (144 basis × complementary pairs), ordered by sequency (the spatial-frequency analogue).
- Key finding: low-sequency patterns (fewer spatial sign changes) carry the most classification information.
- Using the first half of the patterns maintains approximately 85% accuracy and the first quarter approximately 78%, for throughput gains of 2× and 4× respectively.
- Three selection strategies are compared: first-\(n\) (best) >> random selection (intermediate) >> last-\(n\) (worst).
- Analogy to Fourier analysis: low sequency ≈ low-frequency components, which suffice for coarse-grained classification.
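The three selection strategies can be illustrated with a short sketch. It uses a one-dimensional Sylvester-type Hadamard matrix (order a power of two) as a stand-in, and defines sequency as the number of sign changes along a row; how the paper defines sequency for its two-dimensional \(12\times12\) patterns is not stated here, so treat the ordering rule and all function names as assumptions.

```python
import random

def sylvester_hadamard(order):
    """Sylvester Hadamard matrix; order must be a power of two."""
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def sequency(row):
    """Number of sign changes along a +/-1 pattern."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

def select_patterns(H, n, strategy="first", seed=0):
    """Pick n patterns: lowest sequency, highest sequency, or at random."""
    ordered = sorted(H, key=sequency)
    if strategy == "first":
        return ordered[:n]
    if strategy == "last":
        return ordered[-n:]
    if strategy == "random":
        return random.Random(seed).sample(ordered, n)
    raise ValueError(f"unknown strategy: {strategy}")

H8 = sylvester_hadamard(8)
low = select_patterns(H8, 2, "first")    # smoothest patterns
high = select_patterns(H8, 2, "last")    # most rapidly alternating patterns
```

Sorting once and slicing keeps the first-\(n\) and last-\(n\) strategies symmetric, so any difference in downstream accuracy comes from which sequencies are kept rather than from how they are selected.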
Two Lightweight Classification Models
- ELM (Extreme Learning Machine): a single hidden layer whose input weights are drawn at random and kept fixed (not trained); the output weights are solved in closed form via ridge regression (\(\alpha=1.0\)): \(\beta = (H^\top H + \alpha I)^{-1} H^\top T\), where \(H\) is the matrix of hidden-layer activations over the training set and \(T\) is the target matrix. Achieves 87.37% multi-class accuracy with 1000 hidden neurons; inference takes 31 μs per sample.
- DNN: three fully connected layers that narrow from the 286-dimensional input to 10 outputs, with ReLU activations and a Softmax output, trained with the Adam optimizer for 300 epochs. Achieves greater than 90% accuracy with the full Had12 set; inference takes 73 μs per sample.
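The closed-form ELM solution \(\beta = (H^\top H + \alpha I)^{-1} H^\top T\) can be sketched in pure Python. This is a toy illustration, not the authors' implementation: the tanh activation, Gaussian weight initialization, one-hot targets, tiny dataset, and all function names are assumptions chosen to keep the example self-contained.

```python
import math
import random

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def hidden(X, W, b):
    """Hidden-layer activations; tanh is an assumed nonlinearity."""
    cols = list(zip(*W))
    return [[math.tanh(sum(x * w for x, w in zip(row, col)) + bj)
             for col, bj in zip(cols, b)] for row in X]

def elm_fit(X, T, n_hidden, alpha=1.0, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(d)]  # fixed, untrained
    b = [rng.gauss(0, 1) for _ in range(n_hidden)]
    H = hidden(X, W, b)
    Ht = transpose(H)
    A = matmul(Ht, H)
    for i in range(n_hidden):
        A[i][i] += alpha              # ridge term alpha * I
    beta = solve(A, matmul(Ht, T))    # (H^T H + alpha I)^{-1} H^T T
    return W, b, beta

def elm_predict(X, W, b, beta):
    scores = matmul(hidden(X, W, b), beta)
    return [max(range(len(s)), key=s.__getitem__) for s in scores]

# Toy two-class data with one-hot targets (illustrative only)
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
T = [[1, 0], [1, 0], [0, 1], [0, 1]]
W, b, beta = elm_fit(X, T, n_hidden=8)
pred = elm_predict(X, W, b, beta)
```

Because only `beta` is learned, training reduces to one linear solve whose size is set by the number of hidden neurons, which is consistent with the short inference times reported for the ELM.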

Loss & Training¶

ELM: closed-form ridge regression solution, no iterative training, \(\alpha=1.0\).
DNN: sparse categorical cross-entropy + Adam, 300 epochs.
Noise robustness: accuracy exceeds 95% under additive white Gaussian noise with \(\sigma=0.5\); significant degradation occurs at \(\sigma=1.0\). Performance degradation is primarily attributed to the loss of structural information rather than equivalent SNR variation.
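A noise-robustness test of this kind can be sketched as follows. The scale on which \(\sigma\) is defined is not stated above, so this sketch assumes the time series is standardized to zero mean and unit variance before noise is added; that preprocessing and both helper names are assumptions.

```python
import random
import statistics

def standardize(series):
    """Shift to zero mean and scale to unit variance (assumed preprocessing)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0 for _ in series]
    return [(v - mean) / std for v in series]

def add_awgn(series, sigma, seed=None):
    """Add independent zero-mean Gaussian noise of standard deviation sigma."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in series]

clean = standardize([3.0, 7.0, 1.0, 9.0, 5.0, 4.0])  # stand-in detector readings
noisy = add_awgn(clean, sigma=0.5, seed=0)
```

Under this assumption, \(\sigma=0.5\) corresponds to noise at half the standard deviation of the signal, and \(\sigma=1.0\) to noise as large as the signal variation itself.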

Key Experimental Results¶

Main Results¶

Configuration	Accuracy	Equivalent Frame Rate	Inference Time/Sample
Binary MNIST + DNN (simulation baseline)	97.50%	—	—
Binary MNIST + ELM (simulation baseline)	93.32%	—	—
Experimental Had12 full + DNN	>90%	1.2 kHz	73 μs
Experimental Had12 full + ELM (10-class)	87.37%	1.2 kHz	31 μs
Experimental Had12 1/4 + DNN	~78%	4.8 kHz	—
Experimental Had12 + ELM (one-vs-all binary)	>99% AUC	1.2 kHz	31 μs

Ablation Study¶

Effect of Pattern Selection Strategy on Classification Accuracy (DNN):

Pattern Selection	Proportion	Equivalent Frame Rate	Approx. Accuracy
First-\(n\) (low sequency)	100%	1.2 kHz	>90%
First-\(n\)	50%	2.4 kHz	~85%
First-\(n\)	25%	4.8 kHz	~78%
Random selection	25%	4.8 kHz	~70%
Last-\(n\) (high sequency)	25%	4.8 kHz	~60%

Key Findings¶

Low-sequency Hadamard patterns carry substantially more classification information than high-sequency patterns, analogous to the dominance of low-frequency components in a Fourier spectrum.
DNN learning curves show a prolonged plateau, attributed to vanishing gradients, when fewer patterns are used; this suggests that the performance degradation stems from the loss of structural information rather than from noise.
The gap between ELM training and test accuracy is less than 1%, indicating little overfitting and suggesting that the single-pixel encoded features are sufficiently discriminative.
ELM binary classification AUC exceeds 99% across all classes, making it well-suited for go/no-go decisions (anomaly detection scenarios) in ultrafast pipelines.
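For reference, the one-vs-all AUC metric mentioned above can be computed directly from classifier scores with the pairwise-ranking definition. This is a generic sketch with hypothetical function names, not the evaluation code used in the paper.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_all_auc(score_rows, true_classes, n_classes):
    """Per-class AUC, treating class c as positive and every other class as negative."""
    return [
        roc_auc([row[c] for row in score_rows],
                [1 if t == c else 0 for t in true_classes])
        for c in range(n_classes)
    ]

example = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of samples; it is used here for clarity, and a rank-based formulation would be preferable for large test sets.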

Highlights & Insights¶

The "classify without reconstruction" paradigm deserves attention: 2D spatial information is encoded into a 1D time series for direct classification, with information preservation guaranteed by the Hadamard orthogonal basis.
The sequency-ordered pattern selection strategy is simple yet effective: using only the first quarter of the patterns gains 4× throughput at a cost of roughly 12 percentage points of accuracy (from greater than 90% to approximately 78%).
The ELM anomaly detector is extremely lightweight: closed-form training, 31 μs inference, and AUC greater than 99%, making it suitable for embedded and edge deployment.
This work constitutes the first experimental validation of kHz-level SPIC in a real free-space optical system, advancing the field beyond simulation.
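
The information-preservation remark in the first highlight can be made explicit with a short derivation. It assumes an idealized, noise-free model in which the object \(\mathbf{x}\in\mathbb{R}^{N}\) is sampled at the pattern resolution (\(N=144\) for a \(12\times12\) grid) and \(H\in\{\pm1\}^{N\times N}\) is a Hadamard matrix; the symbols \(\mathbf{x}\), \(\mathbf{y}\), \(H_n\), and \(P_n\) are introduced here for illustration.

\[
\mathbf{y} = H\mathbf{x}, \qquad H H^\top = N I \;\;\Longrightarrow\;\; \mathbf{x} = \tfrac{1}{N} H^\top \mathbf{y}
\]

With the full pattern set, the measured series therefore determines the sampled object exactly, so a classifier working on \(\mathbf{y}\) has access to the same information as one working on \(\mathbf{x}\). If only the first \(n\) rows \(H_n\) are used, the same formula gives

\[
\tfrac{1}{N} H_n^\top H_n\, \mathbf{x} = P_n \mathbf{x}, \qquad P_n^2 = P_n,
\]

that is, the orthogonal projection of the object onto the span of the retained patterns. The guarantee then holds only for the part of the object that lies in that span, which matches the observed accuracy loss as patterns are removed.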

Limitations & Future Work¶

Validation is limited to binarized \(28\times28\) MNIST, which is far less complex than real machine-vision scenarios; performance on grayscale, color, or natural scene imagery remains unknown.
The \(12\times12\) Hadamard constraint originates from FPGA memory depth; practical applications require pattern sets of higher resolution.
DMD object switching (32.5 kHz) remains the system bottleneck, leaving the 330 kfps advantage of the microLED underutilized.
No direct comparison with event cameras is provided, despite claimed advantages over them.
The experiments depend on a specific free-space optical path; engineering deployment and integration strategies are not discussed.

Related Work & Insights¶

vs. Conventional SPI + Classification: Prior SPIC work is predominantly simulation-based or relies on low-speed hardware; this paper is the first to experimentally validate kHz-level classification on an ultrafast optical system.
vs. microLED Analog Optical Computing: Prior work uses microLEDs for analog optical neural networks (matrix–vector multiplication); this paper uses microLEDs for pattern projection followed by electronic post-processing, so the two approaches are complementary.
vs. Event Cameras: Both address high-speed perception, but SPI can operate across arbitrary spectral bands beyond visible light (infrared, THz), whereas event cameras are confined to the silicon sensor spectral range.
Insight: The "sensing as computing" paradigm holds promise for edge and optical computing; the Hadamard compression strategy may inspire frame/token compression in video understanding.

Rating¶

Novelty: ⭐⭐⭐ The single-pixel classification concept is not original; the core contribution lies in hardware system integration and experimental validation.
Experimental Thoroughness: ⭐⭐⭐⭐ Multiple pattern strategies, two classification models, noise analysis, and learning curve analysis are systematically evaluated.
Writing Quality: ⭐⭐⭐⭐ Clear and readable, with detailed experimental setup and optical path descriptions; figures and tables are intuitive.
Value: ⭐⭐⭐ An interesting system integration effort, though the gap between MNIST validation and real-world application remains substantial.