
Single Pixel Image Classification using an Ultrafast Digital Light Projector

Conference: CVPR2025
arXiv: 2603.12036
Code: Data publicly available (see paper for link)
Area: Autonomous Driving
Keywords: single pixel imaging, image classification, microLED, Hadamard patterns, extreme learning machine, compressed sensing

TL;DR

The paper demonstrates single-pixel imaging (SPI)-based MNIST classification using an ultrafast microLED-on-CMOS digital light projector, reaching $>90\%$ accuracy at a frame rate of 1.2 kfps. Classification is performed directly on the temporal optical signal, bypassing image reconstruction entirely.

Background & Motivation

Key Challenge

1. High-speed image classification is increasingly in demand in machine vision, yet the operating bandwidth of conventional digital cameras remains a bottleneck.
2. Single-pixel imaging (SPI) needs only a single-point detector and a sequence of structured illumination patterns to form an image, which makes it attractive for high-speed operation and for wavebands outside the range of silicon-based detectors.
3. DMDs (Digital Micromirror Devices) are limited by mechanical switching, with pattern switching rates of only around $10^4$ fps; microLED arrays can generate patterns approximately 100 times faster.
4. Existing SPIC (Single-Pixel Image Classification) work is mostly based on simulation or low-speed experiments and lacks validation in real, ultra-high-speed, free-space optical experiments.
5. Classifying directly from spatio-temporally encoded optical signals (bypassing image reconstruction) significantly reduces hardware complexity at the detection end.
6. One-vs-all binary classification resembles anomaly detection and holds potential for industrial applications.

Method

Overall Architecture

The SPI classification framework comprises three stages: (1) a microLED projector projects a sequence of Hadamard patterns onto the target object; (2) a single-pixel photodetector collects the time-series of integrated light intensities; (3) a low-complexity ML model classifies directly from the temporal sequence without reconstructing the image.
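The first two stages can be simulated numerically. The sketch below is illustrative only: the source does not say how the 12×12 Hadamard set is constructed or ordered, so the Paley construction, the Kronecker-product layout, the function names, and the random binary scene standing in for a digit are all assumptions.

```python
import numpy as np

def paley_hadamard(q=11):
    """Order-(q+1) Hadamard matrix via Paley construction I (q prime, q % 4 == 3)."""
    residues = {(x * x) % q for x in range(1, q)}
    # Legendre symbol chi(a): 0 for a == 0, +1 for quadratic residues, -1 otherwise.
    chi = np.array([0] + [1 if a in residues else -1 for a in range(1, q)])
    idx = np.arange(q)
    jacobsthal = chi[(idx[:, None] - idx[None, :]) % q]
    skew = np.zeros((q + 1, q + 1), dtype=int)
    skew[0, 1:] = 1
    skew[1:, 0] = -1
    skew[1:, 1:] = jacobsthal
    return np.eye(q + 1, dtype=int) + skew

def complementary_frames(n=12):
    """Return n*n patterns of +/-1 and the 2*n*n binary frames shown by the projector."""
    h = paley_hadamard(n - 1)
    # Assumed layout: each 2D pattern is the outer product of two rows of h.
    patterns = np.kron(h, h).reshape(n * n, n, n)
    positive = (patterns + 1) // 2      # +1 -> pixel on, -1 -> pixel off
    negative = 1 - positive             # complementary frame
    return patterns, np.concatenate([positive, negative])

def single_pixel_signal(scene, frames):
    """One detector sample per frame: total light returned under that frame."""
    return np.tensordot(frames, scene, axes=([1, 2], [0, 1]))

rng = np.random.default_rng(0)
scene = (rng.random((12, 12)) > 0.5).astype(float)   # stand-in for a binarized digit
patterns, frames = complementary_frames()
signal = single_pixel_signal(scene, frames)          # 288 samples
differential = signal[:144] - signal[144:]           # 144 Hadamard coefficients
```

Subtracting each negative-frame sample from its positive partner recovers the projection of the scene onto the corresponding ±1 pattern. Either the raw 288-sample trace or the 144 differences could serve as the classifier's input; the source does not say which form is used.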

Key Designs

1. Optical System
   - microLED-on-CMOS projector: 128×128 pixel array, 30×30 μm² pixels, 50 μm pitch, with MHz-level global-shutter frame switching.
   - 12×12 Hadamard pattern set (Had12), totaling 144 patterns (288 binary frames, including positive/negative complementary pairs).
   - Projection frame rate of 330,000 fps $\to$ single-image encoding in $<1$ ms $\to$ effective classification frame rate of 1.2 kfps.
   - A DMD displays the binarized MNIST digits, which are then acquired by a SiPM single-pixel detector.

2. Extreme Learning Machine (ELM)
   - A single-hidden-layer neural network whose input weights are randomly initialized and kept fixed.
   - Output weights are solved analytically in a single step via ridge regression, where $H$ is the hidden-layer output matrix, $T$ is the target matrix, and $\alpha$ is the regularization parameter:
     $$\beta = (H^\top H + \alpha I)^{-1} H^\top T$$
   - The hidden layer uses ReLU activation, with $\alpha = 1.0$.
   - Inference time of 31 μs/digit (more than twice as fast as the DNN's 73 μs).

3. Deep Neural Network (DNN)
   - A feedforward fully connected network: input layer (286) $\to$ three hidden layers of decreasing width $\to$ softmax output.
   - Adam optimizer + sparse categorical cross-entropy loss.
   - Implemented with TensorFlow/Keras, trained for 300 epochs.
   - Inference time of 73 μs/digit.

4. Hadamard Pattern Subset Analysis
   - The Had12 set is divided by spatial frequency into Cat1 (low frequency, varying along a single axis, first 44 patterns) and Cat2 (high frequency, varying along both axes).
   - Using only the first quarter of the set (the lowest-frequency patterns) still gives $\sim 78\%$ classification accuracy while raising the equivalent bandwidth fourfold.
   - Lower-indexed patterns (low spatial frequency) carry more of the information useful for classification.
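The ELM described in item 2 can be sketched in a few lines of NumPy. This is a minimal illustration of the ridge-regression solve with $\alpha = 1.0$ and ReLU hidden units, not the authors' code: the class name, the Gaussian initialization of the fixed input weights, the hidden-layer size, and the synthetic three-class data are all assumptions made for the example.

```python
import numpy as np

class ELM:
    """Single-hidden-layer network with fixed random input weights."""

    def __init__(self, n_inputs, n_hidden, n_classes, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Input weights and biases are drawn once and never trained.
        self.W = rng.normal(size=(n_inputs, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.alpha = alpha
        self.n_classes = n_classes
        self.beta = None

    def _hidden(self, X):
        return np.maximum(0.0, X @ self.W + self.b)    # ReLU activation

    def fit(self, X, y):
        H = self._hidden(X)
        T = np.eye(self.n_classes)[y]                  # one-hot targets
        A = H.T @ H + self.alpha * np.eye(H.shape[1])
        # Solve (H^T H + alpha I) beta = H^T T instead of forming the inverse.
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Synthetic, well-separated clusters stand in for measured time series.
rng = np.random.default_rng(1)
centers = 3.0 * rng.normal(size=(3, 20))
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(size=(300, 20))

model = ELM(n_inputs=20, n_hidden=100, n_classes=3).fit(X, y)
train_accuracy = np.mean(model.predict(X) == y)
```

Training reduces to one linear solve, and inference is a single matrix product, a ReLU, and a second matrix product, which is why this kind of model is cheap at inference time.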

Loss & Training

ELM: ridge regression (L2-regularized least squares)
DNN: sparse categorical cross-entropy + Adam optimization
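For reference, sparse categorical cross-entropy takes integer class labels (not one-hot vectors) and averages the negative log-probability of the true class. The NumPy sketch below shows the standard definition computed from raw logits; it is an illustration, not the Keras implementation, and the function name is chosen for this example.

```python
import numpy as np

def sparse_categorical_crossentropy(logits, labels):
    """Mean of -log softmax(logits)[true class], with integer labels."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = np.array([0, 3, 5, 9])
# Uniform logits over 10 classes: every class has probability 1/10.
uniform_loss = sparse_categorical_crossentropy(np.zeros((4, 10)), labels)
# Large logit on the correct class: the loss approaches zero.
confident_loss = sparse_categorical_crossentropy(50.0 * np.eye(10)[labels], labels)
```

With uniform predictions over ten classes the loss equals $\ln 10$, a useful sanity check for an untrained ten-class classifier.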

Key Experimental Results

Main Results

Model	Accuracy	Frame Rate	Inference Time per Digit
ELM (1000 neurons)	87.37%	1.2 kfps	31 μs
DNN (full Had12)	>90%	1.2 kfps	73 μs
Numerical Simulation DNN (binarized)	97.50%	-	-
Numerical Simulation ELM (binarized)	93.32%	-	-

Hadamard Subset Compression Analysis

Had12 Ratio	Strategy	DNN Accuracy	Equivalent Bandwidth
1 (All)	-	>90%	1.2 kHz
1/2 first	First Half	~87%	2.4 kHz
1/4 first	First 1/4	~78%	4.8 kHz
1/2 last	Last Half	~75%	2.4 kHz
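The bandwidth column follows from simple scaling: projecting a fraction $f$ of the patterns shortens the encoding time by the same factor, so the equivalent rate is the full rate divided by $f$. The sketch below illustrates both the column selection and that scaling. It assumes one feature column per pattern, ordered from low to high spatial frequency; the helper names and the dummy feature matrix are hypothetical.

```python
import numpy as np

def select_patterns(features, fraction, keep="first"):
    """Keep the first or last `fraction` of pattern columns (assumed frequency-ordered)."""
    n_keep = int(round(features.shape[1] * fraction))
    if n_keep < 1:
        raise ValueError("fraction is too small to keep any pattern")
    return features[:, :n_keep] if keep == "first" else features[:, -n_keep:]

def equivalent_rate_khz(full_rate_khz, fraction):
    """Fewer projected patterns per image means proportionally more images per second."""
    return full_rate_khz / fraction

# Dummy data: 2 samples x 144 pattern responses, values equal to the column index.
features = np.tile(np.arange(144), (2, 1))
first_quarter = select_patterns(features, 0.25, keep="first")   # low-frequency quarter
last_half = select_patterns(features, 0.5, keep="last")         # high-frequency half
rates = {f: equivalent_rate_khz(1.2, f) for f in (1.0, 0.5, 0.25)}
```

Under these assumptions the rates come out at 1.2, 2.4, and 4.8 kHz, matching the table.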

Binary Classification (One-vs-All)

ELM binary classification accuracy is $>99\%$, with AUC for all categories close to 1.0.
Serves as a practical baseline for scenarios similar to anomaly detection.
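A one-vs-all evaluation relabels the ten-class problem as "target digit" versus "everything else" and scores the classifier with the area under the ROC curve. The sketch below shows the relabeling and a pairwise (Mann-Whitney) AUC computation; the function names and the toy scores are illustrative assumptions, and the source does not describe its evaluation code.

```python
import numpy as np

def one_vs_all_labels(digits, target):
    """1 where the digit equals `target`, 0 for every other class."""
    return (np.asarray(digits) == target).astype(int)

def roc_auc(scores, binary_labels):
    """AUC as the probability that a random positive outscores a random negative."""
    scores = np.asarray(scores, dtype=float)
    binary_labels = np.asarray(binary_labels)
    pos = scores[binary_labels == 1]
    neg = scores[binary_labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()    # ties count as half a win
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

digits = np.array([3, 7, 3, 1, 0, 3])
labels = one_vs_all_labels(digits, target=3)
scores = np.array([0.9, 0.1, 0.8, 0.2, 0.3, 0.7])  # toy classifier outputs
perfect_auc = roc_auc(scores, labels)
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a rank-based implementation would be preferable for large test sets.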

Noise Robustness

Accuracy stays above 95% under Gaussian noise with $\sigma = 0.1$ and $\sigma = 0.5$.
Accuracy degrades significantly and fluctuates widely at $\sigma = 1.0$.
When patterns are dropped, the primary cause of performance degradation is the loss of spatial information rather than a reduction in the equivalent SNR.
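A noise sweep of this kind can be set up by adding zero-mean Gaussian noise to the feature vectors before classification. The summary does not state how $\sigma$ is normalized or at which stage the noise is injected, so the sketch below simply assumes features already scaled to order one; the function name and the zero-valued placeholder features are hypothetical.

```python
import numpy as np

def add_gaussian_noise(features, sigma, seed=0):
    """Return a noisy copy of `features` with i.i.d. N(0, sigma^2) noise added."""
    rng = np.random.default_rng(seed)
    return features + rng.normal(scale=sigma, size=features.shape)

# Placeholder features (all zeros) so the added noise can be inspected directly.
clean = np.zeros((1000, 144))
noisy = {sigma: add_gaussian_noise(clean, sigma) for sigma in (0.1, 0.5, 1.0)}
measured_std = {sigma: x.std() for sigma, x in noisy.items()}
```

In a real sweep each noisy copy would be passed to the trained classifier, and accuracy would be averaged over several seeds to capture the run-to-run fluctuation.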

Highlights & Insights

Unprecedented Speed: The first free-space optical experiment to run SPI-based classification at 1.2 kfps, about two orders of magnitude faster than DMD-based systems.
Reconstruction Decoupling: Classification operates directly on the optoelectronic time series and bypasses image reconstruction entirely, which reduces both computational and hardware cost.
Frequency Selection Strategy: The Hadamard patterns have an ordered, hierarchical structure in which low-frequency patterns contribute most to classification; this can guide compressed sensing strategies.
ELM Simplicity & Efficiency: The single-hidden-layer ELM utilizing ridge regression achieves an inference time of only 31 μs, making it suitable for resource-constrained real-time systems.
Noise vs. Information Loss: Experimental results demonstrate that the accuracy loss under compressed sensing is primarily driven by the loss of structured information rather than noise.

Limitations & Future Work

Validation is limited to binarized MNIST digits, which is far from the complexity of natural image classification.
The 12×12 Hadamard resolution is extremely low, constrained by the memory depth of the FPGA board.
The use of a DMD as the target object display (rather than a real-world scene) leaves a gap compared to actual machine vision deployments.
Applicability in practical scenarios such as multi-object or moving-target conditions has not been discussed.
The classification task is simplistic (10 classes); ELM performance may falter on more complex tasks.

Complementarity with event cameras: SPI can operate in non-visible wavebands that event cameras do not cover.
MicroLED array technology is playing an increasingly important role in optical and analog computing.
The frequency-selection idea behind the compressed-sensing strategy can be extended to other structured illumination systems.
The minimalist ELM inference suggests an approach to edge-side anomaly detection.

Rating

Novelty: ⭐⭐⭐⭐ (First ultra-high-speed experimental SPI classification)
Experimental Thoroughness: ⭐⭐⭐ (Limited to MNIST, lacks complex scenes)
Writing Quality: ⭐⭐⭐⭐ (Detailed experimental descriptions and in-depth analysis)
Value: ⭐⭐⭐ (Novel direction but limited application scenarios)