
Simultaneous Motion And Noise Estimation with Event Cameras

Conference: ICCV 2025 arXiv: 2504.04029 Code: GitHub Area: Video Understanding Keywords: Event cameras, denoising, motion estimation, contrast maximization, joint estimation

TL;DR

This paper presents the first method for simultaneous motion and noise estimation with event cameras. It scores each event by the local contrast at its position in the motion-compensated image of warped events (IWE) within the Contrast Maximization (CMax) framework, and obtains motion parameters together with a per-event signal/noise classification through alternating optimization. The method achieves state-of-the-art performance on the E-MLB denoising benchmark.

Background & Motivation

Event cameras are novel vision sensors that overcome limitations of conventional cameras such as motion blur and limited dynamic range. However, because they operate under low-power (sub-threshold) conditions, they produce substantial noise—particularly background activity (BA) noise.

Key problems with existing approaches:

Decoupling of denoising and motion estimation: Existing denoising methods are typically designed independently, treating motion estimation as a separate downstream task. Yet motion is an intrinsic property of event data—scene edges cannot be perceived without motion. The two tasks should be treated as complementary.

Difficulty in obtaining ground truth: Learning-based methods require GT noise labels, which are undefined for real-world data. Existing solutions either rely on simulation or obtain "pure signal" data through aggressive pre-filtering, a process that may alter the signal/noise characteristics of events.

Circular dependency: Denoising requires knowledge of true motion (to separate signal events from noise), while accurate motion estimation requires signal events (since noise carries no motion information).

Core Insight: Motion information can aid denoising, and vice versa. The two problems should be integrated into a unified framework and solved simultaneously. This work is the first method to jointly estimate motion (ego-motion, optical flow, etc.) and noise.

Method

Overall Architecture

Iterative alternating optimization built on the Contrast Maximization (CMax) framework:

  1. Estimate motion from current signal events (one optimization step of CMax)
  2. Apply motion compensation to all events using the estimated motion
  3. Score each event based on its local contrast in the compensated IWE
  4. Sort events by score and threshold to classify as signal/noise
  5. Update the signal event set and repeat from step 1
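A minimal sketch of this loop in NumPy-flavored Python (the names `cmax_step`, `build_iwe`, and `sample_iwe` are illustrative stand-ins, not the paper's released API; concrete sketches of the IWE and scoring steps appear under Key Designs below):

```python
import numpy as np

def joint_motion_noise_estimation(events, eta, n_iters=50, tol=1e-6):
    """Alternating estimation: events is an (N, 4) array (x, y, t, polarity);
    eta is the assumed noise ratio in [0, 1)."""
    # Random initial signal/noise partition.
    is_signal = np.random.rand(len(events)) < (1.0 - eta)
    theta = np.zeros(3)  # e.g., angular velocity for the rotational model

    for _ in range(n_iters):
        theta_prev = theta.copy()
        # 1) One CMax contrast-ascent step on the current signal events.
        theta = cmax_step(events[is_signal], theta)
        # 2) Motion-compensate *all* events and build the IWE.
        iwe, warped = build_iwe(events, theta)
        # 3) Score each event by the local IWE value at its warped position.
        scores = sample_iwe(iwe, warped)
        # 4) Keep the top (1 - eta) fraction of events by score as signal.
        is_signal = scores > np.quantile(scores, eta)
        # 5) Stop once the motion parameters stop changing.
        if np.linalg.norm(theta - theta_prev) < tol:
            break
    return theta, is_signal
```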

Key Designs

  1. Contrast Maximization (CMax) Foundation

The CMax framework assumes events are generated by moving edges. It transforms event coordinates via a motion model \(\mathbf{W}\) and warps the event set \(\mathcal{E} = \{e_k\}_{k=1}^{N_e}\) to a reference timestamp:

\(e_k = (\mathbf{x}_k, t_k, p_k) \mapsto e'_k = (\mathbf{x}'_k, t_{ref}, p_k)\)

The warped events are accumulated on a pixel grid to form the image of warped events (IWE):

\(I(\mathbf{x}; \boldsymbol{\theta}) = \sum_{k=1}^{N_e} \delta(\mathbf{x} - \mathbf{x}'_k)\)

where the Dirac delta is approximated by a Gaussian. The optimization objective is to maximize the contrast (image variance) of the IWE, thereby finding the motion parameters that best align the events.
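As a concrete illustration, here is a minimal IWE construction for the simplest possible warp, a single global 2D velocity \(\mathbf{v}\) (an assumption made for brevity; the paper uses rotational and dense-flow warps, and the Gaussian smoothing plays the role of the smoothed Dirac delta):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_iwe(xy, t, v, height, width, t_ref=0.0, sigma=1.0):
    """Warp events by a constant 2D velocity and accumulate them into an IWE.

    xy: (N, 2) pixel coordinates, t: (N,) timestamps, v: (2,) velocity.
    """
    # Warp each event to the reference time: x'_k = x_k - (t_k - t_ref) * v.
    warped = xy - (t - t_ref)[:, None] * v[None, :]
    # Round to the nearest pixel and accumulate event counts.
    cols = np.clip(np.round(warped[:, 0]).astype(int), 0, width - 1)
    rows = np.clip(np.round(warped[:, 1]).astype(int), 0, height - 1)
    iwe = np.zeros((height, width))
    np.add.at(iwe, (rows, cols), 1.0)
    # Gaussian smoothing approximates the Dirac delta in the sum.
    return gaussian_filter(iwe, sigma), warped
```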

Two motion models are supported: rotational motion (3-DOF angular velocity estimation) and dense optical flow (per-pixel velocity estimation, \(2N_p\) DOF).
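For the rotational model, a sketch of the 3-DOF warp under a calibrated pinhole camera with intrinsics `K` (the sign convention and the per-event loop are simplifications, not the paper's exact implementation):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def warp_rotational(xy, t, omega, K, t_ref=0.0):
    """Map each event's pixel back to t_ref under angular velocity omega.

    xy: (N, 2) pixel coords, t: (N,) timestamps,
    omega: (3,) angular velocity in rad/s, K: (3, 3) intrinsics.
    """
    K_inv = np.linalg.inv(K)
    warped = np.empty_like(xy, dtype=float)
    for i in range(len(xy)):
        ray = K_inv @ np.array([xy[i, 0], xy[i, 1], 1.0])   # back-project
        R = Rotation.from_rotvec(-omega * (t[i] - t_ref)).as_matrix()
        p = K @ (R @ ray)                                    # re-project
        warped[i] = p[:2] / p[2]
    return warped
```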

  2. Local Contrast-Based Denoising

Core Idea: Signal events, after correct motion compensation, cluster at edge locations and produce high IWE values; noise events, being randomly distributed, do not cluster and thus yield low IWE values.

A score \(c_k\) is computed for each event \(e_k\):

\(c_k = I(\mathbf{x}'_k)\)

This is the local value of the event's position in the motion-compensated IWE. A higher IWE value indicates that more events support the same scene edge, implying a higher probability that the event is a signal.

Events are sorted by score and thresholded to classify:

\(\mathcal{E}_{signal} = \{e_k \in \mathcal{E} \mid c_k > T(\eta)\}, \qquad \mathcal{E}_{noise} = \mathcal{E} \setminus \mathcal{E}_{signal}\)

where the threshold \(T(\eta)\) is chosen so that the top fraction \(\tau = 1 - \eta\) of events by score is classified as signal; \(\eta\) is the noise ratio, given as a prior or estimated.
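A sketch of this score–sort–threshold step, reusing the IWE and warped coordinates from the sketch above (nearest-pixel sampling is used for brevity; bilinear interpolation of the IWE is a natural alternative):

```python
import numpy as np

def classify_events(iwe, warped, eta):
    """Label events as signal/noise by their local IWE value c_k = I(x'_k)."""
    h, w = iwe.shape
    cols = np.clip(np.round(warped[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(warped[:, 1]).astype(int), 0, h - 1)
    scores = iwe[rows, cols]
    # T(eta) is the eta-quantile of the scores, so the top tau = 1 - eta
    # fraction of events (by score) is classified as signal.
    threshold = np.quantile(scores, eta)
    return scores, scores > threshold
```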

Invariance: The classification result is invariant to monotonically increasing transformations of \(c_k\) (e.g., logarithm, exponential), since the ranking is preserved.

Robustness to varying edge strengths: The Gaussian kernel in the IWE controls sensitivity to edge intensity; enlarging the kernel improves retention of signal events in low-IWE-intensity regions.

  3. Alternating Optimization

Signal/noise classification and motion estimation form a circular dependency—classification requires true motion, and motion estimation requires signal events.

The solution is iterative alternating optimization:

  • Initialization: randomly partition the events into signal and noise sets.
  • Each iteration: ① run one CMax motion-estimation step on the current signal events; ② warp all events with the estimated motion; ③ compute scores \(c_k\) for all events; ④ reclassify the signal/noise sets.
  • Convergence criterion: convergence of the motion parameters.

Computational complexity: \(O(N_p + N_e \log N_e)\) per iteration, only a \(\log\) factor more than the original CMax's \(O(N_p + N_e)\), owing to the sorting step.

Flexibility: The CMax motion estimator can be replaced by any other estimator, including deep neural networks, making the method highly extensible.

Loss & Training

This is an unsupervised, optimization-based method with no learned parameters and therefore no training loss. The objective for motion estimation is the variance (contrast) of the IWE:

\[\text{Var}(I(\mathbf{x}; \boldsymbol{\theta})) = \frac{1}{|\Omega|}\int_{\Omega}(I(\mathbf{x}; \boldsymbol{\theta}) - \mu_I)^2 d\mathbf{x}\]

The optimal motion parameters \(\boldsymbol{\theta}^*\) are found by maximizing this objective. The entire process is optimization-based rather than learning-based, requiring no GT labels.
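A minimal version of this objective, paired here with SciPy's derivative-free Nelder–Mead for a single motion-estimation solve (an assumption made for compactness; gradient-based optimizers are the usual choice in CMax implementations, and `build_iwe` is the global-velocity sketch from the Method section):

```python
import numpy as np
from scipy.optimize import minimize

def neg_contrast(v, xy, t, height, width):
    """Negative IWE variance: minimizing it maximizes contrast."""
    iwe, _ = build_iwe(xy, t, v, height, width)
    return -np.var(iwe)

# Example solve on a 180x240 sensor (DAVIS240-sized array):
# res = minimize(neg_contrast, x0=np.zeros(2), args=(xy, t, 180, 240),
#                method="Nelder-Mead")
# v_star = res.x  # velocity that maximizes the IWE contrast
```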

Key Experimental Results

Main Results

E-MLB denoising benchmark (MESR↑, higher is better):

| Method   | Category | Day ND1 | Day ND4 | Day ND16 | Day ND64 | Night ND1 |
|----------|----------|---------|---------|----------|----------|-----------|
| BAF      | Model    | 0.861   | 0.869   | 0.876    | 0.890    | 0.946     |
| IETS     | Model    | 0.772   | 0.785   | 0.777    | 0.753    | 0.950     |
| MLPF     | Learning | 0.851   | 0.855   | 0.846    | 0.840    | 0.926     |
| EDformer | Learning | 0.952   | 0.955   | 0.956    | 0.942    | 1.048     |
| Ours     | Model    | 0.938   | 0.958   | 0.986    | 0.950    | 1.037     |

The proposed method ranks first or second among model-based methods, and under several conditions even surpasses learning-based methods that require GT labels for training.

DND21 denoising benchmark (AUC↑):

| Method   | hotel 1Hz | driving 1Hz | hotel 5Hz | driving 5Hz |
|----------|-----------|-------------|-----------|-------------|
| BAF      | 0.9535    | 0.8479      | 0.8916    | 0.7930      |
| TS       | 0.9716    | 0.9307      | 0.9606    | 0.9270      |
| EDformer | 0.9928    | 0.9541      | 0.9845    | 0.9424      |
| Ours     | 1.014     | 0.882       | 0.963     | 0.855       |

Ablation Study

Motion estimation improvement (ECD dataset, rotational motion):

| Configuration     | Method        | Effect                    | Note                                  |
|-------------------|---------------|---------------------------|---------------------------------------|
| No denoising      | Original CMax | Initialization-sensitive  | Prone to local optima                 |
| BAF preprocessing | CMax + BAF    | Partial improvement       | Simple filtering insufficient         |
| Joint estimation  | Ours          | Significant improvement   | Reduced sensitivity to initialization |

Optical flow estimation combined with denoising (MVSEC dataset):

| Configuration                        | Note                                                                 |
|--------------------------------------|----------------------------------------------------------------------|
| Deep learning-based motion estimator | CMax can be replaced by a DNN for joint estimation                   |
| Image reconstruction quality         | Image quality from denoised events is substantially better than from raw events |

Key Findings

  • Denoising improves motion estimation: The joint method reduces CMax's sensitivity to initialization, making rotational motion estimation more robust.
  • Motion improves denoising: Correct motion compensation increases the clustering density of signal events, leading to more accurate noise classification.
  • Unsupervised outperforms supervised: Under multiple E-MLB conditions, the unsupervised method surpasses learning-based methods trained with GT labels.
  • Method flexibility: Compatible with deep learning-based motion estimators; not restricted to the CMax framework.
  • Practical application: Denoised events used for image intensity reconstruction yield fewer artifacts and higher image quality.

Highlights & Insights

  • First-principles driven: Starting from the physical principle that "noise is uncorrelated with motion," two seemingly independent problems are unified into a joint estimation framework.
  • Unsupervised, no GT labels required: Overcomes the dependency of learning-based methods on annotated data, offering greater practical utility in real-world scenarios.
  • Theoretical elegance: The score–sort–threshold pipeline based on IWE values is concise and well-motivated; classification results are invariant to monotonic transformations of the scores.
  • Computationally efficient: Adds only the overhead of one sorting operation compared to the original CMax.
  • Open source: A complete open-source implementation is provided.

Limitations & Future Work

  • Requires prior knowledge or estimation of the noise ratio \(\eta\); the optimal \(\eta\) may vary across scenes.
  • Does not handle non-BA noise caused by flickering or active light sources.
  • Alternating optimization may converge to local optima, particularly under high noise-rate conditions.
  • The "pure signal" reference in the DND21 dataset is obtained through aggressive filtering, which may deviate from the true signal distribution and lead to underestimated evaluation scores.
The CMax framework is widely used in the event camera community; this paper demonstrates a novel application of it to denoising.
  • The concept of simultaneous estimation can be generalized to other sensor fusion problems (e.g., LiDAR denoising + motion estimation).
  • The use of local contrast as a signal indicator may inspire other event-based tasks, such as confidence estimation in object detection.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ First joint motion+noise estimation; first-principles driven
  • Experimental Thoroughness: ⭐⭐⭐⭐ Multi-dataset validation with multi-task application demonstrations
  • Writing Quality: ⭐⭐⭐⭐⭐ Clear mathematical derivations with strong physical intuition
  • Value: ⭐⭐⭐⭐ Opens a new research direction for event cameras