# MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes
**Conference:** NeurIPS 2025 · **arXiv:** 2506.06318 · **Code:** GitHub · **Area:** Signal Processing / Inertial Navigation / Self-Supervised Learning · **Keywords:** MEMS gyroscope, self-supervised learning, Mixture of Experts, over-range reconstruction, signal denoising
## TL;DR
This paper proposes MoE-Gyro, a self-supervised Mixture-of-Experts framework that simultaneously addresses the fundamental range–noise trade-off in MEMS gyroscopes via an Over-Range Reconstruction Expert (ORE, incorporating Gaussian-Decay Attention and physics-informed constraints) and a Denoising Expert (DE, incorporating dual-branch complementary masking and FFT-guided augmentation). The measurable range is extended from ±450°/s to ±1500°/s, and bias instability is reduced by 98.4%.
## Background & Motivation
MEMS gyroscopes play a critical role in inertial navigation and motion control, yet face a fundamental trade-off:

- Increasing the measurement range → degraded noise characteristics (angle random walk, ARW; bias instability, BI)
- Optimizing for low noise → limited angular velocity measurement capability
Limitations of prior approaches:

- Hardware solutions (resonant frequency tuning, closed-loop force feedback, multi-range readout circuits): increased manufacturing complexity and cost
- Deep-learning denoising methods (CNN, LSTM-GRU): require precisely aligned ground-truth signals and address only noise, leaving the range limitation unresolved
- Self-supervised methods (LIMU-BERT, IMUDB): lack a unified framework for jointly handling denoising and over-range reconstruction
- Existing evaluations lack a unified standard for assessing multi-dimensional signal enhancement
## Method

### Overall Architecture
MoE-Gyro comprises three core components:

- Lightweight gating module: routes input signal segments to the appropriate expert based on simple heuristic rules
- Over-Range Reconstruction Expert (ORE): processes saturated/clipped signal segments
- Denoising Expert (DE): processes in-range noisy signal segments
Both experts share a MAE (Masked Autoencoder) backbone and are trained end-to-end in a purely self-supervised manner.
### Key Designs
- Gating and Routing Mechanism (Algorithm 1):
- Detects peak segments by identifying three consecutive clipped samples
- Detects noise segments by identifying consecutive samples below threshold \(\tau\)
- Supports invoking a single expert only (saving ~50% GPU memory) or both simultaneously
- Concatenates expert outputs into a complete enhanced signal
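The routing heuristic above can be sketched as follows. This is a minimal illustration, not the paper's actual API; the function name, the three-sample run length, and the full-scale value of ±450°/s follow the description above, while everything else is an assumption.

```python
import numpy as np

def route_segment(x, full_scale=450.0, run=3):
    """Heuristic gating sketch (illustrative, not the paper's code):
    flag a segment as over-range if it contains `run` consecutive
    samples pinned at the sensor's full-scale limit; otherwise treat
    it as an in-range noisy segment and send it to the denoiser."""
    clipped = np.abs(x) >= full_scale          # samples stuck at the rail
    for i in range(len(x) - run + 1):
        if clipped[i:i + run].all():
            return "ORE"                       # Over-Range Reconstruction Expert
    return "DE"                                # Denoising Expert

t = np.linspace(0, 1, 200)
sat = np.clip(800 * np.sin(2 * np.pi * 3 * t), -450, 450)      # saturated sweep
quiet = 0.5 * np.random.default_rng(1).normal(size=200)        # in-range noise
print(route_segment(sat), route_segment(quiet))                # ORE DE
```

A real gating module would additionally use the noise threshold \(\tau\) to delimit quiet segments and would split a long stream into per-expert sub-segments before concatenating the outputs.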
- Gaussian-Decay Attention (GD-Attn) in ORE:
- Design Motivation: Information required to recover clipped peaks is concentrated in a short temporal window around the peak; fixed windows ignore sensor-specific characteristics and are non-differentiable
- Core Idea: Replaces binary window masks with learnable continuous Gaussian biases
- Formulation: \(B_{ij} = -\frac{d_{ij}^2}{2\sigma^2}\), where \(\sigma\) is a trainable parameter
- Output \(= \text{softmax}(QK^\top/\sqrt{d_k} + B) \cdot V\)
- As \(\sigma \to \infty\), the mechanism reduces to standard global attention; finite \(\sigma\) smoothly down-weights contributions from distant tokens
- Optimal placement: Decoder only (versus Encoder-only, Enc.+Dec., and other configurations)
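The formulation above can be written out directly. The sketch below follows the stated equations (\(B_{ij} = -d_{ij}^2/2\sigma^2\) added to the scaled dot-product logits), with \(d_{ij}\) assumed to be the token-index distance \(|i-j|\); single-head numpy code for illustration only.

```python
import numpy as np

def gd_attention(Q, K, V, sigma):
    """Gaussian-Decay Attention sketch: add a Gaussian bias
    B_ij = -d_ij^2 / (2 sigma^2) to the attention logits, where
    d_ij = |i - j| is the token distance and sigma is trainable
    in the actual model. As sigma -> inf, B -> 0 and this reduces
    to standard global attention."""
    L, dk = Q.shape
    idx = np.arange(L)
    d = idx[:, None] - idx[None, :]
    B = -(d ** 2) / (2.0 * sigma ** 2)
    logits = Q @ K.T / np.sqrt(dk) + B
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))   # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)) for _ in range(3))
out_local = gd_attention(Q, K, V, sigma=1.0)     # sharply localized
out_global = gd_attention(Q, K, V, sigma=1e6)    # ~= standard attention
```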
- Correlation Loss:
- Pure L2 reconstruction tends to smooth peaks and ignores local dynamics
- First term: alignment of first-order differences (local slopes)
- Second term: amplitude preservation at sign-change positions (peaks/valleys)
- \(\mathcal{L}_{\text{corr}} = \frac{1}{|\mathcal{M}|}\sum_{t\in\mathcal{M}}(\Delta x_t - \Delta \hat{x}_t)^2 + \lambda_{\text{sign}}\frac{1}{|\mathcal{E}|}\sum_{t\in\mathcal{E}}(x_t - \hat{x}_t)^2\)
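The two terms above can be sketched numerically. This is an illustrative reading of the formula, with the masked set \(\mathcal{M}\) taken as all interior samples and \(\mathcal{E}\) detected as sign changes of the true slope; names and defaults are assumptions.

```python
import numpy as np

def correlation_loss(x, x_hat, lam_sign=1.0):
    """Sketch of the correlation loss: the first term aligns
    first-order differences (local slopes); the second preserves
    amplitude at slope sign-change positions (peaks/valleys)."""
    dx, dxh = np.diff(x), np.diff(x_hat)
    slope_term = np.mean((dx - dxh) ** 2)
    # E: indices where the true slope changes sign (local extrema)
    ext = np.where(np.sign(dx[:-1]) != np.sign(dx[1:]))[0] + 1
    sign_term = np.mean((x[ext] - x_hat[ext]) ** 2) if ext.size else 0.0
    return slope_term + lam_sign * sign_term

x = np.sin(np.linspace(0, 4 * np.pi, 100))
print(correlation_loss(x, x))          # perfect reconstruction -> 0.0
```

Unlike plain L2, a flattened peak is penalized twice here: its slopes disagree with the target, and its amplitude is wrong exactly at the extremum.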
- Physics-Informed Energy Loss (PINN):
- Derived from the displacement–power relationship of the IMU proof mass
- Computes first- and second-order discrete derivatives to define instantaneous specific power
- Applies sigmoid normalization followed by a barrier term to penalize excessively high or low energy
- Ensures physical plausibility of reconstructed waveforms and improves cross-sensor generalization
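The steps above (discrete derivatives → instantaneous specific power → sigmoid normalization → barrier penalty) can be sketched as follows. The exact functional form, band limits, and weights are assumptions for illustration; only the pipeline order comes from the description.

```python
import numpy as np

def energy_loss(x_hat, dt=0.01, lo=0.3, hi=0.7, w=10.0):
    """Assumed-form sketch of the physics-informed energy penalty:
    estimate first/second discrete derivatives of the reconstructed
    waveform, form a specific-power proxy v*a, squash it with a
    sigmoid, then apply a soft barrier penalizing values outside a
    plausible band [lo, hi]. Constants are illustrative."""
    v = np.gradient(x_hat, dt)                 # first-order derivative
    a = np.gradient(v, dt)                     # second-order derivative
    p = v * a                                  # instantaneous specific power proxy
    s = 1.0 / (1.0 + np.exp(-p / (np.abs(p).max() + 1e-8)))  # sigmoid-normalized
    barrier = np.maximum(0.0, lo - s) + np.maximum(0.0, s - hi)
    return w * barrier.mean()

val = energy_loss(np.sin(np.linspace(0, 2 * np.pi, 256)))
```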
- Dual-Branch Complementary Masking Strategy in DE:
- Both branches share all encoder and decoder weights
- Constructs complementary 50% masks (cross pattern): even-indexed patches are visible to branch A and masked for branch B, and vice versa
- Guarantees \(\mathcal{M}_A \cup \mathcal{M}_B = \{1,\ldots,L\}\) and \(\mathcal{M}_A \cap \mathcal{M}_B = \varnothing\)
- Fusion (with each \(\mathcal{M}\) treated as its binary indicator mask): \(y_{\text{final}} = y_A \cdot \mathcal{M}_A + y_B \cdot \mathcal{M}_B\)
- Weight-sharing regularization extracts generalizable features and effectively suppresses high-frequency random noise
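The cross-pattern masking and fusion can be shown in a few lines. This is a minimal sketch at patch level; function names are illustrative.

```python
import numpy as np

def complementary_masks(L):
    """Cross-pattern 50% masks: even-indexed patches are visible to
    branch A, odd-indexed to branch B; together they cover all L
    positions with no overlap."""
    mA = (np.arange(L) % 2 == 0).astype(float)
    return mA, 1.0 - mA

def fuse(yA, yB, mA, mB):
    """Each branch contributes the patches it was asked to reconstruct."""
    return yA * mA + yB * mB

mA, mB = complementary_masks(8)
fused = fuse(np.ones(8), np.zeros(8), mA, mB)
```

Because the two branches share all weights, the fusion never averages competing predictions for the same patch; every position is reconstructed exactly once.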
- FFT-Guided Training Augmentation:
- Noise floor estimation → weak signal injection (sampled and scaled from real IMU motion clips) → spectrum-matched noise synthesis
- Physically grounded noise characteristics based on IEEE standards and Allan variance analysis
- Synthesizes noise targeting QN-, ARW-, and BI-dominated frequency bands
- Training objective: mixed signal as input; original noise plus scaled motion as clean reference
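The spectrum-matching step can be sketched as follows. This only matches the overall magnitude spectrum of a reference noise clip via random phases; the paper's pipeline additionally targets QN-, ARW-, and BI-dominated bands per Allan-variance analysis, which is not reproduced here.

```python
import numpy as np

def spectrum_matched_noise(noise_ref, n, rng):
    """Sketch of spectrum-matched noise synthesis (assumed procedure):
    take the magnitude spectrum of a real quiet-sensor clip, pair it
    with uniformly random phases, and invert to obtain new noise with
    the same spectral shape."""
    mag = np.abs(np.fft.rfft(noise_ref, n))                 # reference magnitude spectrum
    phase = rng.uniform(0.0, 2.0 * np.pi, size=mag.shape)   # randomized phases
    return np.fft.irfft(mag * np.exp(1j * phase), n)        # real-valued synthetic noise

rng = np.random.default_rng(0)
quiet_clip = rng.normal(size=256)                # stand-in for a quiet IMU recording
synthetic = spectrum_matched_noise(quiet_clip, 256, rng)
```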
### Loss & Training
- Total loss = L2 reconstruction loss + Correlation loss + PINN energy loss (ORE)
- Denoising Expert uses L2 reconstruction loss + FFT-guided augmentation
- Purely self-supervised training; no temporally synchronized ground-truth labels required
- Trained on a single NVIDIA RTX 4060 GPU
## Key Experimental Results

### Main Results: ISEBench Full-Metric Comparison
| Model | PSNR↑ | P_MSE↓ | Corr↑ | SNR↑(dB) | QN↓ | ARW↓ | BI↓ | Avg. Rank |
|---|---|---|---|---|---|---|---|---|
| RAW | 2.67 | 1 | - | 10.18 | 0 | 0 | 0 | - |
| MATLAB 2023 | 6.03 | 0.515 | 0.86 | 10.02 | -25.8% | -3.1% | +5.4% | 7.0 |
| EMD | 5.44 | 0.655 | 0.77 | 13.85 | -91.1% | -85.9% | -96.8% | 5.3 |
| SG_filter | 4.35 | 0.767 | 0.79 | 12.03 | -85.0% | -86.3% | -90.0% | 6.6 |
| HEROS_GAN | 7.70 | 0.354 | 0.89 | 16.86 | -92.8% | -51.6% | -58.3% | 3.6 |
| IMUDB | 6.59 | 0.442 | 0.82 | 17.76 | -85.8% | -87.8% | -93.7% | 3.4 |
| MoE-Gyro | 8.29 | 0.325 | 0.92 | 24.19 | -98.0% | -94.1% | -98.4% | 1 |
### Ablation Study
MoE Architecture Ablation:
| Model | PSNR↑ | SNR↑ | GPU Mem↓ |
|---|---|---|---|
| ORE+DE (no routing) | 8.19 | 24.58 | 129 MB |
| SingleNet (equivalent size) | 7.23 | 12.9 | 64.7 MB |
| MoE-Gyro | 8.29 | 24.19 | 71.3 MB |
GD-Attn Placement Ablation:
| Configuration | PSNR↑ | P_MSE↓ | Corr↑ |
|---|---|---|---|
| No GD-Attn | 8.08 | 0.345 | 0.87 |
| Encoder only | 8.03 | 0.345 | 0.88 |
| Enc.+Dec. | 8.27 | 0.335 | 0.90 |
| Decoder only | 8.29 | 0.324 | 0.92 |
ORE Component Ablation:
| Component Combination | PSNR↑ | P_MSE↓ | Corr↑ |
|---|---|---|---|
| None | 7.68 | 0.369 | 0.88 |
| GD-Attn only | 7.96 | 0.364 | 0.90 |
| Corr only | 7.83 | 0.354 | 0.91 |
| PINN only | 7.95 | 0.345 | 0.90 |
| All | 8.29 | 0.324 | 0.92 |
DE Weight Sharing Ablation:
| Weight Sharing | SNR↑ | GPU Mem↓ | Params↓ |
|---|---|---|---|
| No sharing | 24.51 | 116.8 MB | 27.8M |
| Encoder only | 24.35 | 76.2 MB | 17.15M |
| Enc.+Dec. | 24.19 | 64.4 MB | 13.9M |
## Key Findings
- Range extension: At an actual angular velocity of −1731.8°/s, MoE-Gyro reconstructs −1453.7°/s from a ±450°/s clipped signal.
- Cross-device generalization: Trained on iPhone 14, zero-shot transfer to Huawei P70 and Xiaomi 14 yields favorable results (PSNR of 8.28 and 7.95, respectively).
- Cross-task generalization: Stable performance across periodic swinging, jump impact, and high-frequency torsion motion categories.
- Real-time performance: Full model processes a 2.56-second segment in 117 ms; compressed model achieves 41 ms with only 1.85M parameters.
- MoE architecture advantage: Compared to a SingleNet of equivalent size, PSNR improves by 1.06 dB and SNR by 11.29 dB.
## Highlights & Insights
- Novel problem formulation: The first unified self-supervised framework to simultaneously address over-range reconstruction and denoising, breaking the long-standing range–noise trade-off.
- Elegant GD-Attn design: Learnable Gaussian decay replaces hard windows, enabling adaptive focus on peak regions.
- PINN physical constraints: Embeds MEMS spring-mass dynamics into the loss function to improve generalization.
- Practical FFT-guided augmentation: Physically grounded noise generation based on Allan variance standards.
- ISEBench contribution: The first open-source evaluation benchmark for IMU signal enhancement, unifying seven evaluation metrics.
## Limitations & Future Work
- The relatively large architecture poses deployment challenges for resource-constrained embedded devices (e.g., MCUs).
- Training data is sourced from a single device (iPhone 14), despite reasonable zero-shot generalization performance.
- Validation is limited to gyroscopes and has not been extended to other IMU sensors such as accelerometers.
- The 100 Hz sampling rate constrains applicability to certain high-frequency scenarios.
- The gating routing relies on simple heuristic rules, which may lack flexibility.
## Related Work & Insights
- Unlike HEROS-GAN (a fully supervised generative approach), MoE-Gyro is purely self-supervised.
- The MAE (Masked Autoencoder) backbone provides an effective framework for time-series signal processing.
- The MoE architecture successfully mitigates gradient conflicts inherent in multi-task learning.
- ISEBench can serve as a standard evaluation platform for future inertial signal enhancement research.
## Rating
- Novelty: ⭐⭐⭐⭐⭐ (First unified self-supervised IMU signal enhancement framework with multiple novel designs)
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ (Comprehensive ablations, cross-device generalization, real-time performance, and real-scenario validation)
- Writing Quality: ⭐⭐⭐⭐ (Clear structure with detailed algorithmic descriptions)
- Value: ⭐⭐⭐⭐ (Addresses a practical engineering problem; code and benchmark are open-sourced)