
DynaQuant: Dynamic Mixed-Precision Quantization for Learned Image Compression

Conference: AAAI 2026
arXiv: 2511.07903
Code: https://github.com/baoyu2020/DynaQuant
Area: Model Compression
Keywords: image compression quantization, mixed-precision, dynamic bit-width allocation, quantization-aware training, learned image compression

TL;DR

To address the deployment inefficiency of learned image compression (LIC) models, this paper proposes DynaQuant, a framework that achieves content-adaptive quantization at the parameter level via learnable scale/zero-point combined with a Distance-Aware Gradient Modulator, and dynamically assigns optimal bit-widths per layer at the architecture level via a lightweight Bit-Width Selector. Across three baselines (Cheng2020, ELIC, Ballé), the framework achieves near-FP32 R-D performance while delivering up to 5.17× speedup and reducing model size to approximately 1/4 of the original.

Background & Motivation

LIC models such as ELIC and Cheng2020 have surpassed traditional codecs like VVC in R-D performance, but their computational complexity and memory demands make deployment on edge devices (e.g., mobile phones, drones) highly challenging. Existing quantization methods exhibit two critical limitations: (1) they apply a globally uniform bit-width (e.g., full INT8), ignoring the large variation in sensitivity to quantization noise across different layers of LIC models; and (2) quantization parameters (scale, zero-point) are statically fixed, unable to adapt to the highly input-dependent latent feature distributions in LIC. This results in either over-conservative treatment of robust layers (wasting compute) or over-aggressive treatment of sensitive layers (degrading R-D performance).

Core Problem

How to design a two-level dynamic quantization strategy for LIC models: (1) at the parameter level — quantization parameters that adapt to input content; and (2) at the architecture level — bit-widths that are dynamically allocated per layer according to layer sensitivity and data characteristics? Additionally, how to address the training difficulty caused by the non-differentiable rounding operation in quantization?

Method

Overall Architecture

DynaQuant comprises two complementary modules: Dynamic Parameter Adaptation (DPA) for parameter-level adaptation, and the Dynamic Bit-Width Selector (DBWS) for layer-level bit-width allocation. Both are embedded within a standard QAT pipeline and jointly optimized end-to-end. The hyperencoder is fixed at 8-bit quantization: it contains few parameters and is sensitive to quantization, so dynamic allocation there would bring marginal benefit at meaningful cost.

Key Designs

  1. Content-Aware Quantization Mapping: The static scale \(s\) and zero-point \(z\) in conventional QAT are replaced by learnable per-channel parameters, optimized end-to-end via backpropagation through the R-D loss. This allows the quantization mapping to adapt to the latent feature distribution of each input image.

  2. Distance-Aware Gradient Modulator (DGM): To address the limitation of the Straight-Through Estimator (STE), which crudely approximates the rounding gradient as a constant 1, a new gradient surrogate function is proposed: \(g(x) = \frac{1}{2} \cdot \frac{\tanh\left(\beta\left(x - \lfloor x \rfloor - 0.5\right)\right)}{\tanh(0.5\beta)} + 0.5\). Its gradient varies with the distance of the input to the nearest quantization boundary (the half-integer, e.g., 0.5): values near the boundary receive larger gradients (signaling the need for further optimization), while values near quantization centers receive smaller gradients (indicating stability), providing a more precise optimization signal than STE. A combined sketch of the learnable mapping and the DGM follows this list.

  3. Dynamic Bit-Width Selector (DBWS): A lightweight network module that takes an input activation tensor \(A \in \mathbb{R}^{C \times H \times W}\), processes it through AdaptivePool (output 5×5) → Flatten → two-layer MLP (with Dropout \(p=0.2\)) → Reshape → Gumbel-Softmax (soft sampling during training, hard argmax selection during inference), and outputs a probability distribution over the candidate bit-width set \(\mathcal{B} = \{b_1, b_2, ..., b_M\}\) for each layer. The encoder and decoder each have an independent DBWS module with symmetric architectures, ensuring consistent bit-width strategies between encoder and decoder without requiring transmission of additional bit-width configuration metadata (a selector sketch follows the Loss & Training details below).
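To make the parameter-level design concrete, below is a minimal PyTorch sketch of per-channel fake quantization with learnable scale/zero-point whose rounding uses a DGM-style surrogate gradient. This is an illustration under assumptions, not the authors' implementation: the class names, the default \(\beta = 4\), the unsigned quantization range, and the NCHW shapes are all illustrative choices.

```python
import math
import torch
import torch.nn as nn

class DGMRound(torch.autograd.Function):
    """Round in the forward pass; in the backward pass, replace STE's constant-1
    gradient with the derivative of the tanh surrogate g(x), which peaks at
    half-integer boundaries and shrinks near quantization centers."""

    @staticmethod
    def forward(ctx, x, beta):
        ctx.save_for_backward(x)
        ctx.beta = beta
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        beta = ctx.beta
        frac = x - torch.floor(x)  # position inside the quantization cell, in [0, 1)
        # derivative of g(x) = 0.5 * tanh(beta * (frac - 0.5)) / tanh(0.5 * beta) + 0.5
        g_prime = 0.5 * beta * (1.0 - torch.tanh(beta * (frac - 0.5)) ** 2) \
                  / math.tanh(0.5 * beta)
        return grad_out * g_prime, None  # no gradient w.r.t. beta

class LearnableFakeQuant(nn.Module):
    """Fake quantization with learnable per-channel scale and zero-point,
    trained end-to-end through the R-D loss (a sketch; shapes assumed NCHW)."""

    def __init__(self, num_channels: int, bits: int = 8, beta: float = 4.0):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_channels, 1, 1))
        self.zero_point = nn.Parameter(torch.zeros(num_channels, 1, 1))
        self.bits, self.beta = bits, beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        qmax = 2 ** self.bits - 1
        q = x / self.scale + self.zero_point              # map onto integer grid
        q = DGMRound.apply(q, self.beta).clamp(0, qmax)   # round with DGM backward
        return (q - self.zero_point) * self.scale         # dequantize
```

Keeping the scale strictly positive (e.g., via a softplus reparameterization) would be a natural safeguard in practice, though the paper's exact parameterization is not restated in this note.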

Loss & Training

The joint optimization loss is: \(\mathcal{L} = R + \lambda D + \gamma \mathcal{L}_{\text{bits}}\)

  • \(R\): entropy model estimated bitrate of the quantized latent
  • \(D\): reconstruction distortion (MSE or MS-SSIM)
  • \(\mathcal{L}_{\text{bits}} = \frac{1}{L} \sum_{l=1}^{L} \sum_{k=1}^{M} (p_l)_k \cdot b_k\): expected average bit-width across all dynamically quantized layers; \(\gamma\) controls the trade-off between R-D performance and computational efficiency

DBWS input strategy: the first module of each encoder/decoder is fixed at 8-bit quantization, and its output serves as the input to the corresponding DBWS; all subsequent modules (from the 2nd through the last) use the adaptive bit-widths output by DBWS.
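A matching sketch of the architecture-level piece follows: a DBWS-style selector plus the \(\mathcal{L}_{\text{bits}}\) penalty from the loss above. The hidden width, Gumbel temperature, and default candidate set are assumptions, the Reshape step is folded into the linear layers, and torch.nn.functional.gumbel_softmax stands in for the paper's soft-sampling step (with a plain one-hot argmax at inference).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitWidthSelector(nn.Module):
    """DBWS-style sketch: AdaptivePool(5x5) -> Flatten -> two-layer MLP with
    Dropout(0.2) -> distribution over candidate bit-widths via Gumbel-Softmax
    (soft sampling during training, hard argmax at inference)."""

    def __init__(self, in_channels: int, candidates=(6, 8, 10),
                 hidden: int = 128, tau: float = 1.0):
        super().__init__()
        self.register_buffer("bits", torch.tensor(candidates, dtype=torch.float))
        self.tau = tau
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(5),                   # (N, C, 5, 5)
            nn.Flatten(),                              # (N, C * 25)
            nn.Linear(in_channels * 25, hidden),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.Linear(hidden, len(candidates)),        # (N, M) logits
        )

    def forward(self, a: torch.Tensor):                # a: (N, C, H, W) activations
        logits = self.mlp(a)
        if self.training:                              # soft sampling keeps gradients
            p = F.gumbel_softmax(logits, tau=self.tau, hard=False)
        else:                                          # deterministic hard selection
            p = F.one_hot(logits.argmax(-1), logits.size(-1)).float()
        return p, (p * self.bits).sum(-1)              # probs and per-sample E[bits]

def bits_loss(expected_bits_per_layer):
    """L_bits: mean expected bit-width over the L dynamically quantized layers,
    added to the R-D loss with weight gamma."""
    return torch.stack(expected_bits_per_layer).mean()
```

With \(\gamma > 0\), \(\mathcal{L}_{\text{bits}}\) steers each selector toward cheaper candidates unless the \(R\) and \(D\) terms justify spending more bits on that layer.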

Key Experimental Results

Table 1 Main Results (BD-Rate loss % / Speedup / Model size):

| Model | Method | Kodak BD-Rate | Avg. BD-Rate | Speedup | Model Size |
|-------|--------|---------------|--------------|---------|------------|
| Cheng | FP32 Baseline | 0.00% | 0.00% | 1.00× | 45.08 MB |
| Cheng | FMPQ | 0.89% | 1.30% | 4.00× | ~11.27 MB |
| Cheng | RDO-PTQ | 4.88% | 4.88% | 4.00× | ~11.27 MB |
| Cheng | Q-Cheng (DPA) | 1.02% | 1.60% | 4.00× | 11.27 MB |
| Cheng | DQ-Cheng (DPA+DBWS) | 7.15% | 12.18% | 5.17× | 8.72 MB |
| ELIC | FP32 Baseline | 0.00% | 0.00% | 1.00× | 137.11 MB |
| ELIC | Q-ELIC | 5.97% | 4.92% | 4.00× | 34.28 MB |
| ELIC | DQ-ELIC | 7.62% | 6.39% | 4.61× | 29.78 MB |
| Ballé | FP32 Baseline | 0.00% | 0.00% | 1.00× | 19.37 MB |
| Ballé | FMPQ | 6.48% | 7.50% | ~3.98× | ~4.87 MB |
| Ballé | Q-Ballé | 5.85% | 5.01% | 4.00× | 4.84 MB |
| Ballé | DQ-Ballé | 7.63% | 6.84% | 4.55× | 4.26 MB |

Key observations: Q- (fixed 8-bit DPA) achieves only 1.60% average BD-Rate loss on Cheng, outperforming RDO-PTQ (4.88%) and approaching FMPQ (1.30%); DQ- variants trade additional BD-Rate loss for a further ~1.2× speedup and smaller models.

Ablation Study

Table 2 General Ablation (Cheng2020 q=6, Kodak):

  • DPA INT8: bpp=0.828, PSNR=36.649 dB, R-D loss=1.56 (outperforms PAMS: 36.185 dB / 1.64)
  • DPA-DQ: avg. 6.42-bit, PSNR=36.636 dB, R-D loss=1.57 (25% lower average bit-width with negligible quality loss)
  • PAMS-DQ: avg. 6.85-bit, PSNR=30.262 dB, R-D loss=4.28

→ DPA and DBWS exhibit synergistic effects (not simply additive)

Table 3 DPA Component Ablation (removing any component degrades performance):

  • Remove learnable \(s\): PSNR 36.649 → 36.185 dB (−0.464 dB)
  • Remove learnable \(z\): PSNR → 36.323 dB (−0.326 dB)
  • Remove DGM gradient modulation \(g(x)\): PSNR → 36.288 dB (−0.361 dB)

All three components matter, with the scale \(s\) having the largest impact.

Table 4 DBWS Candidate Bit-Width Set Ablation:

  • {4,6,8}: avg. 5.47-bit, PSNR=36.432 dB
  • {6,8,10}: avg. 6.42-bit, PSNR=36.636 dB (better efficiency–fidelity trade-off)

Bit-Width Allocation Visualization (Fig. 6): The texture-rich Kodim14 is assigned 10-bit in the gs-1 layer (exceeding the 8-bit assigned to other images); boundary layers (ga-0, ga-6, gs-1) tend toward higher precision while intermediate layers use lower precision, confirming that the boundary layers genuinely require more bits.

Highlights & Insights

  • Clear two-level dynamic design: The parameter-level and architecture-level components are both independent and complementary, forming a complete adaptive quantization pipeline
  • Theoretically motivated DGM: Rather than simply replacing STE, DGM is designed based on the intuition of "distance to decision boundary," enabling more targeted optimization of quantization parameters
  • Symmetric DBWS for encoder/decoder: Eliminates the need to transmit additional bit-width configuration metadata, improving practical deployability
  • Convincing synergy experiment: Table 2 clearly demonstrates that DPA + DBWS yields greater benefit than the sum of each component individually
  • Cross-architecture generalization: Effective across three structurally distinct LIC models — Cheng2020, ELIC, and Ballé

Limitations & Future Work

  • Significant BD-Rate degradation in DQ mode: DQ-Cheng reaches a BD-Rate loss as high as 16.52% on JPEG-AI, indicating that dynamic bit-width allocation still incurs noticeable quality degradation on certain datasets/content types, with speedup coming at a non-trivial quality cost
  • Validation limited to three relatively dated LIC baselines: Cheng2020 (2020), Ballé (2018), and ELIC (2022) are not the latest state of the art, and newer architectures such as MambaIC are not evaluated
  • Insufficient justification for fixing the hyperencoder at 8-bit: The decision is attributed to empirical observation without quantitative sensitivity analysis
  • Candidate bit-width sets must be manually specified: Presets of {4,6,8} or {6,8,10} are used without exploring adaptive determination of candidate sets
  • No comparison with recent general PTQ/QAT methods (e.g., LIC-adapted variants of GPTQ or AWQ)
  • Latency overhead introduced by DBWS is not explicitly reported: Although claimed to be lightweight, the concrete overhead is not clearly quantified
Comparison with Prior Methods

  • vs. FMPQ: DynaQuant's fixed-precision mode (Q-Cheng) is competitive with FMPQ (1.60% vs. 1.30%), while additionally providing dynamic bit-width capability for greater flexibility
  • vs. RDO-PTQ: RDO-PTQ requires no retraining but incurs larger BD-Rate loss (4.88% vs. 1.60%); DynaQuant's QAT approach is clearly superior
  • vs. RAQ: RAQ achieves a BD-Rate loss of 27.84% on Kodak, rendering it essentially unusable
  • vs. General mixed-precision quantization (HAQ/HAWQ, etc.): These methods search for bit-width allocation via reinforcement learning or Hessian information; DynaQuant instead learns the allocation end-to-end with a lightweight MLP + Gumbel-Softmax, avoiding costly search procedures
  • vs. Instance-aware quantization (InstAQ, etc.): DynaQuant is similarly content-adaptive, but adds the additional dimension of per-layer dynamic bit-width allocation

Transferable Ideas

The DGM gradient modulation idea is transferable to other scenarios requiring differentiable rounding (e.g., codebook learning in VQ-VAE, quantization modules in neural codecs). The "differentiable discrete selection via Gumbel-Softmax" paradigm in DBWS is common in NAS and dynamic networks, but its application to bit-width allocation in LIC represents a novel combination. The approach is conceptually analogous to token pruning in model compression — both allocate varying computational resources to different components, here along the precision dimension rather than the quantity dimension. Incorporating semantic region information (e.g., ROI) could potentially enable finer-grained spatially adaptive quantization.

Rating

  • Novelty: ⭐⭐⭐☆☆ — DGM and DBWS are not individually novel concepts (drawing from QuantSR and Gumbel-Softmax in NAS, respectively), but their integration into a unified framework targeting LIC with demonstrated synergistic effects constitutes a meaningful contribution
  • Experimental Thoroughness: ⭐⭐⭐⭐☆ — Ablation study design is comprehensive (separate ablations for general, DPA, and DBWS components) with generalization validated across three baselines; however, the baselines are somewhat dated and comparisons with recent quantization methods are lacking
  • Writing Quality: ⭐⭐⭐⭐☆ — Structure is clear, method description is complete, and figures are high quality (particularly the bit-width visualization in Fig. 6); the future work discussion in the Conclusion is overly brief
  • Value: ⭐⭐⭐☆☆ — Practically valuable for LIC deployment (5× speedup is significant), but the substantial R-D loss in DQ mode limits practical applicability; the transferability of core technical components is moderate