Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression¶
Conference: CVPR 2025
arXiv: 2502.16638
Code: —
Area: Optimization / Model Compression
Keywords: structured pruning, quantization-aware training, joint optimization, dependency graph, QADG
TL;DR¶
GETA is a framework for automatic joint structured pruning and quantization-aware training. It combines three components: a Quantization-Aware Dependency Graph (QADG) that constructs a generic pruning search space, partially projected SGD that enforces layer-wise bit-width constraints, and an interpretable joint learning strategy. The paper reports competitive or state-of-the-art compression performance on both CNNs and Transformers.
Background & Motivation¶
Background: Structured pruning and quantization are two fundamental DNN compression techniques, usually applied independently. Co-optimization has the potential to yield smaller, higher-quality models.
Limitations of Prior Work:
- Engineering Difficulty: Existing joint schemes have complex workflows involving multiple stages (such as pruning followed by quantization, alternating optimization, etc.).
- Black-box Optimization: A large amount of hyperparameter tuning is required to control the overall compression rate (such as searching for the pruning rate and bit-width of each layer).
- Insufficient Architectural Generalization: Most methods are only applicable to specific network architectures (e.g., CNNs only) and cannot automatically handle arbitrary DNNs.
Key Challenge: Pruning changes the network structure (number of channels) while quantization changes numerical precision (bit-width). The interaction between them is complex, and independently optimizing their respective hyperparameters is already NP-hard, making joint optimization even more challenging.
Key Insight:
- Use QADG to automatically construct the pruning search space for arbitrary quantization-aware networks.
- Use partially projected SGD to transform discrete bit-width constraints into a continuous optimization problem.
- Replace black-box search with white-box optimization.
Core Idea: QADG unified search space + projected SGD constraint satisfaction + interpretable pruning-quantization relationship = one-click joint compression.
Method¶
Overall Architecture¶
GETA takes an arbitrary DNN and a target compression rate as input, and outputs the jointly pruned and quantized model:
1. QADG analyzes the network structure and constructs the pruning search space.
2. The pruning rate and bit-width of each layer are jointly optimized within that search space.
3. Partially projected SGD ensures that the bit-width constraints are satisfied.
4. Training is one-shot, with no post-processing.
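To make step 1 more concrete, the following is a minimal sketch of how a dependency analysis might group layers whose output channels have to be pruned together. It is not the paper's QADG algorithm: the toy graph, the op names (`conv`, `fake_quant`, `add`), and the function `pruning_groups` are all hypothetical, and the only assumption carried over from the text is that quantization nodes are shape-preserving and therefore inherit the channel group of the layer they wrap.

```python
from collections import defaultdict

# Hypothetical toy graph: node name -> (op type, list of input nodes).
GRAPH = {
    "conv1": ("conv", []),
    "fq1": ("fake_quant", ["conv1"]),  # quantizer on conv1's output
    "conv2": ("conv", ["fq1"]),
    "fq2": ("fake_quant", ["conv2"]),  # quantizer on conv2's output
    "add": ("add", ["fq1", "fq2"]),    # skip connection joins both branches
    "conv3": ("conv", ["add"]),
}


def find(parent, node):
    """Union-find root lookup with path halving."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def pruning_groups(graph):
    """Group conv layers whose output channels must be removed together."""
    parent = {name: name for name in graph}
    for name, (op, inputs) in graph.items():
        # Shape-preserving ops tie their output channels to their inputs:
        # a fake-quant node to the single layer it wraps, an add to all operands.
        if op in ("fake_quant", "add"):
            for src in inputs:
                parent[find(parent, name)] = find(parent, src)
    groups = defaultdict(list)
    for name, (op, _) in graph.items():
        if op == "conv":
            groups[find(parent, name)].append(name)
    return sorted(sorted(members) for members in groups.values())


print(pruning_groups(GRAPH))
```

In this toy graph the residual `add` forces `conv1` and `conv2` into one group, while `conv3` stays on its own. A real analysis would also need to track input-channel dependencies and mark network outputs as non-prunable, which this sketch leaves out.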
Key Designs¶
- Quantization-Aware Dependency Graph (QADG)
    - Function: Automatically constructs the structured pruning search space for arbitrary quantization-aware DNNs.
    - Mechanism:
        - Extends traditional dependency graphs to consider quantization operations (such as fake quantization nodes).
        - Automatically identifies prunable channel groups and their dependencies.
        - Handles complex topologies such as skip connections and multi-branch structures.
    - Novelty: Architecture-agnostic, capable of handling arbitrary structures including CNNs, Transformers, and hybrid architectures.
    - Implementation: Based on static analysis of the computation graph.
- Partially Projected Stochastic Gradient Descent (Partially Projected SGD)
    - Function: Guarantees that layer-wise bit-width constraints are always satisfied during training.
    - Mechanism:
        - Relaxes discrete bit-width constraints \(b_l \in \{2, 4, 8, \dots\}\) into continuous variables.
        - Projects onto the constraint set after each gradient update step.
        - Alternately updates weight parameters, bit-widths, and pruning rates.
    - Mathematical Guarantee: Converges to a stationary point within the constraint-feasible region.
    - Novelty: No outer-loop search (such as NAS or reinforcement learning) is required, ensuring white-box interpretability.
- Joint Learning Strategy
    - Function: Establishes an interpretable relationship between pruning and quantization.
    - Mechanism:
        - Fewer channels after pruning \(\rightarrow\) the same layer can afford higher-precision quantization.
        - Lower precision after quantization \(\rightarrow\) more channels must be retained to compensate.
        - The two are balanced automatically using the method of Lagrange multipliers.
    - Key Insight: There is a complementary relationship between pruning rates and bit-widths.
    - Implementation: The joint optimization objective function includes both accuracy loss and compression rate constraints.
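The complementary relationship between pruning rate and bit-width can be illustrated with a small cost calculation. The sketch below assumes the compression budget is measured in bit operations (BOPs) and that a layer's BOPs scale as input channels times output channels times weight bits times activation bits; both are assumptions made here for illustration. The function names `relative_bops` and `penalized_objective` are hypothetical, and the penalized objective is a generic Lagrangian-style form, not the paper's exact formulation.

```python
def relative_bops(keep_in, keep_out, weight_bits, act_bits, ref_bits=32):
    """BOPs of a compressed layer relative to its dense full-precision version.

    Assumes BOPs scale as c_in * c_out * b_w * b_a, so kernel size and
    spatial resolution cancel out of the ratio.
    """
    channel_ratio = keep_in * keep_out
    precision_ratio = (weight_bits * act_bits) / (ref_bits * ref_bits)
    return channel_ratio * precision_ratio


def penalized_objective(task_loss, cost, budget, multiplier):
    """Generic Lagrangian-style objective: task loss plus a budget penalty."""
    return task_loss + multiplier * (cost - budget)


# Two hypothetical configurations for the same layer.
pruned_8bit = relative_bops(keep_in=0.5, keep_out=0.5, weight_bits=8, act_bits=8)
dense_4bit = relative_bops(keep_in=1.0, keep_out=1.0, weight_bits=4, act_bits=4)

print(pruned_8bit, dense_4bit)
```

Under these assumptions, keeping half of the channels at 8 bits costs the same as keeping all channels at 4 bits, so a fixed budget can be met by trading one for the other. Which point on that trade-off curve gives the best accuracy is what the joint optimization has to decide.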
Loss & Training¶
- End-to-end one-shot training without multiple stages.
- No need for the traditional pretraining-pruning-finetuning pipeline.
- Supports both training from scratch and starting from pretrained models.
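The single-stage training relies on the partially projected SGD step described under Key Designs. A minimal sketch of one such step follows, assuming the relaxed bit-widths are stored directly as floats and the feasible set is a simple interval `[b_min, b_max]`. The function `partially_projected_sgd_step` and its arguments are hypothetical, and the paper's actual quantizer parameterization and projection may differ.

```python
def partially_projected_sgd_step(weights, bits, grad_w, grad_b, lr, b_min, b_max):
    """One SGD step in which only the bit-width variables are projected.

    weights, grad_w: ordinary parameters and their gradients (unconstrained).
    bits, grad_b: relaxed per-layer bit-widths and their gradients.
    """
    # Plain gradient step on the unconstrained weights.
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    # Gradient step on the bit-widths, then projection onto [b_min, b_max].
    # For an interval, the Euclidean projection is a clamp.
    new_bits = [min(max(b - lr * g, b_min), b_max) for b, g in zip(bits, grad_b)]
    return new_weights, new_bits


# Hypothetical values: one weight and two layers with relaxed bit-widths.
new_w, new_b = partially_projected_sgd_step(
    weights=[1.0], bits=[4.5, 7.5],
    grad_w=[0.5], grad_b=[10.0, -10.0],
    lr=0.5, b_min=4.0, b_max=8.0,
)
print(new_w, new_b)
```

The step is "partial" in the sense that the weights receive an ordinary update while only the bit-width variables are clamped, so both bit-widths in the example end on the boundary of the feasible interval instead of leaving it.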
Key Experimental Results¶
ResNet-18 / ImageNet¶
| Method | Top-1 Acc↑ | FLOPs↓ | Description |
|---|---|---|---|
| Pruning-only baseline | Ref | Ref | Independent pruning |
| Quantization-only baseline | Ref | Ref | Independent quantization |
| Joint baseline | Ref | Ref | Two-stage |
| GETA | Competitive/Best | Higher compression rate | One-stage joint |
Transformer Architectures (ViT / DeiT)¶
| Method | Top-1 Acc↑ | Compression Rate | Feature |
|---|---|---|---|
| Independent pruning | Baseline | Medium | Pruning only |
| Independent quantization | Baseline | Medium | Quantization only |
| GETA | Higher | Higher | Joint optimization |
Ablation Study¶
| Configuration | Effect |
|---|---|
| w/o QADG (Manual search space) | Accuracy drops, and fails to generalize |
| w/o Projected SGD (Unconstrained) | Constraint violation, bit-width uncontrollable |
| w/o Joint strategy (Independent optimization) | Worsened compression-accuracy trade-off |
| Full GETA | Optimal trade-off |
Key Findings¶
- Joint optimization consistently outperforms simple combinations of independent optimizations.
- Automation by QADG eliminates the need to manually design search spaces.
- Partially projected SGD keeps the layer-wise bit-width constraints satisfied throughout training.
- Validated on both CNNs and Transformers, demonstrating architecture-agnostic capability.
Highlights & Insights¶
- Fully Automated: No manual design of pruning rates/bit-widths per layer is required.
- White-box Optimization: Offers high interpretability compared to black-box search methods like NAS.
- One-shot Training: Eliminates the engineering complexity of multi-stage pipelines.
- Architecture Agnostic: QADG automatically handles arbitrary network topologies.
Limitations & Future Work¶
- The accuracy drop is still relatively large under extreme compression rates.
- QADG construction relies on static analysis of the computation graph, so support for dynamic graphs is limited.
- Currently only validated on classification tasks; downstream tasks like detection/segmentation remain to be verified.
Rating¶
- Novelty: ⭐⭐⭐⭐ Novel combination of QADG + Projected SGD + Joint Strategy
- Experimental Thoroughness: ⭐⭐⭐⭐ Validated on both CNN and Transformer architectures
- Writing Quality: ⭐⭐⭐⭐ Clear theoretical derivation
- Value: ⭐⭐⭐⭐ Practical significance for model deployment