📦 Model Compression
🤖 AAAI2026 · 60 paper notes
🔥 Top topics: Model Compression ×9 · Compression ×7 · LLM ×6 · Knowledge Distillation ×5 · Reasoning ×3
- A Closer Look at Knowledge Distillation in Spiking Neural Network Training
To address the overlooked distribution mismatch between teacher ANN continuous features/logits and student SNN discrete sparse spike features/logits in ANN→SNN knowledge distillation, this paper proposes the CKDSNN framework based on Saliency-scaled Activation Map Distillation (SAMD) and Noise-smoothed Logits Distillation (NLD), achieving new state-of-the-art SNN training performance on CIFAR-10/100, ImageNet-1K, and CIFAR10-DVS.
- AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization
To address the severe inference latency overhead (250%–950%) of dynamic MoE-LoRA adapters, this paper proposes a token-level pre-gating architecture that performs a single global routing decision at the first layer. Combined with a custom SGMM fused CUDA kernel that merges all activated LoRA adapters into the backbone in one shot, the approach reduces decoding latency by 2.4× while preserving model accuracy.
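
As a rough illustration of what merging activated adapters into the backbone means, the sketch below folds gated low-rank updates into a dense weight matrix in plain Python. The function name `merge_lora`, the gate values, and the tiny matrices are illustrative assumptions; the paper's SGMM kernel is a fused CUDA implementation and is not reproduced here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, adapters, gates):
    """Return W + sum_i gate_i * (B_i @ A_i) over the activated adapters.

    weight:   d_out x d_in backbone matrix
    adapters: list of (B, A) pairs, B is d_out x r and A is r x d_in
    gates:    one routing weight per adapter (hypothetical values)
    """
    merged = [row[:] for row in weight]
    for (b, a), gate in zip(adapters, gates):
        if gate == 0.0:
            continue  # adapter not activated, nothing to merge
        delta = matmul(b, a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += gate * value
    return merged


weight = [[1.0, 0.0], [0.0, 1.0]]
adapters = [
    ([[1.0], [0.0]], [[0.0, 2.0]]),  # rank-1 adapter 0
    ([[0.0], [1.0]], [[3.0, 0.0]]),  # rank-1 adapter 1
]
merged = merge_lora(weight, adapters, gates=[0.5, 0.0])
```

Once the update is folded into the weight, the forward pass needs a single matrix multiply per layer instead of one per adapter, which is the general reason merging can cut decoding latency.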
- Asymmetric Cross-Modal Knowledge Distillation: Bridging Modalities with Weak Semantic Consistency
This paper proposes a novel paradigm termed Asymmetric Cross-modal Knowledge Distillation (ACKD), realized through the SemBridge framework — comprising two plug-and-play modules, namely self-supervised semantic matching and optimal transport alignment — to enable cross-modal knowledge distillation under weak semantic consistency. This allows multispectral (MS) images collected from different geographic regions to effectively guide RGB-based remote sensing scene classification.
- BD-Net: Has Depth-Wise Convolution Ever Been Applied in Binary Neural Networks?
This paper proposes BD-Net, which for the first time successfully integrates depth-wise convolution (DWConv) into binary neural networks (BNNs) by introducing 1.58-bit convolution and pre-BN residual connections. BD-Net achieves a new state of the art in the BNN domain on ImageNet with an extremely low computational cost of 33M OPs, with accuracy improvements of up to 9.3 percentage points across multiple datasets.
- Beyond Sharpness: A Flatness Decomposition Framework for Efficient Continual Learning
This paper proposes FLAD, a framework that decomposes the sharpness-aware perturbation direction into a gradient-aligned component and a stochastic-noise component, retaining only the noise component for regularization. By combining zeroth-order and first-order sharpness, FLAD improves generalization in continual learning with minimal additional computational overhead.
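
One simple way to picture the decomposition is an orthogonal projection: the part of the perturbation parallel to the gradient is split off, and the remainder is treated as the noise component. The sketch below makes that assumption explicitly; the function name and the use of a plain projection are illustrative and are not taken from the paper.

```python
def split_perturbation(perturbation, gradient):
    """Split a perturbation into a gradient-aligned part and a residual part."""
    dot = sum(p * g for p, g in zip(perturbation, gradient))
    norm_sq = sum(g * g for g in gradient)
    if norm_sq == 0.0:
        # No gradient direction to project onto: everything is residual.
        return [0.0] * len(gradient), list(perturbation)
    scale = dot / norm_sq
    aligned = [scale * g for g in gradient]
    residual = [p - a for p, a in zip(perturbation, aligned)]
    return aligned, residual


aligned, residual = split_perturbation([3.0, 4.0], [1.0, 0.0])
```

Under this reading, only `residual` would be kept for regularization, and by construction it is orthogonal to the gradient.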
- CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis
This paper introduces the concept of a "micro-expert" to decompose MoE layer outputs as cross-matrix (up/gate/down_proj) linear combinations, enabling structured pruning (Camera-P) and mixed-precision quantization (Camera-Q) based on energy ranking. On DeepSeek-MoE-16B, Qwen2-57B, and Qwen3-30B at 20%–60% sparsity, the method comprehensively outperforms NAEE and D²-MoE; analysis of Qwen2-57B requires less than 5 minutes on a single A100 GPU.
- Can You Tell the Difference? Contrastive Explanations for ABox Entailments
This paper proposes a formal framework for Contrastive ABox Explanations (CE) to answer questions of the form "Why is \(a\) an instance of \(C\) but \(b\) is not?", simultaneously accounting for positive entailments and missing entailments within Description Logic knowledge bases, and analyzes the computational complexity under different description logics and optimization criteria.
- Compensating Distribution Drifts in Class-incremental Learning of Pre-trained Vision Transformers
This paper proposes Sequential Learning with Drift Compensation (SLDC), which learns latent space transformation operators (linear / weakly nonlinear) to compensate for distribution drifts induced by sequential fine-tuning of pre-trained ViTs in class-incremental learning. Combined with knowledge distillation, the approach achieves performance close to the joint-training upper bound.
- Condensed Data Expansion Using Model Inversion for Knowledge Distillation
This paper proposes using condensed datasets as prototypes to guide the model inversion (MI) process. A feature-alignment discriminator enforces distributional consistency between synthesized data and condensed samples, thereby expanding the condensed dataset for knowledge distillation. The method achieves up to 11.4% improvement over standard MI-based distillation on CIFAR/ImageNet.
- Consensus-Aligned Neuron Efficient Fine-Tuning Large Language Models for Multi-Domain Machine Translation
This paper proposes CANEFT, which uses mutual information (MI) to identify consensus-aligned neurons in LLMs that are consistently important across domains, and fine-tunes only these neurons to achieve efficient adaptation for multi-domain machine translation (MDMT). CANEFT outperforms PEFT baselines such as LoRA across 3 LLMs and 10 translation domains without introducing any additional parameters.
- CTPD: Cross Tokenizer Preference Distillation
This paper proposes Cross-Tokenizer Preference Distillation (CTPD), the first unified framework supporting preference distillation across heterogeneous tokenizers. Through three key innovations—Aligned Span Projection, cross-tokenizer importance weighting, and Teacher-Anchored Reference—CTPD achieves substantial improvements over existing methods on multiple benchmarks.
- Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers
This paper proposes a "Distillation Dynamics" analytical framework (channel-wise FFT spectral analysis + Shannon entropy + activation magnitude tracking) to reveal that ViTs exhibit a distinctive U-shaped information processing pattern (compression followed by expansion). The work demonstrates that the fundamental cause of feature-based distillation failure in ViTs is a representational paradigm mismatch between the teacher's distributed high-dimensional encoding in later layers and the student's limited channel capacity—rather than a simple capacity gap.
- Distilling Cross-Modal Knowledge via Feature Disentanglement
This paper proposes Frequency-Decoupled Cross-Modal Knowledge Distillation (FD-CMKD), which decomposes teacher and student features into low-frequency (modality-shared semantics) and high-frequency (modality-specific details) components via Fourier transform, applies strong-consistency MSE and weak-consistency logMSE losses respectively, and introduces scale normalization along with shared classifier alignment to bridge the feature space. FD-CMKD consistently outperforms existing distillation methods across multiple cross-modal scenarios including audio–visual, image–text, and semantic segmentation.
- Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging
This paper proposes MergeBarrier, a plug-and-play defense method that disrupts linear mode connectivity (LMC) between a protected model and its homologous counterparts by applying orthogonal projection transformations to attention layers and activation-function-unfolding reparameterization to FFN layers, thereby actively preventing unauthorized model merging without degrading model performance.
- Don't Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs
This paper proposes PUMA, a framework that leverages lightweight adapters and a grouped user selection strategy to efficiently migrate personalized soft prompts from a source LLM to a target LLM with a different architecture. PUMA matches or surpasses from-scratch training on three large-scale datasets while reducing computational cost by up to 98%.
- DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation
DOS is a framework that distills semantic softmaps exclusively over observable (unmasked) points, combined with Zipf-Sinkhorn regularization based on a Zipfian prior to handle the long-tail distribution of 3D semantics. It achieves state-of-the-art self-supervised learning performance on six 3D benchmarks, reaching 95% of supervised performance under linear probing.
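
To make the idea of a Zipfian prior inside Sinkhorn normalization concrete, here is a minimal pure-Python sketch. It assumes the prior is used as the target column marginal (how much total mass each prototype receives) while points share mass uniformly; that reading, the function names, and the toy scores are assumptions, not the paper's formulation.

```python
def zipf_prior(num_prototypes, exponent=1.0):
    """Normalized Zipfian weights: p_k proportional to 1 / k**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_prototypes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sinkhorn(scores, col_marginal, iterations=200):
    """Rescale a positive matrix so rows sum to 1/n and columns approach col_marginal."""
    n, k = len(scores), len(col_marginal)
    plan = [row[:] for row in scores]
    for _ in range(iterations):
        # Column step: match the prototype marginal (the Zipfian prior here).
        col_sums = [sum(plan[i][j] for i in range(n)) for j in range(k)]
        plan = [[plan[i][j] * col_marginal[j] / col_sums[j] for j in range(k)]
                for i in range(n)]
        # Row step: every point carries the same total mass 1/n.
        plan = [[value / (sum(row) * n) for value in row] for row in plan]
    return plan


scores = [[1.0, 2.0, 3.0], [2.0, 1.0, 1.0], [1.0, 1.0, 2.0], [3.0, 1.0, 1.0]]
prior = zipf_prior(3)
plan = sinkhorn(scores, prior)
```

With a uniform column marginal this reduces to the usual balanced assignment; swapping in a Zipfian marginal lets a few prototypes absorb most of the mass, which is one way to reflect a long-tailed class distribution.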
- DP-GenG: Differentially Private Dataset Distillation Guided by DP-Generated Data
This paper proposes DP-GenG, a framework that leverages differentially private generated data (DP-generated data) to guide three stages of dataset distillation — initialization, feature matching, and expert calibration — significantly improving the utility and privacy protection of the distilled dataset under a limited privacy budget.
- DynaQuant: Dynamic Mixed-Precision Quantization for Learned Image Compression
To address the deployment inefficiency of learned image compression (LIC) models, this paper proposes DynaQuant, a framework that achieves content-adaptive quantization at the parameter level via learnable scale/zero-point combined with a Distance-Aware Gradient Modulator, and dynamically assigns optimal bit-widths per layer at the architecture level via a lightweight Bit-Width Selector. Across three baselines (Cheng2020, ELIC, Ballé), the framework achieves near-FP32 R-D performance while delivering up to 5.17× speedup and reducing model size to approximately 1/4 of the original.
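
For readers unfamiliar with scale/zero-point quantization, the following sketch shows standard uniform affine quantization in plain Python. The scale and zero-point are fixed constants here, whereas the summary describes them as learnable; the values and function names are illustrative assumptions.

```python
def quantize(values, scale, zero_point, bits):
    """Map floats to unsigned integer codes in [0, 2**bits - 1]."""
    qmin, qmax = 0, (1 << bits) - 1
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [scale * (q - zero_point) for q in codes]


codes = quantize([-1.0, 0.0, 0.26, 5.0], scale=0.25, zero_point=4, bits=4)
restored = dequantize(codes, scale=0.25, zero_point=4)
```

The last input, 5.0, falls outside the representable range and is clamped to the largest code, so it comes back as 2.75; a lower bit-width shrinks that range further, which is why per-layer bit-width selection matters.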
- Earth-Adapter: Bridge Geospatial Domain Gaps with Mixture of Frequency Adaptation
This paper proposes Earth-Adapter, the first parameter-efficient fine-tuning (PEFT) method specifically designed to address artifact problems in remote sensing imagery. Through a frequency-guided Mixture of Adapters (MoA), features are decomposed into high- and low-frequency subspaces, independently optimized, and then dynamically aggregated. The method outperforms the baseline Rein across three settings: remote sensing semantic segmentation (SS), domain adaptation (DA), and domain generalization (DG).
- EEG-DLite: Dataset Distillation for Efficient Large EEG Model Training
This paper proposes EEG-DLite, a dataset distillation framework that combines self-supervised encoding, outlier filtering, and diversity sampling to compress a 2,500-hour EEG dataset to just 5% of its original size, achieving performance comparable to or exceeding full-data pretraining while reducing GPU pretraining time from 30 hours to 2 hours.
- Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression
This paper proposes CGRS (Certainty-Guided Reflection Suppression), a training-free efficient reasoning method that dynamically suppresses reflection trigger tokens (e.g., "Wait", "But") when the model exhibits high confidence, reducing token consumption of large reasoning language models by 18.5%–41.9% while maintaining reasoning accuracy.
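
The suppression step can be sketched as a logit mask applied at decoding time. In the sketch below the certainty score is passed in as a number, because the summary does not say how it is computed; the threshold, the function names, and the toy vocabulary are assumptions, and only the trigger examples "Wait" and "But" come from the summary.

```python
import math

REFLECTION_TRIGGERS = {"Wait", "But"}  # examples named in the summary


def softmax(logits):
    """Softmax over a token -> logit mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def suppress_reflection(logits, certainty, threshold=0.9):
    """Mask trigger tokens when the supplied certainty score is high enough."""
    if certainty < threshold:
        return dict(logits)  # low certainty: leave reflection available
    return {tok: (-math.inf if tok in REFLECTION_TRIGGERS else v)
            for tok, v in logits.items()}


logits = {"Wait": 2.0, "So": 1.0, "But": 0.5}
confident_probs = softmax(suppress_reflection(logits, certainty=0.95))
unsure_logits = suppress_reflection(logits, certainty=0.50)
```

Because the mask only touches logits at sampling time, an approach of this shape needs no retraining, which matches the training-free framing in the summary.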
- EfficientFSL: Enhancing Few-Shot Classification via Query-Only Tuning in Vision Transformers
This paper proposes EfficientFSL, a query-only parameter-efficient fine-tuning framework for ViT-based few-shot classification. Through three components — the Forward Block (decoupled active/frozen sub-blocks), the Combine Block (adaptive multi-layer feature fusion), and the SQ Attention Block (support-query distribution alignment) — EfficientFSL achieves state-of-the-art performance on 4 in-domain and 6 cross-domain benchmarks using only 1.25M–2.48M trainable parameters.
- Error Correction in Radiology Reports: A Knowledge Distillation-Based Multi-Stage Framework
This paper proposes a staged inference + dual-knowledge infusion framework that decomposes radiology report error correction into three phases—detection → localization → correction—and integrates Medical Knowledge Graph Distillation (MKGD) with External Knowledge Retrieval (EXKR) to achieve up to 31.56% improvement in error detection accuracy and 37.4% reduction in processing time across 6 LLM architectures.
- Explore and Establish Synergistic Effects between Weight Pruning and Coreset Selection
This paper presents the first systematic investigation of the interaction between weight pruning and coreset selection, proposing the SWaST mechanism to alternately perform both operations and establish synergistic effects, while introducing a state preservation mechanism to address the "dual loss" problem, achieving up to 17.83% accuracy improvement under 10%–90% FLOPs reduction.
- Failures to Surface Harmful Contents in Video Large Language Models
This paper presents the first systematic security analysis of VideoLLMs, identifying three structural design flaws — sparse temporal sampling, spatial token downsampling, and modality fusion imbalance — that cause clearly visible harmful content in videos to be omitted from model-generated textual summaries (omission rate exceeding 90%). Three zero-query black-box attacks are designed to empirically validate the severity of these vulnerabilities.
- First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
This paper identifies a critical yet overlooked issue in LLM post-training quantization: the column-wise compensation process renders first-order gradient terms non-negligible. The proposed FOEM method incorporates first-order terms into the error compensation formula, reducing the perplexity of Llama3-8B under 3-bit quantization by 17.3% with virtually no additional computational overhead.
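
For orientation, the second-order Taylor expansion that error-compensation schemes of this kind are usually built on is shown below. Here \(g\) is the gradient and \(H\) the Hessian of the loss at the current weights, and \(\delta w\) is the weight perturbation introduced by quantization; these symbols are assumptions for illustration and need not match the paper's notation.

\[
\Delta \mathcal{L} \;\approx\; g^{\top} \delta w \;+\; \tfrac{1}{2}\, \delta w^{\top} H \, \delta w
\]

Methods that assume a converged model drop the first term because \(g \approx 0\). The summary's claim is that this term becomes non-negligible during column-wise compensation, so it has to stay in the compensation formula.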
- From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging
This paper proposes ReACT, which shifts controllable model merging from parameter-space optimization to representation-space correction. By deriving a closed-form solution, ReACT enables instant generation of Pareto-optimal models under arbitrary user preferences, achieving 36–208× speedup over existing methods while delivering superior performance.
- From Passive Perception to Active Memory: A Weakly Supervised Image Manipulation Localization Framework Driven by Coarse-Grained Annotations
This paper proposes BoxPromptIML, a weakly supervised image manipulation localization (IML) framework based on coarse-grained bounding box annotations. It leverages a frozen SAM teacher model to convert rough bounding boxes into high-quality pseudo-masks, and trains a lightweight student model via a memory-guided gated fusion module (MGFM), achieving performance comparable to or surpassing fully supervised methods with an annotation cost of only 7 seconds per image.
- Group Orthogonal Low-Rank Adaptation for RGB-T Tracking
This paper proposes the GOLA framework, which quantifies LoRA rank importance via SVD decomposition, freezes critical ranks to preserve pre-trained priors, clusters redundant ranks into groups, and imposes inter-group orthogonal constraints to enable more efficient RGB-T tracking adaptation.
- HCF: Hierarchical Cascade Framework for Distributed Multi-Stage Image Compression
This paper proposes the HCF framework, which performs cross-node transformation directly in the latent space (avoiding pixel-domain recompression) and introduces policy-driven quantization control to achieve up to 12.64% BD-Rate PSNR improvement in distributed multi-stage image compression, while reducing FLOPs by up to 97.8% and GPU memory by up to 96.5%.
- InfoCom: Kilobyte-Scale Communication-Efficient Collaborative Perception with Information-Aware Feature Compression
This paper proposes InfoCom, a framework that applies an extended information bottleneck (IB) principle to compress the communication payload of collaborative perception from the MB scale to the KB scale—a 440× reduction compared to Where2comm—while maintaining near-lossless perception performance. The framework consists of three core modules: information-aware encoding, sparse mask generation, and multi-scale decoding.
- KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache
This paper proposes KVmix, which evaluates the importance of each layer's KV Cache by computing the \(L_2\) norm of gradients with respect to Key/Value projection weights, enabling layer-wise mixed-precision quantization (Key avg. 2.19-bit, Value avg. 2.38-bit). Combined with a dynamic Recent Pivotal Context (RPC) selection strategy, KVmix achieves near-lossless inference, 4.9× memory compression, and 5.3× throughput acceleration on models such as Llama and Mistral.
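
A minimal sketch of gradient-norm-based bit allocation follows. It ranks layers by the \(L_2\) norm of a gradient vector and gives the top fraction a higher bit-width. The 4-bit/2-bit choice, the 25% fraction, and the toy gradients are hypothetical values chosen for illustration; the summary only reports the resulting average bit-widths.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of gradient entries."""
    return math.sqrt(sum(v * v for v in values))


def assign_bits(layer_grads, high_bits=4, low_bits=2, high_fraction=0.25):
    """Give the layers with the largest gradient norms the higher bit-width."""
    norms = [l2_norm(g) for g in layer_grads]
    num_high = max(1, round(len(norms) * high_fraction))
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    high = set(ranked[:num_high])
    return [high_bits if i in high else low_bits for i in range(len(norms))]


# One flattened gradient per layer (toy numbers).
layer_grads = [[0.1, 0.2], [3.0, 4.0], [0.5, 0.5], [1.0, 0.0]]
bits = assign_bits(layer_grads)
average_bits = sum(bits) / len(bits)
```

Keeping most layers at the low bit-width and promoting only a few is how an average between 2 and 3 bits, like the figures quoted above, can arise.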
- LexChronos: An Agentic Framework for Structured Event Timeline Extraction in Indian Jurisprudence
This paper proposes LexChronos, a dual-agent iterative framework for extracting structured event timelines from Indian Supreme Court judgments. A LoRA fine-tuned extraction agent identifies candidate events, while a pretrained feedback agent scores and refines them through a confidence-driven loop. The system achieves a BERT F1 of 0.8751 on a synthetic dataset, and the structured timelines are preferred by GPT-4 over unstructured baselines in 75% of downstream legal summarization cases.
- Lightweight Optimal-Transport Harmonization on Edge Devices
This paper proposes MKL-Harmonizer, which leverages the Monge-Kantorovich Linear (MKL) mapping from classical optimal transport theory to train a compact encoder that predicts 12-dimensional color transformation parameters, enabling real-time image color harmonization on edge devices. The method achieves state-of-the-art performance on the combined perceptual quality–speed metric in AR scenarios.
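
As background, the classical Monge-Kantorovich linear map between two Gaussian color distributions has a closed form. With source mean and covariance \(\mu_s, \Sigma_s\) and target mean and covariance \(\mu_t, \Sigma_t\) (notation assumed here, not taken from the paper):

\[
T(x) = \mu_t + A\,(x - \mu_s), \qquad
A = \Sigma_s^{-1/2} \left( \Sigma_s^{1/2}\, \Sigma_t\, \Sigma_s^{1/2} \right)^{1/2} \Sigma_s^{-1/2}
\]

For RGB colors, \(A\) is a \(3 \times 3\) matrix and the offset is a 3-vector, which gives 12 numbers in total. That count would be consistent with the 12-dimensional parameter vector mentioned above, though this reading is an assumption.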
- LOOM: Personalized Learning Informed by Daily LLM Conversations Toward Long-Term Mastery via a Dynamic Learner Memory Graph
This paper proposes LOOM, an agentic pipeline system that observes users' daily LLM conversations, infers learning needs, maintains a Dynamic Learner Memory Graph, and automatically generates personalized mini-courses. LOOM unifies continuity (long-term progress tracking) and initiative (immediate responsiveness to emerging interests) in a single framework.
- MetaGDPO: Alleviating Catastrophic Forgetting with Metacognitive Knowledge through Group Direct Preference Optimization
This paper proposes MetaGDPO, which addresses catastrophic forgetting in reasoning distillation for small models (<8B) from two complementary perspectives: (1) the data side, constructing a 5K dataset (MetaKL) based on metacognitive knowledge annotation; and (2) the training side, introducing GDPO—a DPO variant that replaces GRPO's online sampling with offline response groups generated by a large model.
- Pairing-free Group-level Knowledge Distillation for Robust Gastrointestinal Lesion Classification in White-Light Endoscopy
This paper proposes PaGKD, a pairing-free group-level knowledge distillation framework that eliminates the dependency on paired data in conventional NBI→WLI cross-modal distillation. It introduces group-level prototype distillation (GKD-Pro, which extracts modality-invariant semantic prototypes via a shared lesion query Transformer) and group-level dense distillation (GKD-Den, which achieves dense spatial alignment through activation map-guided semantic relation cross-attention). PaGKD improves AUC by 3.3%/1.1%/2.8%/3.2% across four clinical datasets.
- Parametric Pareto Set Learning for Expensive Multi-Objective Optimization
This paper proposes the PPSL-MOBO framework, which employs a hypernetwork + LoRA architecture to learn a unified mapping from preference vectors and extrinsic parameters to Pareto-optimal solutions. Combined with Gaussian process surrogate models and hypervolume improvement acquisition strategies, the framework efficiently addresses expensive parametric multi-objective optimization problems.
- PocketLLM: Ultimate Compression of Large Language Models via Meta Networks
PocketLLM proposes compressing LLM weight vectors in a latent space via meta networks (encoder–codebook–decoder), replacing the original weight matrices with a small decoder, a compact codebook, and index arrays. The method achieves 10× compression on Llama 2-7B with negligible accuracy degradation, breaking the accuracy bottleneck of traditional quantization and pruning approaches under extreme compression ratios.
- Post Training Quantization for Efficient Dataset Condensation
This work is the first to apply post-training quantization (PTQ) to dataset distillation, proposing a patch-based quantization framework (PAQ + grouping + refinement) that nearly doubles test accuracy of distilled datasets at the extreme 2-bit regime (e.g., DM IPC=1 improves from 26.0% to 54.1%). The framework is plug-and-play and can be applied to various distillation methods.
- Predicting the Future by Retrieving the Past
This paper proposes PFRP (Predicting the Future by Retrieving the Past), which constructs a Global Memory Bank (GMB) to store historical patterns, trains an encoder via Predictive Contrastive Learning (PCL) for efficient retrieval, and dynamically integrates retrieved global predictions with any local forecasting model. PFRP achieves an average improvement of 8.4% in forecasting performance across 7 datasets.
- Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval
This paper proposes PSCA, a two-stage framework that establishes class-level semantic connections via orthogonal prototypes, dynamically corrects pseudo-label reliability through geometric-semantic consistency alignment, and learns hash codes on reconstructed features, achieving substantial improvements over existing methods on multiple cross-domain retrieval benchmarks.
- Put the Space of LoRA Initialization to the Extreme to Preserve Pre-trained Knowledge
This paper proposes LoRA-Null, which initializes LoRA within the null space of pre-trained input activations (rather than the null space of weights). From an information-theoretic perspective, the effective rank of activations is much lower than that of weights, meaning their null space encodes less pre-trained knowledge, thereby substantially mitigating catastrophic forgetting during fine-tuning.
- QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution
This paper proposes QuantVSR, the first low-bit (4/6-bit) post-training quantization framework for diffusion-based video super-resolution (VSR). It introduces a Spatiotemporal Complexity-Aware (STCA) mechanism for layer-adaptive rank allocation and a Learnable Bias Alignment (LBA) module to mitigate quantization bias. Under the 4-bit setting, QuantVSR achieves 84.39% parameter compression and 82.56% computation reduction while maintaining performance comparable to the full-precision model.
- QuEPT: Quantized Elastic Precision Transformers with One-Shot Calibration for Multi-Bit Switching
QuEPT is an elastic precision quantization framework that enables real-time switching among arbitrary predefined bit-widths on ViT/LLM/MLLM after a single calibration pass, via two core modules—Multi-Bit Token Merging and Multi-Bit Cascaded LoRA—achieving performance on par with or exceeding single-bit-width SOTA PTQ methods.
- Reinforced Rate Control for Neural Video Compression via Inter-Frame Rate-Distortion Awareness
This paper proposes the first reinforcement learning rate control framework based on Constrained Markov Decision Processes (CMDP), which jointly captures intra-frame content features and inter-frame rate-distortion coupling dependencies via spatiotemporal state modeling, and directly maps these to per-frame coding parameters. The approach reduces the average bitrate error to 1.20% and achieves BD-Rate savings of up to 13.98% across multiple neural video codecs.
- Renormalization Group Guided Tensor Network Structure Search
This paper proposes RGTN, a framework that introduces Renormalization Group (RG) theory from statistical physics into tensor network structure search. Through a multi-scale coarse-graining–expansion–compression pipeline and learnable edge gating, RGTN enables continuous topological evolution, achieving state-of-the-art compression ratios on light field compression, high-order tensor decomposition, and video completion tasks, while running 4–600× faster than existing methods.
- Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling
This paper proposes the first uni-level dataset distillation framework for long-tailed distributions. Through three core strategies — expert model debiasing, fair BN statistics calibration, and confidence-guided initialization — the method achieves +15.6% on CIFAR-100-LT and +11.8% on Tiny-ImageNet-LT, comprehensively outperforming DAMED.
- Satisficing and Optimal Generalised Planning via Goal Regression (Extended Version)
This paper presents the Moose planner, which synthesises generalised planning programs from training problems via goal regression. It decomposes multi-goal problems into single-goal subproblems, solves each optimally, and applies regression followed by lifting to produce a set of first-order condition-action rules. These rules support either satisficing planning (direct rule execution) or optimal planning (encoded as axioms to prune the search space).
- Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning
SharpV proposes a two-stage training-free visual token pruning framework. In the Pre-LLM stage, it adaptively adjusts the pruning ratio per frame based on spatiotemporal information; in the Intra-LLM stage, it prunes the KV Cache based on a visual information degradation hypothesis. SharpV is the first method to achieve full compatibility with Flash Attention, retaining approximately 12% of tokens while matching or surpassing dense model performance across multiple video understanding benchmarks.
- SIGN: Schema-Induced Games for Naming
SIGN introduces lightweight message Schemas (e.g., `@say {name: Ck}`) into LLM multi-agent naming games, demonstrating that structured priors can improve group convention agreement by up to 5.8×, reduce convergence token cost by an order of magnitude, and provide a simple, controllable "tuning knob" for efficient multi-agent coordination.
- SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping
SkipCat proposes a rank-maximized low-rank compression framework that introduces two techniques—intra-layer shared projection (Cat) and block skipping (Skip)—to retain more effective rank under the same compression ratio. Without any fine-tuning, it achieves up to 7% accuracy improvement on zero-shot tasks over existing low-rank methods.
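
A back-of-the-envelope parameter count shows why sharing a projection can leave room for a higher rank at the same budget. The sketch assumes that "shared projection" means several matrices with the same input dimension reuse one input-side factor; that interpretation, the function names, and the 4096-wide example are assumptions rather than details from the paper, and block skipping is not modeled.

```python
def rank_separate(d_in, d_out, num_matrices, budget):
    """Largest rank when every matrix keeps its own pair of low-rank factors."""
    return budget // (num_matrices * (d_in + d_out))


def rank_shared(d_in, d_out, num_matrices, budget):
    """Largest rank when all matrices reuse one d_in x r input projection."""
    return budget // (d_in + num_matrices * d_out)


d_in = d_out = 4096
num_matrices = 3  # e.g. three projections fed by the same input
dense_params = num_matrices * d_in * d_out
budget = dense_params // 2  # keep 50% of the original parameters

r_separate = rank_separate(d_in, d_out, num_matrices, budget)
r_shared = rank_shared(d_in, d_out, num_matrices, budget)
```

In this toy setting the shared layout affords rank 1536 versus 1024 for separate factorizations at an identical parameter budget.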
- SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
SpecQuant proposes a two-stage quantization framework based on adaptive Fourier-domain decomposition: it first smoothly migrates activation outliers into weights, then suppresses high-frequency noise in the weights via channel-wise low-frequency Fourier truncation. On LLaMA-3 8B, W4A4 quantization achieves only 1.5% accuracy degradation, while delivering 2× speedup and 3× memory savings.
- Steering Pretrained Drafters during Speculative Decoding
This paper proposes SD², which extracts steering vectors from verifier hidden states and injects them into the MLP layers of a pretrained drafter, achieving dynamic drafter–verifier alignment in speculative decoding. Under standard sampling, the number of accepted tokens increases by up to 35% with negligible computational overhead.
- StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs Through Knowledge-Reasoning Fusion
This paper proposes the ThinkingF pipeline, which enhances LLMs' formal language domain knowledge via large-scale knowledge distillation and their informal-to-formal reasoning ability via template-guided reasoning trajectory synthesis. These two capabilities are then integrated through a two-stage SFT followed by RLVR. The resulting 7B/32B models achieve state-of-the-art performance on FormalMATH-Lite and ProverBench.
- Stratified Knowledge-Density Super-Network for Scalable Vision Transformers
This paper proposes transforming a pretrained ViT into a "Stratified Knowledge-Density Super-Network" (SKD Super-Network) via two steps—WPAC (Weighted PCA Attention Contraction) and PIAD (Progressive Importance-Aware Dropout)—to hierarchically organize knowledge within the pretrained weights, enabling subnetwork extraction of arbitrary size at O(1) cost without additional fine-tuning, achieving performance on par with or surpassing state-of-the-art compression methods.
- TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution
This paper proposes TGDD, which reframes static distribution matching as a dynamic alignment process along training trajectories. It captures evolving semantics via Stage-wise Distribution Matching and reduces inter-class overlap via Stage-wise Distribution Constraint, achieving SOTA on 10 datasets with a 5.0% accuracy gain on high-resolution benchmarks.
- Towards Test-time Efficient Visual Place Recognition via Asymmetric Query Processing
This paper proposes AsymVPR, an efficient asymmetric framework for Visual Place Recognition (VPR), which replaces expensive k-NN precomputation with a Geographical Memory Bank and bridges the capacity gap between a lightweight query network and a high-capacity gallery network via Implicit Embedding Augmentation, achieving retrieval performance close to the full-size model using only ~8% of its FLOPs.
- XLinear: A Lightweight and Accurate MLP-Based Model for Long-Term Time Series Forecasting with Exogenous Inputs
This paper proposes XLinear, a lightweight time series forecasting model based on MLP with sigmoid gating. Through a global token mechanism, it efficiently integrates endogenous and exogenous variable information, achieving an optimal accuracy–efficiency trade-off across 12 datasets.
- Your AI-Generated Image Detector Can Secretly Achieve SOTA Accuracy, If Calibrated
This paper proposes a lightweight post-hoc calibration framework grounded in Bayesian decision theory. By adding a learnable scalar offset α to the output logits of an existing detector, the method significantly improves detection accuracy under distribution shift without any retraining.
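
The effect of a scalar logit offset can be seen in a few lines of Python. The sketch below picks the offset by grid search on a small labeled calibration set, which stands in for whatever learning procedure the paper uses; the data, the candidate grid, and the function names are hypothetical.

```python
def accuracy(logits, labels, offset):
    """Accuracy when predicting 'fake' whenever logit + offset > 0."""
    correct = sum((z + offset > 0.0) == bool(y) for z, y in zip(logits, labels))
    return correct / len(labels)


def fit_offset(logits, labels, candidates):
    """Pick the candidate offset with the best calibration-set accuracy."""
    return max(candidates, key=lambda a: accuracy(logits, labels, a))


# Detector logits on shifted data: real fakes (label 1) score too low.
logits = [-3.0, -2.5, -1.5, -1.0, 0.5, 2.0]
labels = [0, 0, 1, 1, 1, 1]
candidates = [x / 2 for x in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5

alpha = fit_offset(logits, labels, candidates)
before = accuracy(logits, labels, 0.0)
after = accuracy(logits, labels, alpha)
```

The detector's ranking of samples is unchanged; only the decision threshold moves, which is why no retraining of the detector is needed.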