
Frequency Switching Mechanism for Parameter-Efficient Multi-Task Learning

Conference: CVPR 2026
arXiv: 2603.21111
Code: https://casperliuliuliu.github.io/projects/Free-Sinewich
Area: Multi-Task Learning / Parameter-Efficient Fine-Tuning
Keywords: Parameter-efficient fine-tuning, multi-task learning, frequency switching, sinusoidal transformation, LoRA

TL;DR

Free Sinewich proposes a parameter-efficient multi-task learning framework based on frequency switching. By applying task-specific sinusoidal transformations \(M_t = \sin(\omega_t \cdot M_{AWB})\) to a shared low-rank base matrix, the method achieves genuine parameter reuse and task specialization at near-zero cost, attaining state-of-the-art performance on dense prediction benchmarks with the fewest trainable parameters.

Background & Motivation

  1. Background: Multi-task learning (MTL) requires a single model to handle multiple tasks simultaneously. Parameter-efficient fine-tuning (PEFT) methods such as LoRA have succeeded in single-task adaptation. Recent PEFT-MTL approaches—including MTLoRA, DiTASK, and TADFormer—balance sharing and specialization through combinations of task-agnostic/task-specific adapters, SVD-based transformations, or dynamic task filters.
  2. Limitations of Prior Work: Although existing PEFT-MTL methods claim parameter sharing, they essentially route information through auxiliary adapters into separate pathways, forming "pseudo-sharing" in which each task still maintains an independent parameter set. The absence of genuine parameter reuse prevents the model from fully exploiting cross-task common knowledge, resulting in redundant computation and insufficient generalization.
  3. Key Challenge: How can the same set of shared weights exhibit different behaviors across different tasks while preserving parameter efficiency?
  4. Goal: Achieve true reuse of a single parameter set across multiple tasks, rather than assigning independent parameters to each task.
  5. Key Insight: Inspired by neuroscience—the thalamo-cortical system achieves selective communication through oscillatory multiplexing, whereby the same neural population executes different functions by switching oscillatory frequencies, effectively reusing the same "hardware." By analogy to deep networks: can task-specific functionality be achieved by switching the frequency response of the same weights?
  6. Core Idea: Apply a sinusoidal transformation parameterized by a task-specific frequency \(\omega_t\) to a shared low-rank base matrix, so that the same parameters produce different task-specialized weights at different frequencies.

Method

Overall Architecture

The backbone is a Swin Transformer Tiny encoder. Learnable task tokens are prepended to the image patch tokens. Within each encoder stage, the first \(N-1\) blocks use a Task-Agnostic Module (standard LoRA) for generic feature extraction; the final block uses a Task-Specific Module (incorporating the frequency switching mechanism) for task-specific feature extraction. A lightweight Clock Net generates task frequencies from the task tokens; Sine-AWB modulates the shared base matrix with these frequencies to produce task-specialized weights. The decoder can also be shared via frequency switching.

Key Designs

  1. Sine-AWB (Sinusoidal Adaptive Weighted Base):

    • Function: Constructs an enhanced version of the low-rank matrix that improves effective rank and achieves task specialization through sinusoidal transformation.
    • Mechanism: The LoRA factors \(A\), \(B\) and an intermediate convolutional kernel \(W\) are first fused into a single equivalent kernel \(M_{AWB} = AWB^\top\). A sinusoidal transformation is then applied to the fused matrix: \(M_t = \sin(\omega_t \cdot M_{AWB})\), where \(\omega_t\) is the task-specific frequency. Sine-LoRA has demonstrated that such a sinusoidal mapping can substantially increase the effective rank of low-rank matrices. A Gaussian low-pass filter (\(K=7\), \(\sigma=1\)) is subsequently applied to smooth \(M_t\) and suppress high-frequency noise. The critical design choice is "fuse first, then apply sine": because the sine function is not multiplicatively homomorphic (\(\sin(AWB) \neq \sin(A)\sin(W)\sin(B)\)), it must be applied to the fused matrix to guarantee effective rank expansion (a minimal sketch of Sine-AWB and the Clock Net follows this list).
    • Design Motivation: Different frequencies \(\omega_t\) correspond to different sinusoidal waves, yielding different nonlinear mappings \(\mathcal{F}_{\omega_t}\). This naturally maps the same base matrix into different task-specific weight spaces, enabling genuine parameter reuse. The intermediate convolutional kernel \(W\) introduces spatial priors that are critical for dense prediction tasks.
  2. Lightweight Clock Net (LCN):

    • Function: Generates a bounded task-specific frequency \(\omega_t\) from the task token.
    • Mechanism: A single-layer MLP maps the task token \(\boldsymbol{p}_t \in \mathbb{R}^C\) to a scalar frequency: \(\omega_t = s \cdot (\tanh(W_q \text{ReLU}(\boldsymbol{p}_t)) + c)\), where \(s\) and \(c\) are learnable scale and shift parameters. The \(\tanh\) activation produces bounded outputs to stabilize training. LCN parameters are shared across tasks.
    • Design Motivation: The LCN is not the primary driver of performance gains; its core role is to produce bounded frequencies that stabilize the training of sinusoidal modulation. Learned differences among task tokens drive frequency differentiation.
  3. Shared Decoder Group:

    • Function: Replaces independent per-task decoders with frequency switching, reducing decoder parameter count.
    • Mechanism: Conventional methods employ an independent decoder \(\phi_t\) for each task \(t\), with parameters growing linearly with the number of tasks. This work instead uses a shared \(M_{AWB}\) to produce task-specialized convolutions via frequency switching: \(\boldsymbol{h}_t = \widetilde{M}_t * \boldsymbol{x}_t + \boldsymbol{b}_t\), retaining only a task-specific bias \(\boldsymbol{b}_t\) and subsequent BN-ReLU-Conv layers (see the decoder sketch after this list).
    • Design Motivation: The first convolutional layer in the original HRNet decoder alone exceeds one million parameters, and \(T\) tasks require \(T\)-fold overhead. After sharing, only one set of base matrices plus \(T\) scalar frequencies is needed.
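
A minimal PyTorch sketch of the frequency switching mechanism from items 1 and 2 above, read off the paper's equations rather than its released code. The Clock Net turns a task token into a bounded scalar frequency, and Sine-AWB fuses the low-rank factors before applying the task-specific sine and a Gaussian low-pass filter. Dimensions, initialization scales, and the single-channel filtering are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LightweightClockNet(nn.Module):
    # Maps a task token p_t to a bounded scalar frequency
    # omega_t = s * (tanh(W_q ReLU(p_t)) + c).
    def __init__(self, dim):
        super().__init__()
        self.w_q = nn.Linear(dim, 1)              # W_q
        self.s = nn.Parameter(torch.ones(1))      # learnable scale
        self.c = nn.Parameter(torch.ones(1))      # learnable shift

    def forward(self, task_token):
        return self.s * (torch.tanh(self.w_q(F.relu(task_token))) + self.c)


class SineAWB(nn.Module):
    # "Fuse first, then apply sine": M_AWB = A W B^T is built once; the
    # task-specific sine and a Gaussian low-pass filter then produce M_t.
    def __init__(self, d_out, d_in, rank, k=7, sigma=1.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_out, rank) * 0.02)
        self.W = nn.Parameter(torch.randn(rank, rank) * 0.02)  # intermediate kernel
        self.B = nn.Parameter(torch.randn(d_in, rank) * 0.02)
        # Fixed Gaussian low-pass filter (K=7, sigma=1) to suppress high-frequency noise.
        coords = torch.arange(k, dtype=torch.float32) - k // 2
        g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
        g2d = torch.outer(g, g)
        self.register_buffer("lowpass", (g2d / g2d.sum()).view(1, 1, k, k))

    def forward(self, omega_t):
        m_awb = self.A @ self.W @ self.B.t()      # shared fused base, rank <= r
        m_t = torch.sin(omega_t * m_awb)          # frequency switching
        m_t = F.conv2d(m_t[None, None], self.lowpass,
                       padding=self.lowpass.shape[-1] // 2)
        return m_t[0, 0]                          # task-specialized weight


# Usage: one shared base matrix, one scalar frequency per task.
clock, sine_awb = LightweightClockNet(dim=96), SineAWB(d_out=96, d_in=96, rank=32)
omega_seg = clock(torch.randn(96))                # frequency for one task token
m_seg = sine_awb(omega_seg)                       # (96, 96) task-specific weight
```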
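The Shared Decoder Group from item 3 can be sketched the same way (reusing the imports above). Channel sizes are placeholders, the low-pass filter is omitted for brevity, and in the paper the per-task frequency would come from the Clock Net rather than being a raw learnable scalar as it is here.

```python
class SharedDecoderGroup(nn.Module):
    # One shared base convolution kernel; only the sine frequency and the bias
    # b_t are task-specific, followed by shared BN-ReLU-Conv layers.
    def __init__(self, c_in, c_out, num_tasks, k=3):
        super().__init__()
        self.base = nn.Parameter(torch.randn(c_out, c_in, k, k) * 0.02)  # fused M_AWB
        self.omega = nn.Parameter(torch.ones(num_tasks))   # stand-in for LCN output
        self.bias = nn.Parameter(torch.zeros(num_tasks, c_out))
        self.head = nn.Sequential(nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
                                  nn.Conv2d(c_out, c_out, 1))

    def forward(self, x, t):
        # h_t = M~_t * x_t + b_t, with M~_t = sin(omega_t * M_AWB).
        kernel_t = torch.sin(self.omega[t] * self.base)
        h = F.conv2d(x, kernel_t, bias=self.bias[t],
                     padding=kernel_t.shape[-1] // 2)
        return self.head(h)
```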

Loss & Training

Standard multi-task training is adopted: \(\mathcal{L}_{MTL} = \sum_t w_t \mathcal{L}_t\). Task weights and loss functions follow prior work. Only the TA-Module (LoRA) and TS-Module (Sine-AWB + LCN) are trainable; the encoder backbone is frozen. Task tokens are introduced only at the first Transformer stage (VPT-shallow strategy).
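
A toy illustration of this setup follows; task names, weights, and loss choices are placeholders, not the paper's configuration (which follows prior work), and the `backbone` attribute is an assumed module name.

```python
import torch
import torch.nn as nn

# Placeholder task weights and losses for illustration only.
task_weights = {"semseg": 1.0, "parts": 1.0, "saliency": 1.0, "normals": 1.0}
task_losses = {"semseg": nn.CrossEntropyLoss(), "parts": nn.CrossEntropyLoss(),
               "saliency": nn.BCEWithLogitsLoss(), "normals": nn.L1Loss()}

def freeze_backbone(model: nn.Module) -> None:
    # Only the TA-Module (LoRA) and TS-Module (Sine-AWB + LCN) stay trainable;
    # assumes the frozen encoder is exposed as a `backbone` submodule.
    for p in model.backbone.parameters():
        p.requires_grad_(False)

def mtl_loss(preds: dict, targets: dict) -> torch.Tensor:
    # L_MTL = sum_t w_t * L_t
    return sum(task_weights[t] * task_losses[t](preds[t], targets[t])
               for t in task_weights)
```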

Key Experimental Results

Main Results (PASCAL-Context, Swin-T ImageNet-1K)

| Method | SemSeg↑ | Human Parts↑ | Saliency↑ | Normals (rmse)↓ | Δm (%)↑ | Params (M) |
|---|---|---|---|---|---|---|
| Single Task | 67.21 | 61.93 | 62.35 | 17.97 | 0 | 112.62 |
| MTLoRA (r=64) | 67.90 | 59.84 | 65.40 | 16.60 | +2.55 | 8.34 |
| TADFormer (r=64) | 70.82 | 60.45 | 65.88 | 16.48 | +4.24 | 7.38 |
| Free Sinewich (r=64) | 71.25 | 61.38 | 66.24 | 16.14 | +5.39 | 6.53 |
| Free Sinewich (r=32) | 71.02 | 60.75 | 65.94 | 16.44 | +4.51 | 4.04 |

Ablation Study

| Configuration | SemSeg↑ | Human Parts↑ | Saliency↑ | Normals (rmse)↓ | Δm (%)↑ | Params (M) |
|---|---|---|---|---|---|---|
| Free Sinewich (full) | 71.25 | 61.38 | 66.24 | 16.14 | +5.39 | 6.53 |
| w/o LCN | 70.83 | 61.37 | 66.09 | 16.17 | +5.12 | 6.51 |
| w/o Low-pass filter | 70.95 | 61.33 | 65.44 | 16.22 | +4.82 | 6.53 |
| w/o Sine | 69.68 | 60.69 | 64.91 | 16.37 | +3.67 | 6.53 |
| Shared Base | 71.25 | 61.38 | 66.24 | 16.14 | +5.39 | 6.53 |
| Independent Base | 70.81 | 61.56 | 65.42 | 16.09 | +5.03 | 10.22 |
| Independent Decoder | 70.91 | 61.57 | 66.03 | 16.10 | +5.31 | 7.41 |

Key Findings

  • Sinusoidal transformation is the primary driver: Removing it causes Δm to drop from +5.39 to +3.67 (the largest single decline of −1.72), confirming that the frequency switching mechanism is the main source of performance gain.
  • Shared base outperforms independent base: Shared Base (+5.39, 6.53M) vs. Independent Base (+5.03, 10.22M)—fewer parameters yet better performance—confirming that genuine parameter reuse provides a regularization benefit.
  • Free Sinewich at r=32 (+4.51) already surpasses TADFormer at r=64 (+4.24), with only 4.04M vs. 7.38M parameters; frequency modulation compensates for the reduced rank.
  • LCN and the low-pass filter contribute modestly but provide training stability.
  • With the shared HRNet decoder, Free Sinewich requires only 1.07M decoder parameters (vs. 1.94M for TADFormer).
  • On NYUDv2, r=64 achieves Δm = −0.52, nearly matching full fine-tuning performance.

Highlights & Insights

  • Neuroscience-inspired frequency multiplexing: The principle of oscillatory multiplexing from neuroscience is transferred to parameter sharing design—the same parameters "oscillate" at different frequencies to serve different functions, yielding an elegant and intuitively clear concept.
  • Mathematical insight of "fuse first, then apply sine": Because the sine function is not multiplicatively homomorphic, effective rank expansion is only achieved when the sine acts on the fused \(AWB\); this implementation detail is critically important (a small numerical check follows this list).
  • Empirical validation of genuine parameter reuse: The Shared vs. Independent Base ablation clearly demonstrates that sharing outperforms independence, challenging the intuition that independent parameters are more flexible.
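
A quick numerical check of the "fuse first" point, using arbitrary random factors with rank \(r = 8\): applying the sine factor-wise is not the same as applying it to the fused product, and only the fused version raises the numerical rank above \(r\).

```python
import torch

torch.manual_seed(0)
A, W, B = torch.randn(64, 8), torch.randn(8, 8), torch.randn(64, 8)
fused = A @ W @ B.t()                                   # M_AWB, rank <= 8

# Sine applied factor-wise differs from sine applied after fusing.
factorwise = torch.sin(A) @ torch.sin(W) @ torch.sin(B).t()
print(torch.allclose(torch.sin(fused), factorwise))     # False

# Only the fused version gains rank; the factor-wise product stays at rank <= 8.
print(torch.linalg.matrix_rank(fused).item())                    # 8
print(torch.linalg.matrix_rank(torch.sin(0.5 * fused)).item())   # typically full (64)
print(torch.linalg.matrix_rank(factorwise).item())               # 8
```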

Limitations & Future Work

  • The current frequency \(\omega_t\) is a global scalar, uniform across all layers and spatial positions. The authors identify learning spatially/temporally varying frequencies as a future direction.
  • Performance on NYUDv2 remains slightly below the single-task baseline (Δm = −0.52), indicating room for improvement on scenarios involving heterogeneous tasks such as depth estimation and edge detection.
  • The nonlinearity introduced by the sinusoidal transformation may cause optimization difficulties for certain task combinations.
  • Validation is limited to Swin Transformer; the effectiveness on ViT and CNN backbones remains to be explored.

Comparison with Related Methods

  • vs. MTLoRA: MTLoRA decomposes LoRA into task-agnostic and task-specific branches, with each task still maintaining independent parameters; Free Sinewich achieves genuine reuse of a single parameter set via frequency switching.
  • vs. TADFormer: TADFormer conditions convolutional layers with dynamic task filters, requiring more parameters (7.38M vs. 6.53M) and constituting "pseudo-sharing"; Free Sinewich achieves better performance with fewer parameters.
  • vs. Sine-LoRA: Sine-LoRA applies sinusoidal transformation to improve the effective rank of single-task LoRA; Free Sinewich parameterizes the frequency of the sinusoidal function, enabling a single base matrix to serve multiple tasks.
  • vs. DiTASK: DiTASK achieves task adaptation through differentiable transformations of SVD singular values; Free Sinewich directly applies sinusoidal modulation in the output space, yielding a simpler formulation.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ The idea of achieving parameter reuse through frequency switching is original; the neuroscience analogy, while supplementary, enhances intuitive appeal.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Covers two benchmarks, multiple ablations (components, sharing strategy, decoder, rank), and comparisons against numerous baselines; however, Δm on NYUDv2 remains negative.
  • Writing Quality: ⭐⭐⭐⭐ The motivational chain is clear, mathematical derivations are complete, and ablation designs are well-targeted.
  • Value: ⭐⭐⭐⭐ A significant contribution to the PEFT-MTL field; the demonstration of genuine parameter reuse is instructive and generalizable to other multi-task settings.