SecP-Tuning: Efficient Privacy-Preserving Prompt Tuning for Large Language Models via MPC¶
Conference: ICLR 2026 arXiv: 2506.15307 Code: None Area: AI Security Keywords: Privacy-preserving, Secure Multi-Party Computation, Prompt Tuning, Large Language Models, Random Feature Attention
TL;DR¶
This paper proposes SecP-Tuning, the first MPC-based privacy-preserving prompt tuning framework for LLMs. It eliminates backpropagation overhead via forward-only tuning and replaces softmax attention with a privacy-preserving random feature attention mechanism, achieving 12–16× speedup and 17–20× reduction in communication cost.
Background & Motivation¶
- Background: Deployment of LLMs in privacy-sensitive domains such as healthcare and finance is constrained by data-privacy requirements. MPC-based privacy-preserving machine learning can provide theoretical privacy guarantees for both model parameters and data, but has largely been limited to the inference phase.
- Limitations of Prior Work: Directly applying MPC to fine-tune LLMs poses severe efficiency challenges. Performing SFT on RoBERTa-LARGE (24 layers, hidden dimension 1024) requires approximately 10 minutes per iteration and 970 GB of communication overhead. Backpropagation and optimizer operations account for 73% of total time, while softmax attention accounts for 75% of forward-pass time.
- Key Challenge: In MPC settings, the many nonlinear operations (softmax, GELU, LayerNorm) must be approximated via combinations of addition, multiplication, and comparison, resulting in extremely poor communication efficiency. Parameter-efficient fine-tuning methods such as LoRA reduce the number of updated parameters but cannot eliminate the MPC communication overhead introduced by backpropagation and softmax.
- Goal: Achieve efficient, high-performance privacy-preserving domain adaptation of LLMs within an MPC framework.
- Key Insight: The paper combines forward-only tuning (FoT) to eliminate backpropagation, random feature attention (RFA) to replace softmax and reduce attention complexity, and a Server-Client architecture that offloads MPC-unfriendly operations to the client.
- Core Idea: Gradient-free forward-only tuning avoids the MPC overhead of backpropagation, and linearized attention avoids the nonlinear softmax operations; together they enable end-to-end efficient privacy-preserving fine-tuning.
Method¶
Overall Architecture¶
SecP-Tuning adopts a Server-Client architecture with a seven-step workflow: (1) the data owner initializes prompt embeddings and concatenates them with private data; (2) secret shares are generated and distributed to the servers; (3) two non-colluding servers execute privacy-preserving inference via MPC protocols; (4–5) inference result shares are returned to the data owner for reconstruction; (6) the data owner computes the loss locally in plaintext; (7) a gradient-free optimizer (CMA-ES) updates the prompts. This process iterates until convergence.
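Steps 2 and 4–5 rest on 2-out-of-2 additive secret sharing. A minimal NumPy sketch, assuming a 32-bit ring (the paper's concrete fixed-point encoding and ring size may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
MOD = 2 ** 32  # illustrative ring size

def share(x):
    """Split a uint64 tensor (values < MOD) into two additive shares (step 2)."""
    r = rng.integers(0, MOD, size=x.shape, dtype=np.uint64)
    return r, (x - r) % MOD  # share_0 + share_1 == x (mod MOD)

def reconstruct(s0, s1):
    """Recombine the result shares returned by the two servers (steps 4-5)."""
    return (s0 + s1) % MOD

# e.g. fixed-point-encoded prompt/data values held by the data owner
x = np.array([42, 7, 2 ** 31], dtype=np.uint64)
s0, s1 = share(x)            # each server sees only a uniformly random share
recovered = reconstruct(s0, s1)
```

Each share alone is uniformly random over the ring, so neither server learns anything about `x`; only the data owner, holding both result shares, can reconstruct the inference output for its local plaintext loss computation.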
Key Designs¶
1. Privacy-preserving Forward-only Tuning
- Function: Completely eliminates MPC communication overhead from backpropagation and optimizer operations.
- Mechanism: A black-box gradient-free optimizer (CMA-ES) is used to update prompt embeddings via forward passes only. Exploiting the low intrinsic dimensionality of LLM prompts, optimization is performed in a low-dimensional latent space \(z \in \mathbb{R}^d\) (\(d \ll D\)) and mapped to the prompt space via a random projection \(A \in \mathbb{R}^{D \times d}\). Loss computation and optimizer updates are performed locally in plaintext by the data owner.
- Design Motivation: Backward computation of MPC-unfriendly nonlinear operations during backpropagation accounts for 73% of total time, representing the dominant bottleneck. FoT fundamentally eliminates this requirement.

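The latent-space, forward-only loop can be sketched as follows. A simple elitist evolution strategy stands in for CMA-ES, and a quadratic loss stands in for the data owner's plaintext loss on MPC inference outputs; `D`, `d`, and the hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 1024, 16               # prompt dim D, latent dim d (d << D)
A = rng.normal(size=(D, d))   # fixed random projection to the prompt space

def loss(prompt):
    # stand-in for the plaintext loss the data owner computes in step 6
    return float(((prompt - 1.0) ** 2).sum())

z, sigma = np.zeros(d), 0.01
best = loss(A @ z)
for _ in range(300):                       # forward passes only: no backprop
    cands = z + sigma * rng.normal(size=(8, d))
    scores = np.array([loss(A @ c) for c in cands])
    if scores.min() < best:                # elitist update (simplified CMA-ES stand-in)
        best, z = scores.min(), cands[scores.argmin()]

prompt = A @ z                             # final soft prompt for the frozen LLM
```

The key property is that the optimizer only ever queries `loss` on forward outputs, so the servers never execute backward passes or optimizer steps under MPC.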
2. Privacy-preserving Random Feature Attention (RFA)
- Function: Reduces attention complexity from \(O(n^2d)\) to \(O(ndr)\), avoiding the exponentiation and max operations in softmax.
- Mechanism: Random features are used to approximate the Gaussian kernel: \(\exp(\mathbf{x}^\top\mathbf{y}/\sigma^2) \approx \phi(\mathbf{x})^\top\phi(\mathbf{y})\), where \(\phi(\mathbf{x}) = \exp(\|\mathbf{x}\|^2/(2\sigma^2))[\varphi(\mathbf{x},\omega_1),...,\varphi(\mathbf{x},\omega_M)]^\top\). To handle the MPC-unfriendly cosine operations in RFA, an efficient privacy-preserving cosine protocol \(\Pi_{\text{cosine}}\) is designed, leveraging the cosine angle-difference identity and requiring only one round of communication transmitting \(2\ell\) bits.
- Design Motivation: Exponentiation, division, and max operations in softmax are extremely costly in MPC and scale quadratically with sequence length.
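A NumPy sketch of the random-feature approximation and the resulting softmax-free attention, using classical random Fourier features for the Gaussian kernel (`M`, `sigma`, and the shapes are illustrative; the paper's exact feature construction may differ in details):

```python
import numpy as np

rng = np.random.default_rng(0)
dmodel, M, sigma = 8, 4096, 2.0            # head dim, #random features, scale

W = rng.normal(0.0, 1.0 / sigma, size=(M, dmodel))  # omega_i ~ N(0, sigma^-2 I)
b = rng.uniform(0.0, 2 * np.pi, size=M)

def phi(X):
    # phi(x) = exp(||x||^2 / (2 sigma^2)) * sqrt(2/M) * cos(W x + b),
    # so that phi(x) . phi(y) ~ exp(x . y / sigma^2)
    pre = np.exp((X ** 2).sum(-1, keepdims=True) / (2 * sigma ** 2))
    return pre * np.sqrt(2.0 / M) * np.cos(X @ W.T + b)

# Kernel check on a single pair of vectors
x, y = rng.normal(size=dmodel) * 0.5, rng.normal(size=dmodel) * 0.5
exact = np.exp(x @ y / sigma ** 2)
approx = (phi(x[None]) @ phi(y[None]).T).item()

# Linear attention in O(n*d*M): the n x n score matrix is never formed
n = 16
Q, K = rng.normal(size=(n, dmodel)) * 0.5, rng.normal(size=(n, dmodel)) * 0.5
V = rng.normal(size=(n, dmodel))
num = phi(Q) @ (phi(K).T @ V)              # (n, M) @ (M, dmodel)
den = phi(Q) @ phi(K).sum(axis=0)          # approximates sum_j exp(q . k_j / s^2)
out = num / den[:, None]
```

Note that the only nonlinearity left is `cos`, which is exactly what \(\Pi_{\text{cosine}}\) is designed to evaluate securely.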
3. Secure Cosine Protocol \(\Pi_{\text{cosine}}\)
- Function: Efficiently and securely computes the cosine function, enabling RFA within MPC.
- Mechanism: In the offline phase, shares of a random value \(t\) and its \(\sin(t)\), \(\cos(t)\) are pre-generated. In the online phase, \(\delta = (x+t) \mod \tau\) is first reconstructed, and then the trigonometric identity \(\cos(x) = \sin(\delta)\sin(t) + \cos(\delta)\cos(t)\) is applied.
- Design Motivation: Cosine is an unavoidable nonlinear operation in RFA, necessitating a dedicated efficient MPC protocol.
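The two phases of \(\Pi_{\text{cosine}}\) can be illustrated over additive shares. This sketch uses real-valued floats in place of the paper's fixed-point ring encoding and simulates both parties and the dealer locally:

```python
import numpy as np

rng = np.random.default_rng(0)
TAU = 2 * np.pi  # period of cosine

def pi_cosine(x):
    # --- offline phase: a dealer pre-shares a random t with sin(t), cos(t) ---
    t = rng.uniform(0, TAU)
    s0 = rng.normal(); s1 = np.sin(t) - s0       # additive shares of sin(t)
    c0 = rng.normal(); c1 = np.cos(t) - c0       # additive shares of cos(t)
    t0 = rng.uniform(0, TAU); t1 = (t - t0) % TAU
    # --- online phase: parties hold shares of x, open only delta = x + t ---
    x0 = rng.uniform(0, TAU); x1 = (x - x0) % TAU
    delta = (x0 + t0 + x1 + t1) % TAU            # one round, one opened value
    # cos(x) = cos(delta - t) = cos(delta)cos(t) + sin(delta)sin(t):
    # delta is public, so this is a local linear combination of shares
    y0 = np.cos(delta) * c0 + np.sin(delta) * s0
    y1 = np.cos(delta) * c1 + np.sin(delta) * s1
    return y0 + y1  # reconstructed here for checking; shares stay split in MPC

secure = pi_cosine(1.234)  # matches np.cos(1.234) up to float rounding
```

Since `t` is uniform over the period, the opened value `delta` is itself uniformly distributed and reveals nothing about `x`; the only online communication is opening `delta`, which is where the one-round, \(2\ell\)-bit cost comes from.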
Loss & Training¶
Standard cross-entropy loss is used, computed locally in plaintext by the data owner. The optimizer is CMA-ES (gradient-free), operating in the low-dimensional latent space. An early stopping strategy is applied: training terminates if validation accuracy does not improve for 1,000 steps.
Key Experimental Results¶
Main Results¶
Efficiency comparison on RoBERTa-LARGE in a LAN setting (3 Gbps, 0.8 ms latency):
| Method | Total Time (s) | Communication (GB) | Speedup | Comm. Reduction |
|---|---|---|---|---|
| SFT | 651.60 | 970.72 | 1× | 1× |
| Prompt Tuning | 882.08 | 1116.21 | — | — |
| SecP-Tuning (FoT) | 174.14 | 205.36 | 3.7× | 4.7× |
| SecP-Tuning (FoT+RFA) | 55.17 | 56.55 | 12× | 17× |
Performance comparison (RoBERTa-LARGE, 16-shot):
| Method | SST-2 | Yelp P. | AG's News | MRPC | RTE | Avg. |
|---|---|---|---|---|---|---|
| SFT | 85.39 | 91.82 | 86.36 | 77.35 | 58.60 | 79.90 |
| Prompt Tuning | 68.23 | 61.02 | 84.81 | 51.61 | 54.69 | 64.07 |
| SecP-Tuning | 88.11 | 85.23 | 81.27 | 75.33 | 52.95 | 76.58 |
Ablation Study¶
| Configuration | Time (s) | Comm. (GB) | Notes |
|---|---|---|---|
| MPC softmax attention | Slowest | Highest | Baseline: \(O(n^2)\) complexity |
| RFA (w/o \(\Pi_{\text{cosine}}\)) | Limited improvement | Limited improvement | Slower than softmax on short sequences (cosine overhead) |
| RFA (w/ \(\Pi_{\text{cosine}}\)) | Fastest | Lowest | \(\Pi_{\text{cosine}}\) is the key enabler |
Key Findings¶
- FoT eliminates backpropagation and optimizer overhead, reducing runtime from 651 s to 174 s (3.7× speedup).
- RFA further accelerates the forward pass by 3.2× (174 s → 55 s), reducing communication from 205 GB to 57 GB.
- \(\Pi_{\text{cosine}}\) is critical to making RFA viable in MPC — without it, RFA is even slower than softmax on short sequences.
- SecP-Tuning supports an "API-as-a-Service" deployment model in which the server never sees the updated prompt parameters, removing the risk that memorized fine-tuning data leaks through them.
- In few-shot settings, performance is competitive with or superior to gradient-based methods (SST-2: 88.11 vs. 68.23 for Prompt Tuning).
Highlights & Insights¶
- Systematic resolution of two major bottlenecks: FoT addresses backpropagation overhead while RFA addresses attention overhead; the two are complementary.
- Elegant Server-Client architecture: Offloading MPC-unfriendly operations to the client simultaneously improves efficiency and strengthens privacy.
- Theoretical and engineering value of \(\Pi_{\text{cosine}}\): An efficient protocol that achieves secure cosine computation in a single round of communication.
- Strong practical applicability: Supports black-box API deployment and is directly deployable.
Limitations & Future Work¶
- Validation is limited to RoBERTa-LARGE; the framework has not been extended to larger GPT/LLaMA-scale LLMs.
- RFA's approximation of softmax may degrade performance on certain tasks (e.g., lower scores on RTE).
- The semi-honest threat model assumption is relatively weak; supporting malicious adversaries would require additional zero-knowledge proof overhead.
- Future work could explore extending SecP-Tuning to additional fine-tuning paradigms such as LoRA.
Related Work & Insights¶
- HE-based approaches such as BlindTuner incur greater computational overhead and struggle to handle nonlinear operations.
- MPC frameworks such as CrypTen provide the infrastructure upon which this work builds.
- Insight: Privacy-preserving ML should not focus solely on inference; privacy during fine-tuning is equally important.
Rating¶
- Novelty: ⭐⭐⭐⭐ First MPC+LLM fine-tuning framework; the combination of FoT and RFA is original.
- Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive multi-dimensional evaluation covering efficiency, performance, deployability, and privacy.
- Writing Quality: ⭐⭐⭐⭐ Problem analysis is thorough and system design is clearly structured.
- Value: ⭐⭐⭐⭐ Significant reference value for engineering practice in privacy-preserving LLM fine-tuning.