FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models¶
Conference: NeurIPS 2025 arXiv: 2510.21363 Code: fuzihaofzh/FairImagen Area: Image Generation Keywords: Fairness, Bias Mitigation, Stable Diffusion, FairPCA, Text-to-Image
TL;DR¶
This paper proposes FairImagen, a post-processing debiasing framework that applies a FairPCA projection in the CLIP prompt embedding space to remove demographic information. Combined with empirical noise injection and joint cross-demographic debiasing, it achieves significant fairness improvements in text-to-image generation without retraining the model.
Background & Motivation¶
Text-to-image models such as Stable Diffusion reproduce and even amplify social biases during generation; for example, "CEO" tends to generate white males while "nurse" tends to generate females. Existing debiasing methods fall into three categories:

- Prompt methods: require manually rewriting each prompt, which is labor-intensive and does not generalize
- Fine-tuning methods: require access to internal model parameters and incur high computational cost
- Post-processing methods (SDID, TBIE, etc.): lightweight but suffer from semantic drift, coarse direction estimation, and poor generalization to multiple attributes
The authors focus on post-processing to provide a simple, scalable, and model-agnostic fairness solution.
Core Problem¶
- How can demographic information be precisely removed from the prompt embedding space while preserving semantic content?
- How should overly "neutralized" outputs be handled after debiasing?
- How can multiple protected attributes (gender + race) be jointly addressed without excessive pruning?
Method¶
Module 1: Prompt Embedding Extraction¶
For a prompt \(p\), the CLIP encoder extracts token-level embeddings \(E_p \in \mathbb{R}^{T \times D}\) and pooled embeddings \(\bar{E}_p \in \mathbb{R}^D\). An embedding matrix \(X\) and group indicator matrix \(Z\) are constructed by grouping prompts according to protected attributes.
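A minimal sketch of this step, assuming the Hugging Face `transformers` CLIP text encoder; the model name and the small attribute-annotated prompt set are illustrative, not the paper's exact setup:

```python
import numpy as np
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Encoder name is illustrative; FairImagen operates on whatever CLIP text
# encoder the target diffusion model already uses.
name = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(name)
text_encoder = CLIPTextModel.from_pretrained(name).eval()

# Hypothetical attribute-annotated prompts, bucketed by protected group.
grouped_prompts = {
    "male":   ["a photo of a male CEO", "a photo of a male nurse"],
    "female": ["a photo of a female CEO", "a photo of a female nurse"],
}

rows, group_ids = [], []
with torch.no_grad():
    for g, prompts in enumerate(grouped_prompts.values()):
        inputs = tokenizer(prompts, padding=True, return_tensors="pt")
        out = text_encoder(**inputs)
        # out.last_hidden_state: token-level embeddings E_p, shape (n_g, T, D)
        # out.pooler_output:     pooled embeddings,        shape (n_g, D)
        rows.append(out.pooler_output.numpy())
        group_ids.extend([g] * len(prompts))

X = np.concatenate(rows, axis=0)                 # (N, D) embedding matrix
Z = np.eye(len(grouped_prompts))[group_ids]      # (N, G) one-hot group indicators
```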
Module 2: Fair Representation Transformation (FairPCA)¶
Classical PCA seeks an orthonormal projection \(P \in \mathbb{R}^{D \times k}\) that maximizes the retained variance:

\[
\max_{P:\;P^T P = I} \; \|XP\|_F^2
\]

FairPCA incorporates a fairness regularization term that penalizes variance along group-discriminative directions:

\[
\max_{P:\;P^T P = I} \; \|XP\|_F^2 - \lambda \, \|BP\|_F^2
\]

where \(B = Z^T X \in \mathbb{R}^{G \times D}\) is the group feature matrix and \(\lambda\) controls the trade-off between reconstruction quality and fairness. Constraining \(P\) within the null space \(\mathcal{N}(B)\) (the limit \(\lambda \to \infty\)) ensures that projected representations are orthogonal to any direction that discriminates between demographic groups.
At inference time, embeddings are projected as: \(\bar{E}_p' = PP^T \bar{E}_p\), \(E_p' = E_p PP^T\).
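A sketch of the soft FairPCA fit and the inference-time projection, using the regularized objective above (top eigenvectors of \(X^T X - \lambda B^T B\)); the values of \(k\) and \(\lambda\) below are illustrative choices, not the paper's settings:

```python
import numpy as np

def fairpca_projection(X, Z, k, lam=10.0):
    """Top-k directions maximizing variance while penalizing
    group-discriminative directions (soft FairPCA, a sketch)."""
    B = Z.T @ X                             # (G, D) group feature matrix
    M = X.T @ X - lam * (B.T @ B)           # variance minus fairness penalty
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    P = eigvecs[:, -k:]                     # (D, k), orthonormal columns
    return P

def debias_embeddings(token_embeds, pooled, P):
    """Project token-level (T, D) and pooled (D,) embeddings onto span(P)."""
    return token_embeds @ P @ P.T, pooled @ P @ P.T

# Usage (with X, Z built as in the previous sketch):
# P = fairpca_projection(X, Z, k=256, lam=10.0)
# tokens_fair, pooled_fair = debias_embeddings(token_embeds, pooled_embed, P)
```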
Module 3: Empirical Noise Injection¶
This module prevents outputs from becoming excessively neutral. For each group \(g\), a bias direction \(\nu_g \in \mathbb{R}^D\) is estimated from that group's prompt embeddings \(X^{(g)}\). An empirical distribution of projections onto this direction, \(\mathcal{D}_g = \{\nu_g^T \bar{E}_p : \bar{E}_p \in X^{(g)}\}\), is constructed; at inference time a magnitude \(\delta \sim \mathcal{D}_g\) is sampled and the debiased embedding is perturbed along the bias direction, i.e. \(\bar{E}_p'' = \bar{E}_p' + \epsilon\,\delta\,\nu_g\), where \(\epsilon\) is a tunable noise scaling parameter.
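A sketch of the injection step; the `bias_direction` estimator below (group mean minus overall mean, unit-normalized) is an assumption for illustration and may differ from the paper's exact estimator:

```python
import numpy as np

def bias_direction(X, X_g):
    """Illustrative bias-direction estimate (assumption): offset of group g's
    mean embedding from the overall mean, normalized to unit length."""
    nu = X_g.mean(axis=0) - X.mean(axis=0)
    return nu / np.linalg.norm(nu)

def inject_empirical_noise(pooled_fair, X_g, nu_g, eps, rng=None):
    """Shift a debiased pooled embedding along group g's bias direction by a
    magnitude delta sampled from the empirical distribution D_g."""
    if rng is None:
        rng = np.random.default_rng()
    D_g = X_g @ nu_g               # scalars nu_g^T E_bar_p for prompts in group g
    delta = rng.choice(D_g)        # delta ~ D_g
    return pooled_fair + eps * delta * nu_g

# Usage: pooled_final = inject_empirical_noise(pooled_fair, X_g, nu_g, eps=0.5)
```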
Module 4: Joint Cross-Demographic Debiasing¶
Rather than projecting each attribute sequentially (which leads to excessive pruning), a Cartesian product of the joint attribute space is constructed. For example, gender \(\{M, F\}\) × race \(\{W, A, B\}\) yields 6 composite groups, and FairPCA is applied once over this joint space.
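A short sketch of how the indicator matrix could be built over the composite attribute space before a single FairPCA fit; the labels and annotations are hypothetical:

```python
from itertools import product
import numpy as np

genders = ["male", "female"]
races = ["white", "asian", "black"]
joint_labels = [f"{g}_{r}" for g, r in product(genders, races)]  # 6 composite groups

# Hypothetical per-prompt annotations carrying both attributes.
annotations = [("female", "black"), ("male", "white"), ("female", "asian")]
group_ids = [joint_labels.index(f"{g}_{r}") for g, r in annotations]

# One-hot indicators over the joint space; FairPCA is fit once on (X, Z_joint)
# rather than sequentially per attribute.
Z_joint = np.eye(len(joint_labels))[group_ids]
```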
Key Experimental Results¶
Gender Debiasing¶
| Method | Fairness↑ | Accuracy↑ | MUSIQ↑ | Avg↑ |
|---|---|---|---|---|
| Base (Stable Diffusion) | 0.167 | 0.785 | 0.574 | 0.509 |
| SDID | 0.507 | 0.776 | 0.553 | 0.612 |
| CDA | 0.547 | 0.772 | 0.549 | 0.623 |
| FairImagen | 0.560 | 0.771 | 0.541 | 0.624 |
| FairPrompt (upper bound) | 0.732 | 0.766 | 0.586 | 0.695 |
Race Debiasing¶
| Method | Fairness↑ | Accuracy↑ | MUSIQ↑ | Avg↑ |
|---|---|---|---|---|
| Base | 0.193 | 0.785 | 0.574 | 0.517 |
| SDID | 0.370 | 0.770 | 0.537 | 0.559 |
| TBIE | 0.366 | 0.762 | 0.532 | 0.553 |
| FairImagen | 0.389 | 0.760 | 0.536 | 0.562 |
| FairPrompt (upper bound) | 0.444 | 0.752 | 0.566 | 0.587 |
Joint Gender + Race Debiasing¶
| Method | Gender Fair↑ | Race Fair↑ | Accuracy↑ | MUSIQ↑ | Avg↑ |
|---|---|---|---|---|---|
| Base | 0.163 | 0.193 | 0.785 | 0.574 | 0.508 |
| TBIE | 0.400 | 0.286 | 0.776 | 0.546 | 0.574 |
| FairImagen | 0.537 | 0.320 | 0.753 | 0.544 | 0.611 |
| FairPrompt (upper bound) | 0.690 | 0.478 | 0.747 | 0.574 | 0.671 |
Key ablation findings:

- Reducing the retained (hidden) dimensionality of the projection improves fairness but decreases Accuracy/MUSIQ
- Increasing the empirical noise scale \(\epsilon\) improves fairness, particularly in joint debiasing settings
Highlights & Insights¶
- ⭐ FairPCA provides a well-defined mathematical formulation for the fairness–semantics trade-off, with \(\lambda\) serving as a precise control knob
- ⭐ The joint cross-demographic debiasing approach via Cartesian product construction avoids the excessive pruning caused by per-attribute sequential projection
- ⭐ No model retraining is required, making the method compatible with arbitrary off-the-shelf diffusion models
- Empirical noise injection effectively prevents over-neutralization (e.g., generating feminized male figures)
- Semantic consistency is preserved for historically gender-determined prompts (e.g., "medieval blacksmith") without blind "correction"
Limitations & Future Work¶
- Accuracy and MUSIQ decline moderately (e.g., Accuracy drops from 0.785 to 0.771 in the gender setting), indicating a fairness–fidelity trade-off
- FairPCA assumes that bias is linearly separable; nonlinear biases may persist in high-dimensional spaces
- Training the FairPCA projection matrix requires a set of attribute-annotated prompts, entailing a non-trivial though modest construction cost
- Evaluation relies on the DeepFace classifier to detect demographic attributes, which may itself be biased
- Validation is conducted only on Stable Diffusion 3; generalizability to other architectures (DALL-E, Imagen, etc.) remains to be confirmed
Related Work & Insights¶
| Property | Prompt Methods | Fine-tuning | SDID | TBIE | FairImagen |
|---|---|---|---|---|---|
| Training-free | ✓ | ✗ | ✓ | ✓ | ✓ |
| Black-box compatible | ✓ | ✗ | ✓ | ✓ | ✓ |
| Low human effort | ✗ | ✓ | ✓ | ✓ | ✓ |
| Multi-attribute debiasing | ✗ | ✓ | ✗ | ✗ | ✓ |
| Semantic fidelity | ✓ | ✓ | Weak | Weak | Moderate |
The FairPCA approach is extensible to debiasing in other multimodal generative tasks such as video and 3D generation. The empirical noise injection strategy (applying controlled perturbations along bias directions) can be adapted for data augmentation. The joint attribute space construction via Cartesian products is generalizable to additional protected attributes such as age and disability status. The balance between fairness and "historical accuracy" warrants deeper ethical discussion.
Rating¶
- Novelty: ⭐⭐⭐⭐ (Novel combination of FairPCA, empirical noise injection, and joint debiasing)
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ (Multiple scenarios, baselines, ablations, and qualitative analyses)
- Writing Quality: ⭐⭐⭐⭐ (Clear method description with coherent logical flow)
- Value: ⭐⭐⭐⭐ (Strong practical utility as a plug-and-play fairness tool)