This paper proposes using the contextual understanding of LLMs to detect and quantify gender representation bias in training corpora for grammatically gendered languages (e.g., Spanish and Valencian). It identifies a strong male-dominated imbalance and shows that continuous pre-training on reverse-biased data mitigates bias in model outputs.
Problem Definition: Gender representation bias refers to the unequal frequency of reference to individuals of different genders in text. This upstream bias in training data is the root cause of bias propagation and amplification in downstream models.
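As a worked illustration of the definition above, representation bias can be expressed as a male-to-female ratio over references to people. This is a minimal sketch: the function name `representation_ratio` and the "M"/"F" label encoding are assumptions made for this example, not notation from the paper.

```python
from collections import Counter

def representation_ratio(gender_labels):
    """Return the male:female ratio over a list of "M"/"F" labels.

    Each label stands for one reference to a person in the text.
    Returns None when there are no references at all, and infinity
    when there are male references but no female ones.
    """
    counts = Counter(gender_labels)
    male, female = counts["M"], counts["F"]
    if male == 0 and female == 0:
        return None
    if female == 0:
        return float("inf")
    return male / female

# Hypothetical labels for eight person references in a short text.
labels = ["M", "M", "F", "M", "M", "F", "M", "M"]
ratio = representation_ratio(labels)  # 6 male vs. 2 female references
```

A ratio of 1.0 would indicate balanced representation; values above 1.0 indicate that male references dominate.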
Limitations of Prior Work:
Existing studies primarily focus on stereotyping bias (associating specific roles with genders) rather than representation bias (frequency inequality).
Current methods (such as gender polarity) are designed specifically for English and rely on string matching against predefined gendered word lists. These approaches cannot handle grammatically gendered languages, where every noun carries grammatical gender whether or not it refers to a person (e.g., Spanish "el coche", "the car", is a masculine noun but does not refer to a human).
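The limitation can be illustrated with a simplified string-matching counter. This is only a sketch in the spirit of word-list approaches, not the gender polarity metric itself; the word lists, the article heuristic, and the function names are assumptions chosen for the example.

```python
import re

# Small illustrative English word lists (hypothetical, not from any paper).
MALE_WORDS = {"he", "him", "his", "man", "men", "father", "son"}
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "mother", "daughter"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())

def wordlist_counts(text):
    """Count tokens that appear in the English gendered word lists."""
    tokens = tokenize(text)
    male = sum(t in MALE_WORDS for t in tokens)
    female = sum(t in FEMALE_WORDS for t in tokens)
    return male, female

def article_counts(text):
    """Naive Spanish heuristic: count gendered definite articles."""
    tokens = tokenize(text)
    male = sum(t in {"el", "los"} for t in tokens)
    female = sum(t in {"la", "las"} for t in tokens)
    return male, female

english = "She told him that her mother had arrived."
spanish = "La profesora aparcó el coche."

english_counts = wordlist_counts(english)   # works for English
spanish_wordlist = wordlist_counts(spanish) # misses "profesora" entirely
spanish_articles = article_counts(spanish)  # counts "el coche" as masculine
```

The English word list finds nothing in the Spanish sentence, while the article heuristic counts one masculine and one feminine item even though only "profesora" refers to a person. Neither approach separates human references from ordinary nouns.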
Importance: Approximately 38% of the global population speaks grammatically gendered languages. In these languages, the masculine plural form typically serves as the default for mixed-gender groups (e.g., Spanish "los profesores" can mean either a group of male teachers or teachers of any gender), which itself constitutes an implicit gender representation bias.
Core Motivation: There is a need for a method capable of distinguishing "nouns/pronouns referring to humans" from "non-human nouns" and correctly classifying their grammatical gender. This goes beyond simple word-list matching and requires semantic understanding.
LLM-based Method: With carefully designed prompts and few-shot examples, the LLM performs word identification, human-reference determination, and grammatical gender classification in a single query. The text is processed sentence by sentence so that the LLM can draw on the context of each sentence.
Exclusion of Adjectives: Since the gender marking of adjectives typically depends on their associated nouns and does not independently convey human-reference information, excluding them reduces complexity.
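The single-query extraction described above can be sketched as a prompt builder plus a response parser. Everything specific here is an assumption made for illustration: the instruction wording, the `word|P or N|M or F` output format, the few-shot example, and the canned response that stands in for a real LLM call. The paper's actual prompt is not reproduced.

```python
# Hypothetical instruction text and output format.
INSTRUCTIONS = (
    "For each noun and pronoun in the Spanish sentence, output one line "
    "in the form word|P or N|M or F, where P means the word refers to a "
    "person, N means it does not, and M/F is its grammatical gender. "
    "Ignore adjectives."
)

# One hypothetical few-shot example: (sentence, expected answer).
FEW_SHOT = [
    ("La profesora aparcó el coche.", "profesora|P|F\ncoche|N|M"),
]

def build_prompt(sentence):
    """Assemble instructions, few-shot examples, and the target sentence."""
    parts = [INSTRUCTIONS, ""]
    for example_sentence, example_answer in FEW_SHOT:
        parts += [f"Sentence: {example_sentence}", "Answer:", example_answer, ""]
    parts += [f"Sentence: {sentence}", "Answer:"]
    return "\n".join(parts)

def parse_response(response):
    """Parse word|P/N|M/F lines into records, skipping malformed lines."""
    records = []
    for line in response.strip().splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue
        word, person_flag, gender = fields
        if person_flag not in {"P", "N"} or gender not in {"M", "F"}:
            continue
        records.append({"word": word, "person": person_flag == "P", "gender": gender})
    return records

def count_person_references(records):
    """Count masculine and feminine words that refer to people."""
    male = sum(r["person"] and r["gender"] == "M" for r in records)
    female = sum(r["person"] and r["gender"] == "F" for r in records)
    return male, female

# Canned response standing in for the LLM output on
# "Los profesores pusieron la mesa para la alumna."
canned = "profesores|P|M\nmesa|N|F\nalumna|P|F"
records = parse_response(canned)
counts = count_person_references(records)
```

Only the words flagged as referring to people contribute to the counts, so "mesa" is parsed but left out of the tally.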
Validation of Bias Propagation via Continuous Pre-training: Three synthetic datasets of 5,000 sentences each (male-biased, female-biased, and balanced) are constructed to perform QLoRA continuous pre-training (<20 steps) on three open-source LLMs, validating how training data bias propagates to model outputs.
Continuous pre-training utilizes the standard language modeling loss (next token prediction) combined with QLoRA for parameter-efficient training. The core objective is not to train a new model, but to validate the bias propagation hypothesis.
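For reference, the standard next-token prediction loss mentioned above can be written as follows. The symbols are assumptions introduced here rather than the paper's notation: $x_1, \dots, x_T$ is a training sequence of $T$ tokens, $\theta_0$ denotes the frozen, quantized base-model weights, and $\phi$ denotes the trainable low-rank adapter parameters that QLoRA updates.

```latex
\mathcal{L}(\phi) = -\frac{1}{T} \sum_{t=1}^{T} \log p_{\theta_0, \phi}\left(x_t \mid x_{<t}\right)
```

Because only $\phi$ receives gradient updates, the base model stays unchanged, which keeps training cheap enough for a short propagation experiment.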
Spanish Bias is Significantly Larger than English: In the same parallel corpus, the masculine representation bias in Spanish is 3 to 4 times higher than in English, a gap attributed to grammatical gender conventions.
Valencian Bias is Lower: The male-to-female ratio is 2:1 to 3:1, potentially due to more formal inclusive-language conventions in official administrative documents.
Bias is Propagated: Training on male-biased data significantly increases the proportion of masculine references in model outputs.
Reverse Bias Mitigates Effectively: Continuous pre-training on as few as 5,000 female-biased sentences reduces the model output bias from 3:1 to nearly 1:1.
Method Validation: GPT-4-Turbo achieves F-scores of 90.24% and 84.43% on the Spanish and Valencian validation sets, respectively, supporting the reliability of the proposed method.
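An F-score of this kind can be computed by comparing the extracted items against gold annotations. The sketch below assumes set-based matching of (word, gender) pairs within one sentence, so repeated words collapse into one item. The summary does not state how the paper matches predictions to annotations, so this matching scheme and the example data are assumptions.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 over two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical gold annotations and model output for one sentence.
gold = {("profesora", "F"), ("alumnos", "M"), ("ella", "F")}
predicted = {("profesora", "F"), ("alumnos", "M"), ("coche", "M")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Here the model finds two of the three gold items and adds one false positive ("coche", a non-human noun), so precision and recall are both two thirds.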
Fills the gap in quantifying gender representation bias in training corpora for grammatically gendered languages, utilizing a simple yet effective methodology.
Distinguishing between "nouns referring to humans" and "all nouns" is a critical design choice: it prevents non-human references such as "table (feminine)" and "car (masculine)" from being erroneously counted as bias.
The discovery of the reverse-bias mitigation strategy is highly practical: it does not require vast amounts of balanced data, as a mere 5,000 sentences of counter-biased data can significantly correct the bias.
The method should generalize to other grammatically gendered languages (such as French, German, and Czech), making it a useful tool for fairness research in multilingual NLP.
Gender polarity (Dhamala et al., 2021): An English gender polarity method based on predefined vocabulary lists, inapplicable to grammatically gendered languages.
BOLD (Dhamala et al., 2021): A benchmark for social bias, with a primary focus on stereotyping bias.
Biesialska et al. (2024): Reports the correlation between stereotyping bias and representation bias.
Implications for multilingual model fairness: highlights the need to design distinct bias-detection methodologies based on different linguistic typologies.