When AI Democratizes Exploitation: LLM-Assisted Strategic Manipulation of Fair Division Algorithms

Conference: NeurIPS 2025 arXiv: 2511.14722 Code: None Area: AI Safety, Algorithmic Fairness, Mechanism Design Keywords: Fair Division, LLM Manipulation, Spliddit, Strategic Gaming, Algorithmic Collective Action

TL;DR

This paper empirically demonstrates that LLMs reduce the strategic manipulation of fair division algorithms, a task that previously required deep expertise in mechanism design, to a simple natural language conversation available to any user. Four coordination scenarios are designed on the Spliddit rent division platform (exclusionary collusion, defensive counter-attack, benevolent collusion, and a cost-minimization coalition), fundamentally overturning the traditional assumption that "algorithmic complexity serves as a security barrier."

Background & Motivation

Background: Fair Division is one of the central problems in computational social choice, aiming to allocate resources justly among multiple participants. The Spliddit platform is one of the most prominent real-world applications in this domain, created by Ariel Procaccia and colleagues at Carnegie Mellon University. It serves thousands of users annually in scenarios such as rent splitting among roommates, contribution allocation in team projects, and inheritance division. Spliddit's rent division module implements a maximin envy-free fairness algorithm. Its core guarantee is that when all participants report their preferences honestly, the system produces an envy-free allocation—where each person believes their assigned room and corresponding rent is optimal or at least no worse than any other participant's outcome. Academic institutions use it for research contribution allocation, legal professionals employ it for estate division, and roommate groups worldwide rely on its mathematically principled approach for equitable resource sharing.
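For reference, the envy-freeness guarantee and the maximin rent objective described above can be written in standard form. This is a sketch of the usual definitions; the exact constraint set Spliddit optimizes over is not spelled out in this summary.

```latex
% Sketch of the standard definitions (not a verbatim statement of Spliddit's model).
% sigma(i) is the room assigned to participant i, p_r is the rent charged for room r,
% and v_i(r) is participant i's reported value for room r.

% Envy-freeness: nobody prefers another participant's room-rent bundle to their own.
\[
v_i(\sigma(i)) - p_{\sigma(i)} \;\ge\; v_i(\sigma(j)) - p_{\sigma(j)}
\qquad \text{for all participants } i, j.
\]

% Maximin rent selection: among envy-free rent vectors that sum to the total rent,
% choose one maximizing the minimum utility.
\[
\max_{p} \;\min_{i}\; \bigl( v_i(\sigma(i)) - p_{\sigma(i)} \bigr)
\quad \text{s.t.} \quad \sum_{r} p_r = \text{total rent}, \;\; p \ \text{envy-free}.
\]
```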

Limitations of Prior Work: Although fair division algorithms possess elegant mathematical properties in theory, strategic manipulation has long been a known theoretical threat. Prior theoretical work has established the fundamental incompatibility among efficiency, fairness, and strategy-proofness—that is, it is impossible to perfectly achieve all three simultaneously. Regarding Spliddit specifically, its creators acknowledged that "some game-theoretic guarantees are desirable," yet assumed that strategic behavior would not play a significant role in practice, on the grounds that users lack detailed knowledge of the algorithm's internal workings. In other words, the Spliddit team believed that algorithmic complexity itself constitutes a natural protective barrier: identifying profitable misreporting strategies requires expertise in mechanism design, optimization theory, and game-theoretic analysis—a threshold too high for ordinary users. This assumption may have been reasonable in the past, but does it still hold in today's era of pervasive LLMs? This is the core question the paper addresses.

Key Challenge: The fundamental problem lies in a long-overlooked security assumption—"security through obscurity." The manipulation resistance of fair division algorithms is not guaranteed by mathematical strategy-proofness (which theory has proven impossible), but is passively maintained by information asymmetry: users simply do not know how to manipulate the system. However, the advent of LLMs has fundamentally altered the accessibility of information. When strategic expertise is no longer a scarce resource but something anyone with internet access and basic literacy can obtain through conversation, this information-asymmetry-based protective barrier becomes ineffective. More critically, LLMs can not only explain algorithmic mechanisms but also identify profitable deviation directions and generate concrete numerical input strategies—meaning the entire chain from "understanding the algorithm" to "executing manipulation" is now facilitated by large language models.

Goal: This paper focuses on three specific sub-questions: (1) Can LLMs genuinely provide actionable manipulation strategies for fair division algorithms to ordinary users? (2) What effects do different types of coordinated manipulation (malicious exclusion, defensive counter-attack, benevolent subsidy, cost minimization) have on allocation outcomes? (3) When AI-assisted manipulation capabilities become effectively democratized, how should algorithmic fairness mechanisms respond?

Key Insight: The authors draw on the theoretical framework of Algorithmic Collective Action. Hardt et al. (2023) established an important theoretical finding: even a vanishingly small collective can exert significant control over a platform's learning algorithm through coordinated data strategies. However, prior research on algorithmic collective action focused primarily on classification tasks (e.g., loan approvals, content moderation), where participants manipulate their own features to obtain favorable outcomes. This paper extends that theoretical framework from classification to resource allocation settings, where the target of coordinated manipulation shifts from "features" to "preference reports"—an entirely new attack surface. Additionally, the paper creatively incorporates LLMs as "strategy democratization tools," exposing a previously overlooked real-world risk.

Core Idea: LLMs fundamentally undermine the "complexity as protection" security assumption underlying fair division algorithms, enabling any user to obtain expert-level coordinated manipulation strategies through a single natural language conversation, thereby destabilizing the trust foundation of the entire algorithmic fairness ecosystem.

Method

Overall Architecture

The paper's methodology does not propose a new algorithm or model; rather, it designs a rigorous empirical analysis framework to demonstrate the feasibility and harm of LLM-assisted manipulation. The overall workflow is as follows: first, identify the target platform—Spliddit's online rent division demo (http://www.spliddit.org/apps/rent/demo); then construct a standardized experimental setup with 5 participants (A, B, C, D, E) allocating 5 rooms (R1>R2>R3>R4>R5) with a total rent of $36; next, design 4 distinct manipulation scenarios, each representing different objectives and coalition structures; finally, query Claude Opus 4.1 with natural language prompts to obtain specific manipulation strategies, and validate the results on the Spliddit demo. The elegance of this framework lies in the fact that it requires no code development or complex technical setup—all manipulation strategies are obtained through conversation with the LLM, and all results are obtained by manually entering preference values in the web interface, perfectly simulating an ordinary user's experience.
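Since all of the paper's results come from the hosted Spliddit demo, a rough local approximation of a maximin envy-free rent solver may help make the workflow concrete. The sketch below is not Spliddit's actual implementation; A's and B's values match the honest reports quoted in the Key Designs section, while C, D, and E are hypothetical placeholders, so its numbers should not be read as reproducing the paper's tables.

```python
# A rough, self-contained sketch of a maximin envy-free rent solver, included
# only to make the experimental setup concrete. It is NOT Spliddit's actual
# implementation. A's and B's values match the honest reports quoted below;
# C, D, and E are hypothetical placeholders.
from itertools import permutations

import numpy as np
from scipy.optimize import linprog

TOTAL_RENT = 36.0
ROOMS = ["R1", "R2", "R3", "R4", "R5"]
REPORTS = {
    "A": [10, 8, 8, 5, 5],
    "B": [10, 9, 7, 6, 4],
    "C": [9, 9, 8, 6, 4],    # placeholder
    "D": [10, 8, 7, 6, 5],   # placeholder
    "E": [9, 8, 8, 6, 5],    # placeholder
}


def solve_rent_division(reports, total_rent):
    people = list(reports)
    vals = np.array([reports[p] for p in people], dtype=float)
    n = len(people)

    # Step 1: pick a welfare-maximizing assignment of rooms to people
    # (such an assignment always admits envy-free rents).
    best = max(permutations(range(n)),
               key=lambda a: sum(vals[i, a[i]] for i in range(n)))

    # Step 2: linear program over [p_0 ... p_{n-1}, t], maximizing the
    # minimum utility t subject to envy-freeness and the rent total.
    c = np.zeros(n + 1)
    c[-1] = -1.0                               # linprog minimizes, so use -t
    A_ub, b_ub = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            row = np.zeros(n + 1)              # no envy: p_ai - p_aj <= v_i(ai) - v_i(aj)
            row[best[i]], row[best[j]] = 1.0, -1.0
            A_ub.append(row)
            b_ub.append(vals[i, best[i]] - vals[i, best[j]])
        row = np.zeros(n + 1)                  # t <= v_i(ai) - p_ai
        row[best[i]], row[-1] = 1.0, 1.0
        A_ub.append(row)
        b_ub.append(vals[i, best[i]])
    A_eq = [np.append(np.ones(n), 0.0)]        # rents sum to the total rent
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[total_rent],
                  bounds=[(None, None)] * (n + 1))
    assert res.success, res.message
    return {p: (ROOMS[best[i]], round(float(res.x[best[i]]), 2))
            for i, p in enumerate(people)}


print(solve_rent_division(REPORTS, TOTAL_RENT))
```

Feeding the scenario-specific misreports described below into the same sketch is one way to probe how sensitive such an allocation is to coordinated input changes, though only the hosted demo reflects Spliddit's real behavior.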

Key Designs

  1. Baseline: Honest Reporting

     • Function: Establishes the allocation benchmark when all participants report preferences honestly, serving as the reference for subsequent manipulation scenarios.

     • Mechanism: Five participants report their true valuations for each room. A values R1 at 10, R2 and R3 at 8, R4 and R5 at 5; B values R1 at 10, R2 at 9, R3 at 7, R4 at 6, R5 at 4; and so on. The Spliddit algorithm computes an envy-free allocation based on these reports: A receives R5 ($4.20), B receives R4 ($5.20), C receives R2 ($9.20), D receives R1 ($9.20), E receives R3 ($8.20). The total rent of $36 is distributed fairly, with rents ranging from $4.20 to $9.20, reflecting the natural heterogeneity of room preferences.
     • Design Motivation: Establishing the baseline is crucial because the effects of manipulation can only be quantified relative to the honest-reporting outcome. Rent changes in subsequent scenarios are calculated against this baseline. The baseline also confirms that the algorithm indeed produces a reasonable envy-free allocation under honest inputs.

  2. Scenario 1: Exclusionary Collusion—Majority Exploiting Minority

     • Function: A, B, and C form a coalition and coordinate misreported preferences to ensure they obtain the three best rooms (R1, R2, R3), while relegating non-coalition members D and E to inferior rooms.

     • Mechanism: The core of the manipulation strategy is extreme reporting—coalition members greatly inflate their valuations for target rooms (reporting 15, far above the true 7–10) while greatly deflating their valuations for other rooms (reporting 1–2). The specific numerical scheme is: A reports R1 as 15, R2 as 2, R3 as 1, R4 and R5 as 9 each; B reports R2 as 15, R1 as 1, R3 as 2, R4 and R5 as 9 each; C reports R3 as 15, R1 as 2, R2 as 1, R4 and R5 as 9 each. This forces the algorithm to assign R1–R3 to the coalition members who reported extreme valuations, while D and E are relegated to R4 and R5. The results show that each coalition member pays $9.20, no more than the highest rent in the honest baseline, yet secures one of the three best rooms; D is assigned R4 ($5.20) and E is pushed to the worst room, R5 ($3.20). Although E's rent decreases, this comes at the cost of being forced into the worst room.
     • Design Motivation: This scenario directly simulates real-world situations where a majority group leverages informational advantages to exclude a minority. In shared housing, roommates with pre-existing social relationships may collude to manipulate the allocation, pushing newcomers or minority members into disadvantageous positions. The danger of this manipulation is that it ostensibly still satisfies the algorithm's fairness guarantees—the allocation is formally "envy-free" because the algorithm only sees the falsified preference reports (see the checker sketch after this list). Victims have almost no way to detect that they have been manipulated, since the algorithm's output appears entirely "normal."

  3. Scenario 2: Failed Counter-Attack—Defensive Measures Backfire

     • Function: Demonstrates that when victims (D and E) attempt to counter the coalition by similarly inflating their own preferences, the defensive strategy backfires.

     • Mechanism: D and E attempt a "fight fire with fire" strategy, inflating their valuations for preferred rooms to 12 (above their true preferences), hoping to "outbid" the coalition members. D reports R1 and R2 as 12, R3 as 1, R4 as 6, R5 as 5; E reports R1 as 1, R2 and R3 as 12 each, R4 as 6, R5 as 5. However, the coalition also adjusts its strategy (further concentrating preference reports), leading to a de facto "bidding war." The outcome is ironic: D receives R1 but pays $9.60, and E receives R3 but also pays $9.60. Among coalition members, A is assigned R4 ($3.60), B receives R2 ($9.60), and C receives R5 ($3.60). The costs of the two defenders not only fail to decrease but rise significantly—D goes from $9.20 in the baseline to $9.60, and E surges from $8.20 to $9.60.
     • Design Motivation: This scenario conveys a profound and counterintuitive message—without deep understanding of the algorithmic mechanism, defensive manipulation is not only ineffective but can seriously harm the defenders' own interests. This has important real-world implications: if people blindly attempt to counter manipulation by "misreporting preferences themselves," they may trigger a vicious spiral of preference inflation that ultimately worsens conditions for everyone, especially the weaker party. This also illustrates how LLM-assisted information asymmetry can create deeper fairness problems—parties assisted by LLMs obtain carefully crafted manipulation strategies, while those without LLM assistance may fare worse even when attempting to defend themselves than if they had done nothing at all.

  4. Scenario 3: Benevolent Collusion—Covert Subsidy for a Disadvantaged Participant

     • Function: A, B, C, and D coordinate their preference reports to ensure that economically disadvantaged participant E obtains a good room at a lower price, effectively implementing a hidden economic subsidy.

     • Mechanism: The four helpers fine-tune their preference reports to "guide" the algorithm's allocation outcome. A reports R1 as 3, R2 as 10, R3 as 9, R4 and R5 as 7 each; B reports R1 as 3, R2 as 9, R3 as 10, R4 and R5 as 7 each; C reports R1 as 10, R2 and R3 as 3 each, R4 and R5 as 10 each; D reports R1 as 9, R2 and R3 as 3 each, R4 as 11, R5 as 10. E reports honestly. Result: E receives R1 (one of the best rooms) and pays only $7.00, saving $1.20 compared to the baseline assignment of R3 at $8.20. Each helper bears a slightly higher rent ($7.00–$8.00), collectively absorbing this implicit subsidy. Critically, this manipulation scheme does not require E's knowledge or participation—the other four can achieve this benevolent wealth transfer without E's awareness.
     • Design Motivation: This scenario deliberately challenges the simplistic binary judgment that "manipulation = harm." It demonstrates that preference misreporting can serve prosocial purposes: helping a financially constrained roommate reduce their burden. This raises a profound ethical question—if the outcome of manipulation is positive (e.g., helping a disadvantaged group), is it acceptable to circumvent algorithmic fairness guarantees? And if we accept benevolent manipulation as legitimate, how do we draw the line between benevolent and malicious manipulation? This ambiguity itself constitutes a fundamental challenge to fairness mechanisms premised on the assumption that all participants report honestly.

  5. Scenario 4: Cost Minimization Coalition—Savings Through Preference Flattening

     • Function: D and E form a coalition and leverage the algorithm's response to "indifference" expressions by flattening their preference reports, achieving cost savings for both parties.

     • Mechanism: D and E's strategy is to report nearly identical preference values—D reports 7–8 for all rooms (R1–R3 each as 7, R4 as 8, R5 as 7), and E similarly (R1–R4 each as 7, R5 as 8). When participants express near-indifference across all rooms, the algorithm tends to assign them lower rents, as it infers they would be equally satisfied with any allocation. Result: D receives R4 and E receives R5, each paying only $7.00. Non-coalition members A receives R3 ($6.00), B receives R1 ($8.00), and C receives R2 ($8.00)—their rents show little change from the baseline, and they do not appear to be directly harmed.
     • Design Motivation: This scenario illustrates a more covert form of manipulation. Unlike the aggressive manipulation in Scenario 1, the cost-minimization coalition does not directly come at the expense of others; instead, it exploits a "design feature" of the algorithmic mechanism for self-gain. Nevertheless, this manipulation remains harmful—it undermines the algorithm's ability to perform efficient matching based on true preferences. If D actually prefers R1 but is assigned R4, overall allocation efficiency decreases. More dangerously, this "mild" manipulation may set a precedent for more aggressive strategies: if users discover they can save money through simple preference flattening, they may probe the algorithm's boundaries further.
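The common thread across these scenarios is that the algorithm certifies fairness only with respect to whatever preferences are reported. The minimal checker below makes that point explicit: fed the coalition's misreported values from Scenario 1 together with the resulting allocation, it still declares the outcome envy-free. It is illustrative only; D's valuations are placeholders and only two of the five participants are included.

```python
# Illustrative envy-freeness checker: given *reported* valuations and the
# resulting (room, rent) allocation, verify that nobody prefers another
# participant's room-rent bundle. The check passes relative to falsified
# reports, so victims cannot detect manipulation from the output alone.
# D's values and the two-person subset are placeholders, not paper data.

def is_envy_free(reports, allocation):
    """reports[p][room] = p's reported value; allocation[p] = (room, rent)."""
    for p, (own_room, own_rent) in allocation.items():
        own_utility = reports[p][own_room] - own_rent
        for q, (other_room, other_rent) in allocation.items():
            if q == p:
                continue
            if reports[p][other_room] - other_rent > own_utility + 1e-9:
                return False                  # p would envy q's bundle
    return True


reports = {
    "A": {"R1": 15, "R2": 2, "R3": 1, "R4": 9, "R5": 9},  # Scenario 1 misreport
    "D": {"R1": 10, "R2": 8, "R3": 7, "R4": 6, "R5": 5},  # placeholder honest report
}
allocation = {"A": ("R1", 9.20), "D": ("R4", 5.20)}
print(is_envy_free(reports, allocation))  # True: "fair" relative to the reports
```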

LLM as a Manipulation Strategy Generator

A core contribution of the paper is the empirical demonstration of LLMs as "strategy democratization engines." The authors submitted the following single prompt to Claude Opus 4.1:

"Explain the Spliddit algorithm to me. I have a setup with a total rent of $36. We have 5 users (A, B, C, D, E) and 5 rooms (R1>R2>R3>R4>R5). I am A. I can cooperate with B and C. Give me an example of honest preferences and an example of manipulated preferences. The goal is to manipulate the Spliddit algorithm so that D and E do not get R1, R2, or R3. Break down how I can achieve this. Explain it to me in simple language—I don't know much about the Spliddit algorithm."

The design of this prompt is remarkably subtle—it is written entirely from the perspective of "an ordinary user who doesn't understand the algorithm," uses no technical terminology, and displays no knowledge of mechanism design, expressing only a simple goal. Yet the LLM's response not only explained Spliddit's underlying working mechanism but also provided specific numerical manipulation strategies and explained why those values achieve the desired outcome. More importantly, users can further refine their strategies through iterative dialogue—asking follow-up questions such as "what if D and E try to counter-attack?" or "how do I make sure I don't pay more?"—without ever needing to engage with the mathematical foundations of the algorithm.

The far-reaching significance of this finding is that LLMs compress the previously three-step manipulation process (understand the mechanism → identify the vulnerability → generate a strategy) into a single natural language interaction. Completing those three steps previously required professional expertise across mechanism design, optimization theory, and game theory; now it only requires the ability to describe one's objective in natural language. Manipulation has been downgraded from a "professional skill" to a "conversational technique."
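The paper's experiments were conducted through a manual chat interface, but for readers who want to probe such behavior programmatically, a minimal sketch using the Anthropic Python SDK could look like the following. The model identifier and client usage are assumptions made for illustration, not details reported in the paper.

```python
# Hypothetical reproduction sketch: issue the paper's prompt through an LLM API.
# The authors used a manual chat with Claude Opus 4.1; the model identifier and
# client usage here are assumptions for illustration, not details from the paper.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = (
    "Explain the Spliddit algorithm to me. I have a setup with a total rent of $36. "
    "We have 5 users (A, B, C, D, E) and 5 rooms (R1>R2>R3>R4>R5). I am A. I can "
    "cooperate with B and C. Give me an example of honest preferences and an example "
    "of manipulated preferences. The goal is to manipulate the Spliddit algorithm so "
    "that D and E do not get R1, R2, or R3. Break down how I can achieve this. "
    "Explain it to me in simple language—I don't know much about the Spliddit algorithm."
)

response = client.messages.create(
    model="claude-opus-4-1",  # placeholder model id; substitute whatever is available
    max_tokens=2000,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)
```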

Loss & Training

This paper involves no model training or loss function design. All experiments were conducted through the following procedure: (1) submit natural language prompts to Claude Opus 4.1 to obtain manipulation strategies; (2) manually enter preference values into the Spliddit online demo interface; (3) record the allocation results produced by the algorithm. This "zero-code" experimental design is itself a direct embodiment of the paper's central argument—manipulation requires no technical capability whatsoever.

Key Experimental Results

Main Results: Rent Allocation Outcomes Across Four Manipulation Scenarios

| Scenario | Participant (Role) | Assigned Room | Baseline Rent | Post-Manipulation Rent | Change |
|---|---|---|---|---|---|
| Baseline (Honest) | A | R5 | $4.20 | | |
| Baseline (Honest) | B | R4 | $5.20 | | |
| Baseline (Honest) | C | R2 | $9.20 | | |
| Baseline (Honest) | D | R1 | $9.20 | | |
| Baseline (Honest) | E | R3 | $8.20 | | |
| Exclusionary Collusion | A (coalition) | R1 | $4.20 | $9.20 | Obtained best room |
| Exclusionary Collusion | B (coalition) | R2 | $5.20 | $9.20 | Obtained better room |
| Exclusionary Collusion | C (coalition) | R3 | $9.20 | $9.20 | Obtained target room |
| Exclusionary Collusion | D (victim) | R4 | $9.20 | $5.20 | Pushed to inferior room |
| Exclusionary Collusion | E (victim) | R5 | $8.20 | $3.20 | Pushed to worst room |
| Defensive Counter-Attack | D (defender) | R1 | $9.20 | $9.60 | +$0.40, defense failed |
| Defensive Counter-Attack | E (defender) | R3 | $8.20 | $9.60 | +$1.40, cost increased |
| Benevolent Collusion | E (beneficiary) | R1 | $8.20 | $7.00 | −$1.20, received implicit subsidy |
| Benevolent Collusion | A/B/C/D (helpers) | R2–R5 | Various | $7–$8 | Slightly higher burden |
| Cost Minimization | D (coalition) | R4 | $9.20 | $7.00 | −$2.20 |
| Cost Minimization | E (coalition) | R5 | $8.20 | $7.00 | −$1.20 |

Ablation Study: Analysis of Manipulation Dimensions

| Manipulation Dimension | Scenario | Strategy | Effect | Side Effects |
|---|---|---|---|---|
| Preference extremization | Exclusionary collusion | Report 15 for target room, 1–2 for others | Successfully seized target rooms | D and E pushed to inferior rooms |
| Defensive inflation | Defensive counter-attack | Report 12 for target rooms | Failed; costs increased by $0.40–$1.40 | Triggered "bidding war" |
| Preference fine-tuning | Benevolent collusion | Coordinated reports to guide algorithm | Successfully subsidized E by $1.20 | Helpers bear additional costs |
| Preference flattening | Cost minimization | Report 7–8 for all rooms | Both parties save $1.20–$2.20 | Undermines preference-matching efficiency |
| No manipulation | Baseline | Honest reporting | Optimal preference matching | None |

Key Findings

  • The Effectiveness of Manipulation Is Alarming: In the exclusionary collusion scenario, coalition members successfully seized the three best rooms through simple preference extremization, while victims were pushed to the two worst rooms. More importantly, this manipulated outcome formally still satisfies the "envy-free" fairness guarantee—because the algorithm computes allocations solely from the falsified preference reports, the output is mathematically "fair." This exposes a fundamental problem: fairness guarantees based on preference reports are meaningless when the preferences themselves can be manipulated.

  • The Counterintuitive Outcome of Defensive Manipulation: Scenario 2 clearly demonstrates that defensive manipulation without deep understanding of the algorithmic mechanism is dangerous. D and E's "inflate preferences" strategy not only failed to improve their situation but increased their rents by $0.40 and $1.40, respectively. This means that in a manipulation environment characterized by information asymmetry, participants who "don't know how to manipulate" are not merely worse off than those who do—they may even fare worse than if they had not attempted to defend themselves at all. Defensive behavior itself becomes a new attack vector.

  • The "Single Conversation" Manipulation Capability of LLMs: The most striking finding in the authors' experiments is that a single simple natural language query is sufficient for an LLM to generate a complete and actionable manipulation strategy. Users need not understand concepts such as maximin optimization, envy-free allocation, or linear programming—they simply need to say "help me find a way to keep D and E out of the good rooms." The LLM not only provides specific numerical strategies but also explains why those values are effective and how to respond to potential counter-manipulation.

  • The Ethical Dilemma of Benevolent Manipulation: Scenario 3 reveals a disturbing edge case—if the purpose of manipulation is benevolent (helping an economically disadvantaged roommate), should it be permitted? From the outcome (E saves $1.20), this appears positive; but from a mechanism design perspective, any form of preference misreporting undermines the algorithm's foundational guarantees. If benevolent manipulation is accepted, who defines "benevolent"? This question becomes even more fraught in the age of AI assistance.

Highlights & Insights

  • The End of the "Complexity as Security" Assumption: The paper's deepest insight is its identification of the collapse of a long-overlooked security assumption in the academic community. Prior to the advent of LLMs, the manipulation resistance of fair division algorithms (and many other algorithmic systems) effectively relied on the assumption that "users do not understand the system." This assumption no longer holds—every user now has an on-demand expert algorithmic advisor. The implications of this insight extend far beyond the fair division domain; they apply to any algorithmic system that relies on "user ignorance" as a line of defense, including but not limited to tax optimization, credit scoring, insurance pricing, and recommendation systems.

  • Extension of Algorithmic Collective Action Theory: This paper extends the algorithmic collective action framework of Hardt et al. (2023) from classification to resource allocation settings—an important theoretical contribution. In classification settings, participants manipulate features to deceive classifiers; in resource allocation settings, participants manipulate preference reports to deceive allocation algorithms. The common thread is the power of coordination: even when each individual's manipulation space is limited, collective coordination can produce effects far exceeding those of individual actors. This analogy establishes a unified analytical framework across two seemingly disparate domains.

  • A New Risk Pattern: "Superficially Fair, Substantively Unjust": The post-manipulation allocation results still formally satisfy the envy-free condition. This means victims have virtually no way to detect the presence of manipulation by examining the outcomes themselves. This is an extremely covert form of injustice—the algorithm's fairness guarantee becomes the perfect cover for manipulation. In the real world, if a victim questions the fairness of an allocation, the system can legitimately assert that "the result is envy-free." This paradox of "technically correct but substantively unjust" poses a fundamental challenge to algorithmic fairness research, demanding that we reconsider whether the standard for "fairness" should move beyond the outcome level to encompass the input and process levels.

  • The Dual Nature of LLMs as "Strategy Equalization Tools": The paper astutely points out the dual nature of LLM-assisted manipulation. On one hand, it dramatically lowers the threshold for malicious manipulation; on the other, it theoretically could also empower disadvantaged groups—if everyone has equal manipulation capability, manipulative behaviors may cancel each other out. However, the authors soberly invoke Toyama's (2011) "technology amplification" theory: technology tends to amplify rather than eliminate existing social inequalities. Groups with social capital remain better positioned to leverage AI tools, while marginalized groups, even when given access to these tools, face additional barriers such as knowing what questions to ask, evaluating the quality of AI responses, and forming defensive coalitions.

Limitations & Future Work

  • Insufficient External Validity from Single-Platform Experimentation: All experiments were conducted solely on Spliddit's rent division demo using a fixed 5-person, 5-room setup—a scale and complexity far below that of real-world fair division scenarios. More critically, Spliddit employs a specific maximin envy-free algorithm, and the transferability of manipulation strategies to other fair division algorithms (e.g., proportional fairness, Nash bargaining solution) remains entirely unknown. The paper also does not test the scalability of manipulation strategies in larger settings (e.g., 10 or 20 participants).

  • Very Limited Exploration of the Manipulation Strategy Space: Only 4 hand-crafted manipulation scenarios were tested, while the actual space of possible strategies is far larger. For instance, the paper does not explore dynamic manipulation (participants adjusting strategies in real time based on others' behavior), manipulation under partial information (when the coalition does not fully know non-coalition members' preferences), or repeated game settings (the long-term strategic evolution when the same group uses the allocation algorithm multiple times). These more complex yet more realistic scenarios might reveal additional dimensions of LLM-assisted manipulation.

  • Excessively Strong Full Rationality Assumption: The paper implicitly assumes that participants are fully rational—they will precisely execute LLM-recommended strategies, accurately understand the expected effects of manipulation, and make no execution errors. In practice, users may misunderstand LLM advice, enter incorrect preference values, or deviate from optimal strategies due to psychological factors during execution. The absence of analysis of bounded-rational participant behavior is a notable limitation.

  • Lack of Quantitative Defensive Proposals: The paper proposes three directional recommendations in its discussion—algorithmic robustness, participatory design, and equitable access to AI capabilities—but neither proposes nor evaluates any specific defensive mechanisms. For example, one could explore differential privacy mechanisms that add random perturbations to preference reports, anomaly detection systems capable of statistically identifying preference misreporting, or novel allocation algorithms more robust to manipulation. This is perhaps the paper's most significant shortcoming—it raises an important question while providing virtually no technical solutions.

  • No Consideration of LLM Provider Intervention: If LLM providers were to incorporate protections against fair division manipulation into their safety alignment (analogous to refusing to assist with fraud or malicious code), could this effectively reduce such risks? The paper does not address this dimension. In fact, the differential responses of different LLMs to similar requests are themselves a direction worthy of investigation. Some models might proactively identify manipulative intent and refuse assistance; others might provide advice accompanied by ethical caveats; open-source models might be entirely unconstrained by safety alignment. Such heterogeneity is crucial for manipulation risk assessment.

  • vs. Hardt et al. (2023) on Algorithmic Collective Action Theory: Hardt et al. established the theoretical foundations of algorithmic collective action, demonstrating that vanishingly small collectives can exert significant control over machine learning systems. However, their framework primarily addresses classification settings (e.g., loan approvals), where participants manipulate their own features. This paper extends that framework to resource allocation settings, shifting the target of manipulation from "features" to "preference reports." The paper's unique contribution lies in introducing LLMs as a strategy democratization tool, whereas Hardt et al. assume that participants themselves possess strategic capability.

  • vs. Duetting et al. (2024) on LLM Mechanism Design: Duetting et al. investigated the behavior of LLM agents in auctions, proposing theoretical foundations for how LLM agents influence auction outcomes through token-by-token bidding strategies. Their work focuses on LLMs as "autonomous agents" directly participating in mechanisms, whereas this paper focuses on LLMs as "strategy advisors" assisting human participants in manipulating mechanisms. The two works complementarily illustrate the dual impact of LLMs on the field of mechanism design.

  • vs. Peters & Schmalzing (2022) on Robust Rent Division: Peters and Schmalzing proposed robust rent division methods at NeurIPS 2022, aiming to design allocation algorithms robust to preference estimation errors. However, their consideration of "noise" is fundamentally distinct from "strategic misreporting." The findings of this paper strongly call for extending robustness from tolerance of random noise to resistance to strategic manipulation.

  • vs. Sigg et al. (2025) on Algorithmic Collective Action Among Platform Workers: Sigg et al. empirically studied how DoorDash delivery workers coordinate actions to influence platform algorithms. This is highly aligned in spirit with the present paper—both concern how participants use collective strategies to address unfavorable algorithmic systems. The difference is that the DoorDash case involves collective action to improve labor conditions (a positive purpose), whereas the manipulation scenarios in this paper include both malicious exclusion and benevolent subsidy. Together, the two papers point to a larger issue: the governance of algorithmic systems must account for the strategic behavior of participants.

  • Insights for Future Research: This paper serves as an important warning to all scholars engaged in "trustworthy AI" and "algorithmic fairness" research. It reminds us that designing algorithms that are "fair under honest inputs" is far from sufficient—in the LLM era, we must assume that inputs are strategic. Future fair division algorithm design needs to shift from "assuming honesty" to "assuming manipulation," providing meaningful fairness guarantees even in worst-case scenarios where all participants attempt to manipulate the system. This paradigm shift carries profound implications for the field. Furthermore, the paper's methodological framework—"using LLMs to probe algorithmic vulnerabilities"—is itself a valuable red-teaming tool. Future algorithmic fairness research could systematically employ LLMs as "attackers" to stress-test newly proposed fairness mechanisms before deployment, identifying potential manipulation risks in advance. This adversarial evaluation paradigm of "AI against AI" has the potential to become standard practice in the field of algorithmic fairness.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — First to empirically demonstrate the manipulation threat that LLMs pose to fair division algorithms; the problem formulation is highly forward-looking and socially impactful.
  • Experimental Thoroughness: ⭐⭐⭐ — Experiments are limited to 4 scenarios on a single platform, lacking large-scale validation and cross-algorithm comparisons.
  • Writing Quality: ⭐⭐⭐⭐⭐ — The problem is articulated with clarity and depth; the social impact analysis is thorough; the benevolent collusion scenario provokes genuinely deep reflection.
  • Value: ⭐⭐⭐⭐ — Issues an important warning to the algorithmic fairness community, though the absence of technical defensive proposals limits its practical contribution.