Incentivizing Time-Aware Fairness in Data Sharing

Conference: NeurIPS 2025
arXiv: 2510.09240
Code: None
Area: AI Safety
Keywords: Data sharing, fairness, time-aware incentives, Shapley value, cooperative game theory

TL;DR

This paper proposes a time-aware data sharing framework that introduces new incentive conditions (F6–F8) and two reward schemes—Time-Aware Reward Cumulation and Time-Aware Data Valuation—to ensure that participants who join a collaboration earlier receive higher-value rewards, while simultaneously preserving fairness and individual rationality.

Background & Motivation

Background: In collaborative machine learning (CML), aggregating data from multiple parties improves model performance. Existing frameworks, such as Shapley value-based methods, establish incentive conditions including fairness and individual rationality to encourage participation.

Limitations of Prior Work: All existing frameworks assume that all participants join the collaboration simultaneously. In practice, however, participants may join at different times due to data cleaning delays, legal barriers, or information asymmetry. Existing frameworks thus provide no incentive for participants to contribute data early.

Key Challenge: Traditional fairness requires that "participants contributing data of equal value receive equal rewards," whereas a time-aware perspective demands that "earlier joiners receive higher rewards"—these two requirements are inherently in conflict.

Goal: To design reward mechanisms that simultaneously satisfy fairness and incentivize early participation.

Key Insight: Starting from cooperative game theory, the paper introduces a temporal dimension to redefine incentive conditions and constructs a compatible mathematical framework.

Core Idea: Early joiners bear greater risk and facilitate the participation of fence-sitters, and should therefore receive higher rewards. The conflict between fairness and time-aware incentives is resolved by conditioning fairness on the premise of simultaneous joining.

Method

Overall Architecture

The framework consists of three stages: (1) Data Valuation: a valuation function \(v\) measures the data value of each coalition; (2) Reward Determination: one of two time-aware schemes computes the reward value \(r_i\) for each participant; (3) Reward Realization: actual model rewards are generated via likelihood tempering or subset selection.

Key Designs

  1. Time-Aware Incentive Conditions (F1–F8): The five classical incentive conditions are modified and extended to eight. In the list below, * appears to mark a modified condition and # a newly introduced one:

    • F3* Simultaneous Symmetry: A "simultaneous joining" prerequisite is added, requiring \(r_i = r_j\) only when \(t_i = t_j\) and the marginal contributions are identical.
    • F6# Necessity: If every coalition that lacks a participant's data has zero value, that participant must receive a reward no less than any other participant's, regardless of joining time. This protects holders of critical data from being penalized for late arrival.
    • F7# Time Monotonicity: All else being equal, a participant who joins earlier receives a reward no less than if they had joined later: \((t_i' < t_i) \implies r_i' \geq r_i\).
    • F8# Strict Time Monotonicity: When a participant's data has incremental value over their predecessors, earlier joining yields strictly higher rewards.
  2. Time-Aware Reward Cumulation: The collaboration period is divided into intervals \(\tau = 0, \dots, T\), each treated as an independent game with Shapley values \(\varphi_i^{(\tau)}\). A participant's final reward is the weighted sum \(r_i = \sum_{\tau=0}^{T} w^{(\tau)} \varphi_i^{(\tau)}\) with weights \(w^{(\tau)} = \beta^{\tau} / \sum_{s=0}^{T}\beta^{s}\), where \(\beta\) controls the emphasis on timing: a smaller \(\beta\) shifts weight toward earlier intervals. As \(\beta \to \infty\), all weight falls on the final interval and the scheme reduces to the standard Shapley value. This method satisfies F1–F8.

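  3. Time-Aware Data Valuation: Each participant's collaborative capacity is defined as \(\lambda_i = e^{-\gamma t_i}\), and a modified valuation function is introduced: \(v_{C,\mathbf{t}} = \sum_{T \subseteq C, |T| \geq 2} d(v,T) \min_{i \in T}\{e^{-\gamma t_i}\} + \sum_{i \in C} d(v,\{i\})\), where \(d(v,T)\) denotes the Harsanyi dividend and \(T\) here ranges over sub-coalitions of \(C\) (not the time horizon used in the cumulation scheme). Each multi-party dividend is thus scaled by the capacity of the latest joiner in \(T\), while standalone dividends are left untouched. The parameter \(\gamma\) controls the temporal influence (stated range: \(\gamma \in (0,1]\)); in the limiting case \(\gamma = 0\), every \(\lambda_i = 1\) and the standard valuation is recovered. This method also satisfies F1–F8.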

  4. Requirements on the Valuation Function (A1–A3):

    • Non-negativity A1: \(v_C \geq 0\)
    • Monotonicity A2: \(B \subseteq C \implies v_C \geq v_B\)
    • Superadditivity A3: \(v_{B \cup C} \geq v_B + v_C\) (for disjoint coalitions)
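For small games, conditions A1–A3 can be checked by brute force. The sketch below is only an illustration: the helper names `all_coalitions` and `check_valuation` and both toy valuations are hypothetical and do not come from the paper. It assumes the valuation is stored as a dictionary keyed by `frozenset` coalitions.

```python
from itertools import combinations
from math import sqrt

def all_coalitions(players):
    """Yield every subset of `players` (including the empty set) as a frozenset."""
    players = list(players)
    for size in range(len(players) + 1):
        for combo in combinations(players, size):
            yield frozenset(combo)

def check_valuation(v, players, tol=1e-12):
    """Report which of A1-A3 hold for a valuation stored as {frozenset: value}."""
    coalitions = list(all_coalitions(players))
    # A1: every coalition has non-negative value
    a1 = all(v[c] >= -tol for c in coalitions)
    # A2: enlarging a coalition never lowers its value
    a2 = all(v[c] >= v[b] - tol
             for b in coalitions for c in coalitions if b <= c)
    # A3: merging two disjoint coalitions is worth at least the sum of the parts
    a3 = all(v[b | c] >= v[b] + v[c] - tol
             for b in coalitions for c in coalitions if not (b & c))
    return {"A1": a1, "A2": a2, "A3": a3}

# Hypothetical toy valuations (not from the paper)
players = [1, 2, 3]
v_square = {c: float(len(c) ** 2) for c in all_coalitions(players)}  # superadditive
v_sqrt = {c: sqrt(len(c)) for c in all_coalitions(players)}          # subadditive

print(check_valuation(v_square, players))
print(check_valuation(v_sqrt, players))
```

The quadratic toy valuation passes all three checks, whereas the square-root valuation is non-negative and monotone but fails A3, since two singletons are together worth less than the sum of their standalone values.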

Conditional information gain (Conditional IG) and the dual of submodular functions both satisfy A1–A3.
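The Time-Aware Data Valuation formula can be evaluated directly on a small game. The following is a minimal sketch under stated assumptions: the function names, the toy valuation \(v_C = |C|^2\), and the joining times are all hypothetical, and the Harsanyi dividend is computed from its standard inclusion-exclusion definition.

```python
from itertools import combinations
from math import exp

def subsets(coalition, min_size=0):
    """Yield subsets of `coalition` with at least `min_size` members."""
    items = sorted(coalition)
    for size in range(min_size, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def harsanyi_dividend(v, T):
    """d(v, T) = sum over S within T of (-1)^(|T| - |S|) * v(S)."""
    return sum((-1) ** (len(T) - len(S)) * v[S] for S in subsets(T))

def time_aware_value(v, C, t, gamma):
    """Modified valuation v_{C,t}: each multi-party dividend is scaled by the
    smallest capacity exp(-gamma * t_i) among its members; standalone
    dividends are kept unchanged."""
    solo = sum(harsanyi_dividend(v, frozenset([i])) for i in C)
    joint = sum(
        harsanyi_dividend(v, T) * min(exp(-gamma * t[i]) for i in T)
        for T in subsets(C, min_size=2)
    )
    return solo + joint

# Hypothetical toy game: v(C) = |C|^2, party 3 joins two time units late
N = frozenset({1, 2, 3})
v = {S: float(len(S) ** 2) for S in subsets(N)}
t = {1: 0, 2: 0, 3: 2}

print(time_aware_value(v, N, t, gamma=0.0))               # original grand-coalition value
print(time_aware_value(v, N, t, gamma=0.5))               # discounted by the late joiner
print(time_aware_value(v, frozenset({3}), t, gamma=0.5))  # standalone value is untouched
```

In this toy game, \(\gamma = 0\) returns the original value of 9, while \(\gamma = 0.5\) shrinks only the dividends of the coalitions that contain the late joiner. Singleton values never change, which is consistent with the summary's claim that standalone data value is preserved.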

Loss & Training

The reward realization stage employs two methods:

  • Likelihood Tempering: For a target participant, the posterior is updated using the participant's own likelihood and tempered likelihoods from others' data, exactly realizing reward values under Conditional IG.
  • Subset Selection: A model is trained on a subset of the aggregated data as the reward; applicable to arbitrary valuation functions but yields only an approximation.

Key Experimental Results

Main Results

Experiments are conducted on three datasets (Friedman synthetic data, California Housing, and MNIST) with \(n=3\) participants.

Core observations on the Friedman dataset (\(v_1 \approx v_2 > v_3\)):

| \(t_1\) | Method | \(r_1^*\) | \(r_2^*\) | \(r_3^*\) | Satisfies F2/F8 |
|---|---|---|---|---|---|
| 0 | Shapley | 35.88 | 34.64 | 28.40 | F2✓/F8✗ |
| 0 | Cumulation (\(\beta\)=1) | 35.88 | 34.64 | 28.40 | F2✓/F8✓ |
| 3 | Cumulation (\(\beta\)=1) | ~31 | ~33 | ~27 | F2✓/F8✓ |
| 3 | Valuation (\(\gamma\)=0.5) | ~31 | ~33 | ~27 | F2✓/F8✓ |

Ablation Study

Parameter sensitivity analysis (Friedman dataset):

| Parameter | Effect |
|---|---|
| Decreasing \(\beta\) | Stronger emphasis on early participation; reward gap for early joiners increases |
| \(\beta \to \infty\) | Reduces to standard Shapley value; time-independent |
| Increasing \(\gamma\) | Greater advantage for early joiners; heavier penalty for late joiners |
| \(\gamma = 0\) | Time-independent; equivalent to the standard case |
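The effect of \(\beta\) can be reproduced on a toy game with the cumulation formula \(r_i = \sum_{\tau} w^{(\tau)} \varphi_i^{(\tau)}\), \(w^{(\tau)} \propto \beta^{\tau}\). The sketch below rests on one loudly flagged assumption: this summary does not say how a party is treated in intervals before it joins, so the code values such a party at its standalone data only (the hypothetical `interval_game` helper). The toy valuation and joining times are also invented for illustration, and the paper's exact construction may differ.

```python
from itertools import permutations

def shapley(v, players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    players = list(players)
    phi = {i: 0.0 for i in players}
    orders = list(permutations(players))
    for order in orders:
        seen = frozenset()
        for i in order:
            phi[i] += v(seen | {i}) - v(seen)
            seen = seen | {i}
    return {i: phi[i] / len(orders) for i in players}

def interval_game(v, present):
    """ASSUMPTION (not from the source): parties that have not joined yet
    contribute only their standalone value and do not cooperate."""
    def v_tau(S):
        waiting = S - present
        return v(S & present) + sum(v(frozenset([i])) for i in waiting)
    return v_tau

def cumulated_rewards(v, join_time, horizon, beta):
    """r_i = sum over tau of w^(tau) * phi_i^(tau), with w^(tau) = beta^tau / norm."""
    players = list(join_time)
    norm = sum(beta ** tau for tau in range(horizon + 1))
    rewards = {i: 0.0 for i in players}
    for tau in range(horizon + 1):
        present = frozenset(i for i in players if join_time[i] <= tau)
        phi = shapley(interval_game(v, present), players)
        for i in players:
            rewards[i] += (beta ** tau / norm) * phi[i]
    return rewards

# Hypothetical toy game: v(S) = |S|^2, party 3 joins at time 2
v = lambda S: float(len(S) ** 2)
join_time = {1: 0, 2: 0, 3: 2}

for beta in (0.5, 1.0, 1e6):
    r = cumulated_rewards(v, join_time, horizon=2, beta=beta)
    print(beta, {i: round(x, 4) for i, x in r.items()})
```

In this toy setting the gap between an early joiner (party 1) and the late joiner (party 3) widens as \(\beta\) decreases, and for very large \(\beta\) all three symmetric parties converge to the same standard Shapley value of 3, matching the qualitative trends in the table.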

Key Findings

  • When the gap in data values is small, the temporal factor dominates reward allocation. Even a participant with more valuable data may receive a lower reward than others if they join too late.
  • When the data value gap is large (\(v_3 \ll v_1\)), data quality consistently dominates; participant 1 always receives higher rewards than participant 3 regardless of joining time.
  • Both schemes guarantee individual rationality (each party's reward is no less than its standalone data value), and model performance (MNLP) is positively correlated with reward values.
  • A naïve approach of dividing Shapley values by joining time violates individual rationality F2 and necessity F6.

Highlights & Insights

  • Rigorous problem formulation: The eight incentive conditions are mutually compatible, and the resolution of conflicts via simultaneous-joining prerequisites and weak inequalities is highly elegant.
  • Complementary design of the two schemes: Time-Aware Reward Cumulation introduces time after reward allocation, while Time-Aware Data Valuation introduces time before data valuation.
  • The connection between the dual valuation function and machine unlearning represents a profound insight.
  • The framework achieves a good balance between data value and temporal value.

Limitations & Future Work

  • Computational complexity remains high: exact computation of Shapley values requires exponential evaluations; efficient estimation directions are suggested but not thoroughly validated.
  • The framework assumes data values are time-invariant, excluding data types such as time series and limiting its applicability.
  • Experiments are limited in scale (only 3 and 10 participants); real-world scenarios may involve far more parties.
  • Strategic misreporting of joining times by self-interested participants is not considered.
  • Béal et al. [2022] is the only closely related work studying early-joining incentives in cooperative game theory, though the setting and proposed schemes differ substantially.
  • The paper complements incentive design in federated learning—time-aware incentives in FL settings remain an open direction.
  • The proposed ideas are generalizable to crowdsourcing, data markets, and other scenarios requiring incentives for early participation.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ First systematic incorporation of the temporal dimension into data sharing fairness; incentive condition design is rigorous and innovative.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Three datasets validate the theoretical properties, but large-scale and real-world deployment experiments are lacking.
  • Writing Quality: ⭐⭐⭐⭐⭐ Mathematically rigorous with clear proposition definitions and coherent logical argumentation.
  • Value: ⭐⭐⭐⭐ Significant theoretical contributions to data sharing and collaborative ML, though practical deployment remains a challenge.