
COOPERA: Continual Open-Ended Human-Robot Assistance

Conference: NeurIPS 2025
arXiv: 2510.23495
Code: None
Area: Robotics
Keywords: human-robot collaboration, continual learning, open-ended tasks, LLM-driven human simulation, personalized robot assistant

TL;DR

This paper proposes the COOPERA framework, the first to enable continual, open-ended human-robot collaboration research. LLM-driven simulated humans with psychological traits and long-term intentions interact with robots over multiple days in a 3D environment. The robot progressively improves its personalized assistance by learning human characteristics and contextual intentions.

Background & Motivation

Background: Research on robot-assisted tasks has primarily focused on short episodic settings, where robots are evaluated on predefined short-horizon tasks.

Limitations of Prior Work: In the real world, humans have preferences and long-term goals, requiring different types of assistance at different times. Existing methods rely on predefined, closed-form task representations and are unable to handle open-ended, personalized human-robot interaction.

Key Challenge: Truly personalized assistance requires the robot not only to understand the current environment but also to reason over extended time horizons about human behavioral patterns, preferences, and habits, a capability that existing frameworks lack entirely.

Goal: To establish a framework supporting continual, open-ended human-robot collaboration, in which the robot learns and adapts to individual human characteristics over time.

Key Insight: Simulate humans with LLM-driven, psychologically grounded long-term behavior, and design a feedback mechanism enabling the robot to gradually personalize its assistance.

Core Idea: Extend human-robot collaboration from "single episodic tasks" to "multi-day continual open-ended interaction," allowing the robot to progressively understand individual traits, habits, and time-dependent behavioral patterns through end-of-day feedback.

Method

Overall Architecture

COOPERA consists of three components: (1) LLM-driven simulated humans with personality traits and full-day behavioral schedules; (2) a robot assistance agent that infers human intentions and provides help via a VLM and classifiers; and (3) an end-of-day feedback mechanism in which the human evaluates robot performance and updates the robot's cognitive profile of the human.
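
The day-level loop that ties these three components together can be sketched as follows. This is a minimal toy sketch, not the authors' code: `run_days`, `simulate_human_slot`, the `toy_*` stubs, the string-based profile, and the fixed two-slot schedule are all hypothetical, and each callable stands in for an LLM or VLM call. The human side applies two critique rounds to mirror the Reflexion-style self-correction described under Key Designs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (slot, true intention, robot's guess) for one time slot
Record = Tuple[str, str, str]


@dataclass
class DayLog:
    day: int
    records: List[Record] = field(default_factory=list)


def simulate_human_slot(day: int, slot: str, history: List[str],
                        propose: Callable[[int, str, List[str]], str],
                        critique: Callable[[str, List[str]], str],
                        rounds: int = 2) -> str:
    """Draft an intention for one slot, then revise it `rounds` times (Reflexion-style)."""
    draft = propose(day, slot, history)
    for _ in range(rounds):
        draft = critique(draft, history)
    return draft


def run_days(num_days: int, slots: List[str], propose, critique,
             robot: Callable[[str, str], str],
             update_profile: Callable[[str, List[Record]], str]):
    profile = ""               # robot's profile of the human; empty before any feedback
    history: List[str] = []    # human-side behavioral history across all days
    logs: List[DayLog] = []
    for day in range(num_days):
        log = DayLog(day)
        for slot in slots:
            intention = simulate_human_slot(day, slot, history, propose, critique)
            guess = robot(slot, profile)   # robot acts using only the current profile
            log.records.append((slot, intention, guess))
            history.append(f"day {day} {slot}: {intention}")
        # End-of-day feedback: rewrite the profile from the day's collaboration record.
        profile = update_profile(profile, log.records)
        logs.append(log)
    return logs, profile


# Toy stubs standing in for the LLM/VLM calls (hypothetical).
def toy_propose(day, slot, history):
    return "exercise" if slot == "14:00" else "clean kitchen"

def toy_critique(draft, history):
    return draft  # a real critic would check consistency with traits and history

def toy_robot(slot, profile):
    for line in profile.splitlines():
        s, _, intent = line.partition("=")
        if s == slot:
            return intent
    return "unknown"

def toy_update(profile, records):
    return "\n".join(f"{slot}={intent}" for slot, intent, _ in records)


logs, profile = run_days(2, ["09:00", "14:00"], toy_propose, toy_critique,
                         toy_robot, toy_update)
for log in logs:
    hits = sum(intent == guess for _, intent, guess in log.records)
    print(f"day {log.day}: {hits}/{len(log.records)} correct")
```

With these stubs the robot answers "unknown" for both slots on day 0 and predicts both correctly on day 1, because the only thing carried across days is the end-of-day profile. In COOPERA that profile is a VLM-written summary of traits and habits rather than a slot lookup table.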

Key Designs

  1. Trait-Driven Human Simulation:

    • Function: Generate simulated humans with psychological traits and long-term behavioral consistency.
    • Mechanism: Big Five personality traits are inferred from synthetic dialogue data. Based on traits, environmental context, and behavioral history, an LLM generates intentions and tasks for each time slot. A Reflexion mechanism performs two rounds of self-correction, and memory retrieval is used to compress long context inputs.
    • Design Motivation: Human behavior must exhibit intra-day temporal dependency (morning cleaning → afternoon exercise) as well as inter-day diversity (9 AM activities differ between Monday and Tuesday).
  2. Decoupled Intent–Task Inference:

    • Function: The robot infers the human's current intention and determines the appropriate form of assistance.
    • Mechanism: Task inference is decoupled into two stages. First, the VLM imagines multiple possible intentions, which an intent classifier then filters. Second, for each positively classified intention, specific tasks are inferred and then filtered by a task classifier. Both classifiers are fine-tuned at the end of each day on the feedback data.
    • Design Motivation: Because the same time slot can correspond to different activities on different days, committing to a single guess is brittle. Having the VLM imagine several candidates and letting the classifiers filter them allows the robot to capture the correct set of meta-intentions and meta-tasks.
  3. Continual Human Profile Update:

    • Function: Progressively understand individual human personality and habits through multi-day interaction.
    • Mechanism: At the end of each day, the human and robot engage in a discussion; a VLM infers and summarizes the human's traits, habits, and psychological data from the collaboration history. This profile is incorporated into subsequent VLM prompts and classifier inputs.
    • Design Motivation: Different humans require different collaboration strategies; the continuously accumulated profile enables the robot to evolve progressively from a "generic assistant" into a "personalized assistant."
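
The two-stage inference of item 2, with the profile from item 3 passed into every call, can be sketched as below. This is an illustrative sketch under stated assumptions: `infer_assistance` and the `toy_*` stubs are hypothetical, the VLM and the fine-tuned classifiers are replaced by plain functions, and passing the profile as a string argument is only one way to realize "incorporated into subsequent VLM prompts and classifier inputs."

```python
from typing import Callable, Dict, List


def infer_assistance(observation: str, profile: str,
                     imagine_intents: Callable[[str, str], List[str]],
                     intent_clf: Callable[[str, str, str], bool],
                     infer_tasks: Callable[[str, str, str], List[str]],
                     task_clf: Callable[[str, str, str, str], bool]) -> Dict[str, List[str]]:
    """Over-generate candidates with a VLM, then keep only what the binary classifiers accept."""
    plan: Dict[str, List[str]] = {}
    # Stage 1: imagine several possible intentions and filter them.
    for intent in imagine_intents(observation, profile):
        if not intent_clf(observation, profile, intent):
            continue
        # Stage 2: infer concrete tasks for each accepted intention and filter those too.
        tasks = [t for t in infer_tasks(observation, profile, intent)
                 if task_clf(observation, profile, intent, t)]
        if tasks:  # design choice in this sketch: drop intentions with no accepted task
            plan[intent] = tasks
    return plan


# Toy stubs (hypothetical) in place of the VLM and the fine-tuned yes/no classifiers.
def toy_imagine(obs, profile):
    return ["exercise", "cook lunch", "read a book"]

def toy_intent_clf(obs, profile, intent):
    return intent in profile          # accept only intentions mentioned in the profile

def toy_tasks(obs, profile, intent):
    return {"exercise": ["bring yoga mat", "bring water bottle"],
            "cook lunch": ["bring frying pan"]}.get(intent, [])

def toy_task_clf(obs, profile, intent, task):
    return "mat" in task              # toy rule: accept only the yoga-mat task


plan = infer_assistance("living room at 14:00", "usually does exercise at 14:00",
                        toy_imagine, toy_intent_clf, toy_tasks, toy_task_clf)
print(plan)  # {'exercise': ['bring yoga mat']}
```

Over-generating and then filtering keeps the VLM's open-ended imagination while letting the cheaper, per-day fine-tuned classifiers carry what has been learned about this particular person.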

Loss & Training

The VLM is optimized via prompt engineering without training. The classifiers are based on Mistral-7B and fine-tuned with LoRA, producing binary yes/no predictions.
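
For the yes/no classifiers, the end-of-day feedback has to be turned into labeled examples. One possible way to do that is sketched below; the prompt template, the `Example` type, and `build_intent_examples` are hypothetical, and the LoRA fine-tuning of Mistral-7B itself (for instance with a library such as Hugging Face PEFT) is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Example:
    prompt: str
    label: str  # target answer for the binary classifier: "yes" or "no"


def build_intent_examples(profile: str, slot: str, candidates: Iterable[str],
                          confirmed: Set[str]) -> List[Example]:
    """Label every imagined candidate intention using the human's end-of-day feedback."""
    examples = []
    for cand in candidates:
        prompt = (f"Human profile: {profile}\n"
                  f"Time: {slot}\n"
                  f"Candidate intention: {cand}\n"
                  "Is this the human's intention? Answer yes or no.")
        examples.append(Example(prompt, "yes" if cand in confirmed else "no"))
    return examples


examples = build_intent_examples(
    profile="usually exercises after lunch",
    slot="14:00",
    candidates=["exercise", "take a nap"],
    confirmed={"exercise"},  # what the human said they were actually doing
)
print([e.label for e in examples])  # ['yes', 'no']
```

The same pattern would apply to the task classifier, with (intention, task) pairs in place of candidate intentions. Because the profile is part of the prompt, the classifier's inputs change as the profile is updated from day to day.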

Key Experimental Results

Main Results

Method | Intra-Day Improvement | Inter-Day Improvement | Notes
COOPERA (Ours) | Highest | Highest (second only to Oracle) | Best overall
Oracle (given intent) | Low | First | Knows intent but cannot learn
Direct Prompting | Minimal | Marginal | Pure prompting fails to adapt
Direct Finetuning | Marginal | Marginal | Learns a one-to-one mapping; cannot handle diversity

Ablation Study

Experiment | Generalization accuracy (COOPERA vs. baseline) | Notes
New scene generalization | 0.465 vs. 0.269 | Cross-scene generalization is relatively easier
New human generalization | 0.343 vs. 0.258 | Cross-human generalization is considerably harder
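
The gap between the two settings is easier to see as relative gains over the baseline, computed directly from the accuracies in the table (rounded to one decimal):

```latex
\begin{aligned}
\text{New scene:}\quad & \frac{0.465 - 0.269}{0.269} = \frac{0.196}{0.269} \approx 72.9\% \\
\text{New human:}\quad & \frac{0.343 - 0.258}{0.258} = \frac{0.085}{0.258} \approx 32.9\%
\end{aligned}
```

The gain on unseen humans is less than half the gain on unseen scenes, in both absolute and relative terms.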

Key Findings

  • Intra-day improvement is driven primarily by learning temporal dependencies: the robot learns, for example, that "this person usually exercises at 2 PM."
  • Inter-day improvement stems from the accumulation of personality profiles: multi-day feedback enables more accurate understanding of individual preferences.
  • Generalizing to new humans is substantially more difficult than generalizing to new scenes, as the diversity of human behavior far exceeds that of environmental variation.

Highlights & Insights

  • This work is the first to extend human-robot collaboration to a multi-day continual interaction setting, filling an important gap in the field.
  • The design of psychologically trait-driven simulated humans is particularly elegant; a user study demonstrates that real humans can identify the distinct personalities of different simulated humans with an accuracy of 71.2%.
  • The decoupled VLM-plus-classifier design is transferable to other scenarios requiring inference under uncertain intentions.

Limitations & Future Work

  • The framework is evaluated in the Habitat 3.0 simulated environment, which introduces a gap relative to the real world.
  • Simulated human behavior is LLM-generated and may deviate from genuine human behavior.
  • Only pick-and-place and simple interaction tasks are considered; more complex collaborative scenarios are not addressed.

Comparison with Related Work

  • vs. Watch-and-Help: WAH addresses single-episode closed tasks, whereas COOPERA targets multi-day open-ended continual collaboration.
  • vs. Generative Agents: Generative agents focus on language-level social simulation, whereas COOPERA integrates psychologically trait-driven behavior with 3D environment interaction.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ The continual open-ended human-robot collaboration framework constitutes an entirely new setting.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive across multiple settings, user studies, and generalization experiments.
  • Writing Quality: ⭐⭐⭐⭐ The framework is complex but clearly articulated.
  • Value: ⭐⭐⭐⭐⭐ Opens a new research direction for personalized robot assistants.