BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly

Conference: ICML 2025
arXiv: 2506.06221
Code: https://sites.google.com/view/biassembly/
Area: Robotics
Keywords: Bimanual collaboration, geometric assembly, point-level affordance, fragment reassembly, long-horizon planning

TL;DR

BiAssemble learns collaboration-aware point-level affordances and decomposes geometric assembly into three steps (pick-up -> alignment -> assembly). It outperforms existing affordance-based and imitation learning baselines on fractured-object reassembly and is validated on a real-world benchmark.

Background & Motivation

Background: Shape assembly is categorized into furniture assembly (functional component association) and geometric assembly (fractured fragment reassembly). The latter is widely applicable (e.g., cultural relic restoration, bone reassembly) but remains under-researched.

Limitations of Prior Work: (a) Existing methods only predict the target pose, ignoring collisions during actual manipulation; (b) fragments have arbitrary geometries with no semantic definitions, making grasping and manipulation extremely difficult; (c) the bimanual coordination and contact-rich assembly process within long-horizon action sequences are highly complex.

Key Challenge: Both the observation space (arbitrary geometric fragments) and the action space (long-horizon bimanual coordination) are extremely large.

Goal: How to enable bimanual robots to collaboratively assemble fractured fragments of arbitrary shapes?

Key Insight: Leverage point-level affordances to achieve geometric generalization, and decompose the long-horizon task into three sub-steps to reduce complexity.

Core Idea: Mimic human intuition—pick-up -> alignment (leaving a gap) -> gradual pushing. Each step utilizes affordance to perceive the constraints of subsequent steps.

Method

Overall Architecture

A three-step workflow:

  1. Pick-up: Choose grasp points by learning point-level affordances (considering both grasp feasibility and subsequent assembly compatibility).
  2. Alignment: Move the fragments to an aligned pose (finding a collision-free aligned pose through backward disassembly).
  3. Assembly: Predict a collision-free direction to gradually push the fragments together.
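The alignment and assembly steps can be illustrated with a minimal geometric sketch. It is not the authors' implementation: it assumes the assembled position and the assembly direction are already known, treats the fragment as a point (no rotation, no grasp, no collision checking), and the names `aligned_position`, `push_waypoints`, `gap`, and `step` are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("assembly direction must be non-zero")
    return tuple(c / n for c in v)


def aligned_position(target, assembly_dir, gap):
    """Backward disassembly: start from the assembled position and
    retreat against the assembly direction by `gap`."""
    d = normalize(assembly_dir)
    return tuple(t - gap * c for t, c in zip(target, d))


def push_waypoints(start, target, step):
    """Evenly spaced positions from `start` to `target`, each move no
    longer than `step` (small tolerance guards against float noise)."""
    delta = tuple(t - s for s, t in zip(start, target))
    dist = math.sqrt(sum(c * c for c in delta))
    n = max(1, math.ceil(dist / step - 1e-9))
    return [tuple(s + (k / n) * c for s, c in zip(start, delta))
            for k in range(1, n + 1)]


# Hypothetical numbers (metres): assembled position of the moving fragment
# and the direction in which it is pushed onto the other fragment.
target = (0.10, 0.00, 0.20)
assembly_dir = (1.0, 0.0, 0.0)

start = aligned_position(target, assembly_dir, gap=0.04)   # alignment step
path = push_waypoints(start, target, step=0.01)            # assembly step
```

Here the aligned position sits 4 cm behind the assembled position along the assembly direction, and the push closes that gap in four 1 cm moves. In the framework described above, the direction is predicted and the aligned pose is checked for collisions; both are taken as given in this sketch.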

Key Designs

  1. Collaboration-Aware Point-Level Affordance:

    • Function: Predicts a grasp score for each point on the fragment surface, considering both local geometry and subsequent operations.
    • Mechanism: \(\text{Affordance} = \text{grasp feasibility} \times \text{alignment reachability} \times \text{assembly direction compatibility}\).
    • Design Motivation: It is insufficient to select points that are merely graspable geometrically — it must also be ensured that subsequent alignment and assembly can be successfully completed after the grasp.
  2. Collision-Free Aligned Pose Generation:

    • Function: Find the aligned pose by backward disassembly from the fully assembled state.
    • Mechanism: Separate the fragments in the opposite direction of assembly, leaving a safety gap.
    • Design Motivation: Moving fragments straight into the target pose tends to cause collisions; aligning first and then pushing helps avoid them.
  3. Real-World Reproducible Benchmark:

    • Function: Create a globally accessible standardized fragment benchmark.
    • Mechanism: Utilize standard objects (e.g., a specific brand of mug) + a standardized breaking method, and provide 3D meshes.
    • Design Motivation: Fractured fragments with varying geometries make fair evaluation difficult; a standardized benchmark addresses this issue.
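The multiplicative affordance from the first design can be sketched as follows. This is only an illustration under stated assumptions: in the paper the per-point scores come from learned networks operating on point clouds, whereas the numbers below are hand-picked, and `collaborative_affordance` and `best_point` are hypothetical helper names.

```python
def collaborative_affordance(grasp, align, assemble):
    """Multiply three per-point scores in [0, 1] into one score per point."""
    if not (len(grasp) == len(align) == len(assemble)):
        raise ValueError("score lists must have the same length")
    return [g * a * s for g, a, s in zip(grasp, align, assemble)]


def best_point(scores):
    """Index of the highest-scoring point."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hand-picked scores for three candidate points on one fragment.
grasp = [0.9, 0.7, 0.8]      # grasp feasibility
align = [0.2, 0.9, 0.5]      # alignment reachability
assemble = [0.5, 0.8, 0.9]   # assembly direction compatibility

scores = collaborative_affordance(grasp, align, assemble)
chosen = best_point(scores)        # point 1: good on all three terms
grasp_only = best_point(grasp)     # point 0: easiest to grasp, hard to align
```

Because the terms are multiplied, a point that is easy to grasp but scores poorly on alignment (point 0) loses to a point that is acceptable on all three terms (point 1), which is the behavior the design motivation calls for.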

Loss & Training

  • The affordance model is trained in simulation and transferred to the real world.
  • The simulation environment supports bimanual fragment assembly.

Key Experimental Results

Main Results

| Method | Assembly Success Rate | Grasp Success Rate |
| --- | --- | --- |
| Imitation Learning | ~25% | ~60% |
| Single-step Affordance | ~40% | ~75% |
| BiAssemble | ~65% | ~85% |

Ablation Study

| Configuration | Success Rate | Description |
| --- | --- | --- |
| No Collaboration Awareness | ~40% | Grasp points ignore subsequent feasibility |
| No Alignment Step (Direct Assembly) | ~20% | Massive collisions |
| Full BiAssemble | ~65% | Three-step decomposition is optimal |

Key Findings

  • The three-step decomposition simplifies complex long-horizon tasks into learnable sub-tasks.
  • Collaboration-aware affordance improves the success rate by ~25 percentage points (from ~40% to ~65%) compared to geometric-only affordance.
  • Successful assembly of broken mugs in the real world demonstrates the feasibility of simulation-to-real (Sim-to-Real) transfer.
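The gains quoted in these findings are differences in success rate, so it helps to separate percentage points from relative improvement. The short calculation below uses the approximate figures from the ablation table; `gain` is a hypothetical helper.

```python
def gain(baseline, full):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return full - baseline, 100.0 * (full - baseline) / baseline


# Approximate success rates (%) taken from the ablation table.
no_collab_pp, no_collab_rel = gain(40.0, 65.0)   # vs. no collaboration awareness
no_align_pp, no_align_rel = gain(20.0, 65.0)     # vs. no alignment step
```

With these inputs, removing collaboration awareness costs about 25 percentage points (a 62.5% relative difference), and removing the alignment step costs about 45 percentage points.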

Highlights & Insights

  • Task decomposition mimics human intuition—humans naturally pick up, align, and then push fragments together; this decomposition is highly intuitive.
  • The advantages of affordance-based methods in geometric generalization are validated once again.
  • The real-world reproducible benchmark brings long-term value to this field.

Limitations & Future Work

  • Only two-fragment assembly is handled; scaling to multi-fragment assembly remains an open problem.
  • 3D perception of fragments depends heavily on the quality of point clouds.
  • Assembly direction prediction is still relatively simple; complex geometries may require more fine-grained planning.
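
Comparison with Related Work

  • vs. Pose-Only Prediction Methods: These methods predict the target pose but ignore the manipulation process, which can leave their predictions unexecutable on a robot.
  • vs. Furniture Assembly: Fragments lack semantic labels, which makes geometric assembly significantly more challenging than furniture assembly.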

Rating

  • Novelty: ⭐⭐⭐⭐ Extending affordance to bimanual geometric assembly is a novel application.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensively evaluated across simulation and real-world environments with multiple object categories.
  • Writing Quality: ⭐⭐⭐⭐ Clear illustrations and reasonable task decomposition.
  • Value: ⭐⭐⭐⭐ Advances the practical feasibility of robotic fragment assembly.