🧑 Human Understanding

🧪 ICML2025 · 3 paper notes

How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects

This paper presents a unified framework for text-driven motion generation targeting large-vocabulary, heterogeneous skeletal objects. It annotates the Truebones Zoo dataset (70+ species) with text descriptions, introduces rig augmentation, and integrates TreePE and RestPE encodings into the Motion Diffusion Model, enabling high-quality 3D motion synthesis for animals, dinosaurs, and even fictional creatures.

LLaVA-ReID: Selective Multi-Image Questioner for Interactive Person Re-Identification

This paper defines a new task, interactive person re-identification (Inter-ReID), constructs the Interactive-PEDES multi-turn dialogue dataset, and proposes LLaVA-ReID, a large multimodal question-generation model based on selective multi-image context and look-ahead supervision, which progressively refines target person descriptions through iterative dialogue.

Scaling Large Motion Models with Million-Level Human Motions

This paper introduces MotionLib (the first million-level motion dataset, containing 1.2 million sequences), MotionBook (comprising lossless features and a 2D lookup-free motion tokenizer), and Being-M0 (a large motion model), demonstrating the scaling laws of both data and model size in the motion generation field for the first time.