🧑 Human Understanding

🧪 ICML2025 · 3 paper notes

How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects: This paper presents a unified framework for text-driven motion generation that targets a large vocabulary of skeletal objects with heterogeneous skeletons. It annotates the Truebones Zoo dataset (70+ species) with text descriptions, introduces rig augmentation, and integrates the TreePE and RestPE encodings into the Motion Diffusion Model. The resulting model enables high-quality 3D motion synthesis for animals, dinosaurs, and even fictional creatures.
LLaVA-ReID: Selective Multi-Image Questioner for Interactive Person Re-Identification: This paper defines a new task, interactive person re-identification (Inter-ReID), and constructs Interactive-PEDES, a multi-turn dialogue dataset for it. It also proposes LLaVA-ReID, a large multimodal question-generation model built on selective multi-image context and look-ahead supervision, which progressively refines the description of the target person through iterative dialogue.
Scaling Large Motion Models with Million-Level Human Motions: This paper introduces MotionLib (the first million-level motion dataset, containing 1.2 million sequences), MotionBook (comprising lossless features and a 2D lookup-free motion tokenizer), and Being-M0 (a large motion model). Together, they provide the first demonstration of scaling laws with respect to both data size and model size in motion generation.
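The "lookup-free" part of MotionBook's tokenizer can be illustrated with generic lookup-free quantization (LFQ), the scheme popularized by MAGVIT-v2. This is a minimal sketch under stated assumptions, not MotionBook's implementation: each latent dimension is binarized to -1 or +1, so the implicit codebook is {-1, +1}^d and a token index is just the resulting bit pattern, with no nearest-neighbor search over a learned codebook. The function names are hypothetical, and `quantize_grid` only assumes a 2D latent layout (for example, time steps × body parts); the paper's actual 2D arrangement, feature design, and training losses are not reproduced here.

```python
from typing import List, Tuple


def lfq_quantize(latent: List[float]) -> Tuple[List[int], int]:
    """Generic lookup-free quantization of one latent vector (hypothetical helper).

    Each dimension is mapped to +1 if positive, else -1. No codebook is stored
    or searched: the token index is the bit pattern read as an integer.
    """
    codes = [1 if x > 0 else -1 for x in latent]
    # Treat +1 as bit 1 and -1 as bit 0; dimension k contributes 2**k.
    index = sum(1 << k for k, c in enumerate(codes) if c == 1)
    return codes, index


def lfq_dequantize(index: int, dim: int) -> List[int]:
    """Invert the index back to its {-1, +1} code vector (hypothetical helper)."""
    return [1 if (index >> k) & 1 else -1 for k in range(dim)]


def quantize_grid(latent_grid: List[List[List[float]]]) -> List[List[int]]:
    """Apply LFQ to every cell of an assumed 2D latent grid (rows x columns x dim).

    The grid layout is an illustrative assumption, not the paper's design.
    """
    return [[lfq_quantize(cell)[1] for cell in row] for row in latent_grid]
```

For example, the latent `[0.3, -1.2, 0.8, -0.1]` quantizes to codes `[1, -1, 1, -1]` and token index 5, and `lfq_dequantize(5, 4)` recovers the same codes. Because the codebook size is fixed at 2^d and never looked up, vocabulary size grows with the latent dimension without a matching growth in codebook memory.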