Computer Science > Computer Vision and Pattern Recognition

arXiv:2512.07410 (cs)

[Submitted on 8 Dec 2025 (v1), last revised 12 Dec 2025 (this version, v2)]

Title: InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs

Authors: Bin Li, Ruichi Zhang, Han Liang, Jingyan Zhang, Juze Zhang, Xin Chen, Lan Xu, Jingyi Yu, Jingya Wang

Abstract: Humanoid agents are expected to emulate the complex coordination inherent in human social behaviors. However, existing methods are largely confined to single-agent scenarios, overlooking the physically plausible interplay essential for multi-agent interactions. To bridge this gap, we propose InterAgent, the first end-to-end framework for text-driven physics-based multi-agent humanoid control. At its core, we introduce an autoregressive diffusion transformer equipped with multi-stream blocks, which decouple proprioception, exteroception, and action to mitigate cross-modal interference while enabling synergistic coordination. We further propose a novel interaction graph exteroception representation that explicitly captures fine-grained joint-to-joint spatial dependencies to facilitate network learning. Within this representation, we devise a sparse edge-based attention mechanism that dynamically prunes redundant connections and emphasizes critical inter-agent spatial relations, thereby enhancing the robustness of interaction modeling. Extensive experiments demonstrate that InterAgent consistently outperforms multiple strong baselines, achieving state-of-the-art performance. It produces coherent, physically plausible, and semantically faithful multi-agent behaviors from text prompts alone. Our code and data will be released to facilitate future research.
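
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea behind a joint-to-joint interaction graph with sparse edges, not the authors' implementation. It assumes each agent is a list of 3D joint positions, builds one edge per cross-agent joint pair, and then keeps only the `k` shortest edges, weighting them with a softmax over negative distance. The function names, the edge features, and the fixed distance-based top-`k` rule are all hypothetical stand-ins; the paper describes a learned attention mechanism that prunes edges dynamically.

```python
import math


def build_interaction_edges(joints_a, joints_b):
    """Build one edge per (joint of agent A, joint of agent B) pair.

    Each edge stores the relative offset from A's joint to B's joint and
    the Euclidean distance between them (hypothetical edge features).
    """
    edges = []
    for i, pa in enumerate(joints_a):
        for j, pb in enumerate(joints_b):
            offset = tuple(b - a for a, b in zip(pa, pb))
            dist = math.sqrt(sum(c * c for c in offset))
            edges.append({"src": i, "dst": j, "offset": offset, "dist": dist})
    return edges


def prune_and_weight(edges, k):
    """Keep the k shortest edges and give them softmax weights.

    This is a fixed heuristic used for illustration only; it stands in
    for the learned sparse attention mentioned in the abstract.
    """
    kept = sorted(edges, key=lambda e: e["dist"])[:k]
    if not kept:
        return []
    scores = [-e["dist"] for e in kept]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [dict(e, weight=x / total) for e, x in zip(kept, exps)]


# Toy example: two agents with two joints each (made-up coordinates).
agent_a = [(0.0, 0.0, 1.0), (0.3, 0.0, 1.2)]
agent_b = [(0.4, 0.0, 1.2), (2.0, 0.0, 1.0)]

edges = build_interaction_edges(agent_a, agent_b)
sparse = prune_and_weight(edges, k=2)
for e in sparse:
    print(e["src"], e["dst"], round(e["dist"], 3), round(e["weight"], 3))
```

In this toy setup the dense graph has four edges, and pruning keeps the two that connect the closest joint pairs, which is the intuition behind emphasizing critical inter-agent relations over redundant ones.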

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2512.07410 [cs.CV]
	(or arXiv:2512.07410v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2512.07410

Submission history

From: Bin Li
[v1] Mon, 8 Dec 2025 10:46:01 UTC (4,138 KB)
[v2] Fri, 12 Dec 2025 09:00:52 UTC (4,138 KB)
