Computer Science > Artificial Intelligence

arXiv:2510.07978 (cs)

[Submitted on 9 Oct 2025 (v1), last revised 13 Feb 2026 (this version, v3)]

Title: VoiceAgentBench: Are Voice Assistants ready for agentic tasks?

Authors:Dhruv Jain, Harshit Shukla, Gautam Rajeev, Ashish Kulkarni, Chandra Khatri, Shubham Agarwal

Abstract: Large-scale speech language models (SpeechLMs) have enabled voice assistants that understand natural spoken queries and perform complex tasks. However, existing speech benchmarks largely focus on isolated capabilities such as transcription or question answering, and they do not systematically evaluate agentic behavior or adversarial robustness. To address this gap, we introduce VoiceAgentBench, a comprehensive benchmark for evaluating SpeechLMs in realistic spoken agentic settings. It comprises 6,000+ synthetic spoken queries spanning single-tool invocations, multi-tool workflows, multi-turn dialogue, and safety evaluations across English and six Indic languages. To simulate speaker variability, we use a novel sampling strategy that selects audio samples for TTS voice conversion based on speaker embeddings, maximizing acoustic diversity. Our evaluation measures tool selection accuracy, structural consistency, and the correctness of tool invocations, including adversarial robustness. Across agentic tasks, ASR-LLM pipelines outperform end-to-end SpeechLMs, achieving up to 60.6% average parameter-filling accuracy on English, while SpeechLMs perform worse overall and degrade more sharply on Indic languages. All models struggle with sequential workflows and safety evaluations, highlighting persistent limitations in tool orchestration, multilingual generalization, and safety robustness. VoiceAgentBench is publicly available on Hugging Face at this https URL, and the codebase is released at this https URL.
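The abstract does not describe the sampling strategy in detail. Purely as an illustration of what "selecting audios based on speaker embeddings to maximize acoustic diversity" can look like, the sketch below uses greedy farthest-point (max-min) selection: each step adds the candidate whose nearest already-selected speaker is farthest away. This is a swapped-in, generic technique, not the authors' method; the function names, the cosine distance, and the fixed starting index are all assumptions.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance (1 - cosine similarity); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse(embeddings: List[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Hypothetical helper: greedy max-min selection of k embedding indices."""
    n = len(embeddings)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of embeddings")
    selected = [start]
    # Distance from every candidate to its nearest already-selected embedding.
    min_dist = [cosine_distance(embeddings[start], e) for e in embeddings]
    while len(selected) < k:
        # Pick the unselected candidate that is farthest from the selected set.
        best = max(
            (i for i in range(n) if i not in selected),
            key=lambda i: min_dist[i],
        )
        selected.append(best)
        # Refresh nearest-selected distances using the newly added embedding.
        for i in range(n):
            d = cosine_distance(embeddings[best], embeddings[i])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


if __name__ == "__main__":
    # Toy 2-D "speaker embeddings"; real speaker embeddings are much higher-dimensional.
    speakers = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    print(select_diverse(speakers, 3))
```

On the toy input, the near-duplicate of the first speaker (index 1) is skipped in favor of the two embeddings that point in clearly different directions. Max-min selection is a common choice because it favors coverage of the embedding space over density, but other diversity criteria (for example clustering and sampling per cluster) would also fit the abstract's description.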

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2510.07978 [cs.AI]
	(or arXiv:2510.07978v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2510.07978

Submission history

From: Dhruv Jain
[v1] Thu, 9 Oct 2025 09:11:38 UTC (10,407 KB)
[v2] Wed, 5 Nov 2025 07:44:45 UTC (10,398 KB)
[v3] Fri, 13 Feb 2026 12:58:41 UTC (10,456 KB)
