
Showing 1–1 of 1 result for author: Moe, M P

  1. arXiv:2603.15483

    cs.AI

    Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis

    Authors: Penny Chong, Harshavardhan Abichandani, Jiyuan Shen, Atin Ghosh, Min Pyae Moe, Yifan Mai, Daniel Dahlmeier

    Abstract: Agent applications are increasingly adopted to automate workflows across diverse tasks. However, due to the heterogeneous domains they operate in, it is challenging to create a scalable evaluation framework. Prior works each employ their own methods to determine task success, such as database lookups, regex matching, etc., adding complexity to the development of a unified agent evaluation approach. M…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: Accepted as a conference paper at ICLR 2026. Code and dataset are available in the repository https://github.com/SAP-samples/agent-quality-inspect