
Showing 1–45 of 45 results for author: Zou, H P

  1. arXiv:2604.12285 [pdf, ps, other]

    cs.AI

    GAM: Hierarchical Graph-based Agentic Memory for LLM Agents

    Authors: Zhaofen Wu, Hanrong Zhang, Fulin Lin, Wujiang Xu, Xinran Xu, Yankai Chen, Henry Peng Zou, Shaowen Chen, Weizhi Zhang, Xue Liu, Philip S. Yu, Hongwei Wang

    Abstract: To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete structured memory architectures provide robust knowledge retention but often st…

    Submitted 14 April, 2026; originally announced April 2026.

    Comments: 18 pages, 6 figures

  2. arXiv:2604.03592 [pdf, ps, other]

    cs.CL cs.AI

    Unveiling Language Routing Isolation in Multilingual MoE Models for Interpretable Subnetwork Adaptation

    Authors: Kening Zheng, Wei-Chieh Huang, Jiahao Huo, Zhonghao Li, Henry Peng Zou, Yibo Yan, Xin Zou, Jungang Li, Junzhuo Li, Hanrong Zhang, Xuming Hu, Philip S. Yu

    Abstract: Mixture-of-Experts (MoE) models exhibit striking performance disparities across languages, yet the internal mechanisms driving these gaps remain poorly understood. In this work, we conduct a systematic analysis of expert routing patterns in MoE models, revealing a phenomenon we term Language Routing Isolation, in which high- and low-resource languages tend to activate largely disjoint expert sets.…

    Submitted 4 April, 2026; originally announced April 2026.

  3. arXiv:2604.01687 [pdf, ps, other]

    cs.AI

    CoEvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification

    Authors: Hanrong Zhang, Shicheng Fan, Henry Peng Zou, Yankai Chen, Zhenting Wang, Jiayu Zhou, Chengze Li, Wei-Chieh Huang, Yifei Yao, Kening Zheng, Xue Liu, Xiaoxiao Li, Philip S. Yu

    Abstract: Anthropic proposes the concept of skills for LLM agents to tackle multi-step professional tasks that simple tool invocations cannot address. A tool is a single, self-contained function, whereas a skill is a structured bundle of interdependent multi-file artifacts. Currently, skill generation is not only labor-intensive due to manual authoring, but may also suffer from human-machine cognitive misa…

    Submitted 12 April, 2026; v1 submitted 2 April, 2026; originally announced April 2026.

    Comments: Code will be released

  4. arXiv:2604.00892 [pdf, ps, other]

    cs.CL

    When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation

    Authors: Henry Peng Zou, Chunyu Miao, Wei-Chieh Huang, Yankai Chen, Yue Zhou, Hanrong Zhang, Yaozu Wu, Liancheng Fang, Zhengyao Gu, Zhen Zhang, Kening Zheng, Fangxin Wang, Yi Nian, Shanghao Li, Wenzhe Fan, Langzhou He, Weizhi Zhang, Xue Liu, Philip S. Yu

    Abstract: As LLM agents transition from short, static problem solving to executing complex, long-horizon tasks in dynamic environments, the ability to handle user interruptions, such as adding requirements or revising goals, during mid-task execution is becoming a core requirement for realistic deployment. However, existing benchmarks largely assume uninterrupted agent behavior or study interruptions only in…

    Submitted 1 April, 2026; originally announced April 2026.

  5. arXiv:2604.00375 [pdf, ps, other]

    cs.CL

    Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models

    Authors: Liancheng Fang, Aiwei Liu, Henry Peng Zou, Yankai Chen, Enze Ma, Leyi Pan, Chunyu Miao, Wei-Chieh Huang, Xue Liu, Philip S. Yu

    Abstract: Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@1) by prioritizing confident tokens, bu…

    Submitted 31 March, 2026; originally announced April 2026.

  6. arXiv:2604.00001 [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Two-Stage Optimizer-Aware Online Data Selection for Large Language Models

    Authors: Fangxin Wang, Peyman Baghershahi, Langzhou He, Henry Peng Zou, Sourav Medya, Philip S. Yu

    Abstract: Gradient-based data selection offers a principled framework for estimating sample utility in large language model (LLM) fine-tuning, but existing methods are mostly designed for offline settings. They are therefore less suited to online fine-tuning, where data arrives sequentially, sample utility is step-dependent, and the effective update geometry is shaped by adaptive optimizers. We propose an o…

    Submitted 8 March, 2026; originally announced April 2026.

    Comments: 22 pages, 2 figures, 6 tables

  7. arXiv:2603.27304 [pdf, ps, other]

    cs.AI cs.MA

    EpochX: Building the Infrastructure for an Emergent Agent Civilization

    Authors: Huacan Wang, Chaofa Yuan, Xialie Zhuang, Tu Hu, Shuo Zhang, Jun Han, Shi Wei, Daiqiang Li, Jingping Liu, Kunyi Wang, Zihan Yin, Zhenheng Tang, Andy Wang, Henry Peng Zou, Philip S. Yu, Sen Hu, Qizhen Lan, Ronghao Chen

    Abstract: General-purpose technologies reshape economies less by improving individual tools than by enabling new ways to organize production and coordination. We believe AI agents are approaching a similar inflection point: as foundation models make broad task execution and tool use increasingly accessible, the binding constraint shifts from raw capability to how work is delegated, verified, and rewarded at…

    Submitted 28 March, 2026; originally announced March 2026.

  8. arXiv:2603.17445 [pdf, ps, other]

    cs.AI cs.CL

    When Only the Final Text Survives: Implicit Execution Tracing for Multi-Agent Attribution

    Authors: Yi Nian, Haosen Cao, Shenzhe Zhu, Henry Peng Zou, Qingqing Luan, Yue Zhao

    Abstract: When a multi-agent system produces an incorrect or harmful answer, who is accountable if execution logs and agent identifiers are unavailable? In practice, generated content is often detached from its execution environment due to privacy or system boundaries, leaving the final text as the only auditable artifact. Existing attribution methods rely on full execution traces and thus become ineffectiv…

    Submitted 1 April, 2026; v1 submitted 18 March, 2026; originally announced March 2026.

  9. arXiv:2602.20532 [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Actor-Curator: Co-adaptive Curriculum Learning via Policy-Improvement Bandits for RL Post-Training

    Authors: Zhengyao Gu, Jonathan Light, Raul Astudillo, Ziyu Ye, Langzhou He, Henry Peng Zou, Wei Cheng, Santiago Paternain, Philip S. Yu, Yisong Yue

    Abstract: Post-training large foundation models with reinforcement learning typically relies on massive and heterogeneous datasets, making effective curriculum learning both critical and challenging. In this work, we propose ACTOR-CURATOR, a scalable and fully automated curriculum learning framework for reinforcement learning post-training of large language models (LLMs). ACTOR-CURATOR learns a neural curat…

    Submitted 23 February, 2026; originally announced February 2026.

    Comments: 37 pages, 8 figures, 1 table. Preprint under review. Equal contribution by first two authors

    MSC Class: 68T05; 68T20; 90C40; 62L05

    ACM Class: I.2.6; I.2.8; I.2.1; F.1.1

  10. arXiv:2602.12268 [pdf, ps, other]

    cs.AI

    CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use

    Authors: Zhen Zhang, Kaiqiang Song, Xun Wang, Yebowen Hu, Weixiang Yan, Chenyang Zhao, Henry Peng Zou, Haoyun Deng, Sathish Reddy Indurthi, Shujian Liu, Simin Ma, Xiaoyang Wang, Xin Eric Wang, Song Wang

    Abstract: AI agents are increasingly used to solve real-world tasks by reasoning over multi-turn user interactions and invoking external tools. However, applying reinforcement learning to such settings remains difficult: realistic objectives often lack verifiable rewards and instead emphasize open-ended behaviors; moreover, RL for multi-turn, multi-step agentic tool use is still underexplored; and building…

    Submitted 20 February, 2026; v1 submitted 12 February, 2026; originally announced February 2026.

  11. arXiv:2602.06052 [pdf, ps, other]

    cs.CL cs.AI

    Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey

    Authors: Wei-Chieh Huang, Weizhi Zhang, Yueqing Liang, Yuanchen Bei, Yankai Chen, Tao Feng, Xinyu Pan, Zhen Tan, Yu Wang, Tianxin Wei, Shanglin Wu, Ruiyao Xu, Liangwei Yang, Rui Yang, Wooseong Yang, Chin-Yuan Yeh, Hanrong Zhang, Haozhen Zhang, Siqi Zhu, Henry Peng Zou, Wanjia Zhao, Song Wang, Wujiang Xu, Zixuan Ke, Zheng Hui , et al. (35 additional authors not shown)

    Abstract: Artificial intelligence research is undergoing a paradigm shift from prioritizing model innovations over benchmark scores towards emphasizing problem definition and rigorous real-world evaluation. As the field enters the "second half," the central challenge becomes real utility in long-horizon, dynamic, and user-dependent environments, where agents face context explosion and must continuous…

    Submitted 10 February, 2026; v1 submitted 14 January, 2026; originally announced February 2026.

  12. arXiv:2602.03688 [pdf, ps, other]

    cs.AI

    TodyComm: Task-Oriented Dynamic Communication for Multi-Round LLM-based Multi-Agent System

    Authors: Wenzhe Fan, Tommaso Tognoli, Henry Peng Zou, Chunyu Miao, Yibo Wang, Xinhua Zhang

    Abstract: Multi-round LLM-based multi-agent systems rely on effective communication structures to support collaboration across rounds. However, most existing methods employ a fixed communication topology during inference, which falls short in many realistic applications where the agents' roles may change across rounds due to dynamic adversaries, task progression, or time-varying constraints such as c…

    Submitted 3 February, 2026; originally announced February 2026.

  13. arXiv:2511.21624 [pdf, ps, other]

    cs.SI cs.CL

    TAGFN: A Text-Attributed Graph Dataset for Fake News Detection in the Age of LLMs

    Authors: Kay Liu, Yuwei Han, Haoyan Xu, Henry Peng Zou, Yue Zhao, Philip S. Yu

    Abstract: Large Language Models (LLMs) have recently revolutionized machine learning on text-attributed graphs, but the application of LLMs to graph outlier detection, particularly in the context of fake news detection, remains significantly underexplored. One of the key challenges is the scarcity of large-scale, realistic, and well-annotated datasets that can serve as reliable benchmarks for outlier detect…

    Submitted 26 November, 2025; originally announced November 2025.

    Comments: Preprint. Under review

  14. arXiv:2510.22095 [pdf, ps, other]

    cs.AI cs.CL

    Embracing Trustworthy Brain-Agent Collaboration as Paradigm Extension for Intelligent Assistive Technologies

    Authors: Yankai Chen, Xinni Zhang, Yifei Zhang, Yangning Li, Henry Peng Zou, Chunyu Miao, Weizhi Zhang, Xue Liu, Philip S. Yu

    Abstract: Brain-Computer Interfaces (BCIs) offer a direct communication pathway between the human brain and external devices, holding significant promise for individuals with severe neurological impairments. However, their widespread adoption is hindered by critical limitations, such as low information transfer rates and extensive user-specific calibration. To overcome these challenges, recent research has…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS'25 Position Track

  15. arXiv:2510.10994 [pdf, ps, other]

    cs.CL cs.AI

    Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety

    Authors: Wei-Chieh Huang, Henry Peng Zou, Yaozu Wu, Dongyuan Li, Yankai Chen, Weizhi Zhang, Yangning Li, Angelo Zangari, Jizhou Guo, Chunyu Miao, Liancheng Fang, Langzhou He, Yinghui Li, Renhe Jiang, Philip S. Yu

    Abstract: Deep research frameworks have shown promising capabilities in synthesizing comprehensive reports from web sources. While deep research possesses significant potential to address complex issues through planning and research cycles, existing frameworks lack sufficient evaluation procedures and stage-specific protections. They typically treat evaluation as exact match accuracy of question…

    Submitted 24 January, 2026; v1 submitted 13 October, 2025; originally announced October 2025.

  16. arXiv:2510.06186 [pdf, ps, other]

    cs.CL cs.AI

    RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback

    Authors: Chunyu Miao, Henry Peng Zou, Yangning Li, Yankai Chen, Yibo Wang, Fangxin Wang, Yifan Li, Wooseong Yang, Bowei He, Xinni Zhang, Dianzhi Yu, Hanchen Yang, Hoang H Nguyen, Yue Zhou, Jie Yang, Jizhou Guo, Wenzhe Fan, Chin-Yuan Yeh, Panpan Meng, Liancheng Fang, Jinhu Qi, Wei-Chieh Huang, Zhengyao Gu, Yuwei Han, Langzhou He , et al. (6 additional authors not shown)

    Abstract: Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, ignoring the iterative and feedback-driven nature of realistic workflows of scientific research development. To address this gap, we present RECODE-H, a benchmark of 102 tasks from…

    Submitted 24 October, 2025; v1 submitted 7 October, 2025; originally announced October 2025.

    Comments: Code and dataset are available at github.com/ChunyuMiao98/RECODE

  17. arXiv:2509.23614 [pdf, ps, other]

    cs.AI

    PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents

    Authors: Yaozu Wu, Jizhou Guo, Dongyuan Li, Henry Peng Zou, Wei-Chieh Huang, Yankai Chen, Zhen Wang, Weizhi Zhang, Yangning Li, Meng Zhang, Renhe Jiang, Philip S. Yu

    Abstract: Effective guardrails are essential for safely deploying LLM-based agents in critical applications. Despite recent advances, existing guardrails suffer from two fundamental limitations: (i) they apply uniform guardrail policies to all users, ignoring that the same agent behavior can harm some users while being safe for others; (ii) they check each response in isolation, missing how risks evolve and…

    Submitted 27 September, 2025; originally announced September 2025.

  18. arXiv:2507.13336 [pdf, ps, other]

    cs.IR

    SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation

    Authors: Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu

    Abstract: Recommender systems (RecSys) are essential for online platforms, providing personalized suggestions to users within a vast sea of information. Self-supervised graph learning seeks to harness high-order collaborative filtering signals through unsupervised augmentation on the user-item bipartite graph, primarily leveraging a multi-task learning framework that includes both supervised recommendation…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: Accepted in RecSys 2025. arXiv admin note: substantial text overlap with arXiv:2404.15954

  19. arXiv:2507.09477 [pdf, ps, other]

    cs.CL cs.AI

    Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

    Authors: Yangning Li, Weizhi Zhang, Yuyao Yang, Wei-Chieh Huang, Yaozu Wu, Junyu Luo, Yuanchen Bei, Henry Peng Zou, Xiao Luo, Yusheng Zhao, Chunkit Chan, Yankai Chen, Zhongfen Deng, Yinghui Li, Hai-Tao Zheng, Dongyuan Li, Renhe Jiang, Ming Zhang, Yangqiu Song, Philip S. Yu

    Abstract: Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge, yet it falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective. We first map how advanced reasoning op…

    Submitted 16 July, 2025; v1 submitted 12 July, 2025; originally announced July 2025.

    Comments: submitted to ARR May

  20. arXiv:2506.18959 [pdf, ps, other]

    cs.IR cs.CL cs.LG

    From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

    Authors: Weizhi Zhang, Yangning Li, Yuanchen Bei, Junyu Luo, Guancheng Wan, Liangwei Yang, Chenxuan Xie, Yuyao Yang, Wei-Chieh Huang, Chunyu Miao, Henry Peng Zou, Xiao Luo, Yusheng Zhao, Yankai Chen, Chunkit Chan, Peilin Zhou, Xinyang Zhang, Chenwei Zhang, Jingbo Shang, Ming Zhang, Yangqiu Song, Irwin King, Philip S. Yu

    Abstract: Information retrieval is a cornerstone of modern knowledge acquisition, enabling billions of queries each day across diverse domains. However, traditional keyword-based search engines are increasingly inadequate for handling complex, multi-step information needs. Our position is that Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm terme…

    Submitted 3 July, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  21. arXiv:2506.09420 [pdf, ps, other]

    cs.AI cs.CL cs.HC cs.LG cs.MA

    A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy

    Authors: Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Chunyu Miao, Dongyuan Li, Aiwei Liu, Yue Zhou, Yankai Chen, Weizhi Zhang, Yangning Li, Liancheng Fang, Renhe Jiang, Philip S. Yu

    Abstract: Recent improvements in large language models (LLMs) have led many researchers to focus on building fully autonomous AI agents. This position paper questions whether this approach is the right path forward, as these autonomous systems still have problems with reliability, transparency, and understanding the actual requirements of humans. We suggest a different approach: LLM-based Human-Agent Systems…

    Submitted 11 June, 2025; originally announced June 2025.

  22. arXiv:2506.06254 [pdf, ps, other]

    cs.AI cs.CL cs.LG

    PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time

    Authors: Weizhi Zhang, Xinyang Zhang, Chenwei Zhang, Liangwei Yang, Jingbo Shang, Zhepei Wei, Henry Peng Zou, Zijie Huang, Zhengyang Wang, Yifan Gao, Xiaoman Pan, Lian Xiong, Jingguo Liu, Philip S. Yu, Xian Li

    Abstract: Large Language Model (LLM)-empowered agents have recently emerged as advanced paradigms that exhibit impressive capabilities in a wide range of domains and tasks. Despite their potential, current LLM agents often adopt a one-size-fits-all approach, lacking the flexibility to respond to users' varying needs and preferences. This limitation motivates us to develop PersonaAgent, the first personalize…

    Submitted 6 June, 2025; originally announced June 2025.

  23. arXiv:2505.24267 [pdf, ps, other]

    cs.CR

    MUSE: Model-Agnostic Tabular Watermarking via Multi-Sample Selection

    Authors: Liancheng Fang, Aiwei Liu, Henry Peng Zou, Yankai Chen, Hengrui Zhang, Zhongfen Deng, Philip S. Yu

    Abstract: We introduce MUSE, a watermarking algorithm for tabular generative models. Previous approaches typically leverage DDIM invertibility to watermark tabular diffusion models, but tabular diffusion models exhibit significantly poorer invertibility compared to other modalities, compromising performance. Simultaneously, tabular diffusion models require substantially less computation than other modalitie…

    Submitted 30 May, 2025; originally announced May 2025.

  24. arXiv:2505.00753 [pdf, ps, other]

    cs.CL cs.LG

    LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey

    Authors: Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Dongyuan Li, Renhe Jiang, Xue Liu, Philip S. Yu

    Abstract: Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world app…

    Submitted 26 June, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-Human-Agent-Collaboration-Interaction-Systems

  25. arXiv:2503.03062 [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Scaling Laws for Many-Shot In-Context Learning with Self-Generated Annotations

    Authors: Zhengyao Gu, Henry Peng Zou, Yankai Chen, Aiwei Liu, Weizhi Zhang, Philip S. Yu

    Abstract: The high cost of obtaining high-quality annotated data for in-context learning (ICL) has motivated the development of methods that use self-generated annotations in place of ground-truth labels. While these approaches have shown promising results in few-shot settings, they generally do not scale to many-shot scenarios. In this work, we study ICL with self-generated examples using a framework analo…

    Submitted 20 May, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  26. arXiv:2503.01814 [pdf, ps, other]

    cs.IR cs.AI cs.CL cs.LG

    LLMInit: A Free Lunch from Large Language Models for Selective Initialization of Recommendation

    Authors: Weizhi Zhang, Liangwei Yang, Wooseong Yang, Henry Peng Zou, Yuqing Liu, Ke Xu, Sourav Medya, Philip S. Yu

    Abstract: Collaborative filtering (CF) is widely adopted in industrial recommender systems (RecSys) for modeling user-item interactions across numerous applications, but often struggles with cold-start and data-sparse scenarios. Recent advancements in pre-trained large language models (LLMs) with rich semantic knowledge offer promising solutions to these challenges. However, deploying LLMs at scale is hind…

    Submitted 20 November, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted in EMNLP 2025 Industry Track

  27. arXiv:2502.19163 [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG

    TestNUC: Enhancing Test-Time Computing Approaches and Scaling through Neighboring Unlabeled Data Consistency

    Authors: Henry Peng Zou, Zhengyao Gu, Yue Zhou, Yankai Chen, Weizhi Zhang, Liancheng Fang, Yibo Wang, Yangning Li, Kay Liu, Philip S. Yu

    Abstract: Test-time computing approaches, which leverage additional computational resources during inference, have been proven effective in enhancing large language model performance. This work introduces a novel, linearly scaling approach, TestNUC, that improves test-time predictions by leveraging the local consistency of neighboring unlabeled data: it classifies an input instance by considering not only th…

    Submitted 31 May, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: Accepted by ACL 2025 main conference

  28. arXiv:2502.18414 [pdf, other]

    cs.CL cs.LG

    GLEAN: Generalized Category Discovery with Diverse and Quality-Enhanced LLM Feedback

    Authors: Henry Peng Zou, Siffi Singh, Yi Nian, Jianfeng He, Jason Cai, Saab Mansour, Hang Su

    Abstract: Generalized Category Discovery (GCD) is a practical and challenging open-world task that aims to recognize both known and novel categories in unlabeled data using limited labeled data from known categories. Due to the lack of supervision, previous GCD methods face significant challenges, such as difficulty in rectifying errors for confusing instances, and inability to effectively uncover and lever…

    Submitted 25 February, 2025; originally announced February 2025.

  29. arXiv:2502.16804 [pdf, ps, other]

    cs.MA cs.AI

    Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances

    Authors: Yaozu Wu, Dongyuan Li, Yankai Chen, Renhe Jiang, Henry Peng Zou, Wei-Chieh Huang, Yangning Li, Liancheng Fang, Zhen Wang, Philip S. Yu

    Abstract: Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs) have been integrated into ADSs to support high-level decision-making through their powerful reasoning, instruction-following, and communication abilities. However, LLM-based single-agent ADSs face three major chall…

    Submitted 14 October, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

  30. arXiv:2502.16414 [pdf, other]

    cs.LG cs.AI

    TabGen-ICL: Residual-Aware In-Context Example Selection for Tabular Data Generation

    Authors: Liancheng Fang, Aiwei Liu, Hengrui Zhang, Henry Peng Zou, Weizhi Zhang, Philip S. Yu

    Abstract: Large Language Models (LLMs) have achieved encouraging results in tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. We observe that using randomly selected in-context examples hampers the LLM's performance, resulting in sub-optimal generation quality. To…

    Submitted 22 February, 2025; originally announced February 2025.

  31. arXiv:2501.01945 [pdf, other]

    cs.IR cs.AI

    Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap

    Authors: Weizhi Zhang, Yuanchen Bei, Liangwei Yang, Henry Peng Zou, Peilin Zhou, Aiwei Liu, Yinghui Li, Hao Chen, Jianling Wang, Yu Wang, Feiran Huang, Sheng Zhou, Jiajun Bu, Allen Lin, James Caverlee, Fakhri Karray, Irwin King, Philip S. Yu

    Abstract: The cold-start problem is one of the long-standing challenges in recommender systems, focusing on accurately modeling new or interaction-limited users or items to provide better recommendations. Due to the diversification of internet platforms and the exponential growth of users and items, the importance of cold-start recommendation (CSR) is becoming increasingly evident. At the same time, large langu…

    Submitted 16 January, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

  32. arXiv:2411.13578 [pdf, ps, other]

    cs.CV cs.AI cs.LG

    CMOOD: Concept-based Multi-label OOD Detection

    Authors: Zhendong Liu, Yi Nian, Yuehan Qin, Henry Peng Zou, Li Li, Xiyang Hu, Yue Zhao

    Abstract: How can models effectively detect out-of-distribution (OOD) samples in complex, multi-label settings without extensive retraining? Existing OOD detection methods struggle to capture the intricate semantic relationships and label co-occurrences inherent in multi-label settings, often requiring large amounts of training data and failing to generalize to unseen label combinations. While large languag…

    Submitted 28 January, 2026; v1 submitted 15 November, 2024; originally announced November 2024.

  33. arXiv:2410.11327 [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Sequential LLM Framework for Fashion Recommendation

    Authors: Han Liu, Xianfeng Tang, Tianlang Chen, Jiapeng Liu, Indu Indu, Henry Peng Zou, Peng Dai, Roberto Fernandez Galan, Michael D Porter, Dongmei Jia, Ning Zhang, Lian Xiong

    Abstract: The fashion industry is one of the leading domains in the global e-commerce sector, prompting major online retailers to employ recommendation systems for product suggestions and customer convenience. While recommendation systems have been widely studied, most are designed for general e-commerce problems and struggle with the unique challenges of the fashion domain. To address these issues, we prop…

    Submitted 15 October, 2024; originally announced October 2024.

  34. arXiv:2409.09927 [pdf, other]

    cs.CL cs.AI

    Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges

    Authors: Vinay Samuel, Yue Zhou, Henry Peng Zou

    Abstract: As large language models achieve increasingly impressive results, questions arise about whether such performance is from generalizability or mere data memorization. Thus, numerous data contamination detection methods have been proposed. However, these approaches are often validated with traditional benchmarks and early-stage LLMs, leaving uncertainty about their effectiveness when evaluating state…

    Submitted 8 December, 2024; v1 submitted 15 September, 2024; originally announced September 2024.

    Comments: Accepted to COLING 2025. 12 pages, 1 figure

  35. arXiv:2407.18910 [pdf, other]

    cs.LG cs.IR

    Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation

    Authors: Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Liancheng Fang, Philip S. Yu

    Abstract: The efficiency and scalability of graph convolution networks (GCNs) in training recommender systems (RecSys) have been persistent concerns, hindering their deployment in real-world applications. This paper presents a critical examination of the necessity of graph convolutions during the training phase and introduces an innovative alternative: the Light Post-Training Graph Ordinary-Differential-Equ…

    Submitted 28 July, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2024

  36. arXiv:2407.18416 [pdf, ps, other]

    cs.CL cs.AI cs.LG

    PersonaGym: Evaluating Persona Agents and LLMs

    Authors: Vinay Samuel, Henry Peng Zou, Yue Zhou, Shreyas Chaudhari, Ashwin Kalyan, Tanmay Rajpurohit, Ameet Deshpande, Karthik Narasimhan, Vishvak Murahari

    Abstract: Persona agents, which are LLM agents conditioned to act according to an assigned persona, enable contextually rich and user-aligned interactions across domains like education and healthcare. However, evaluating how faithfully these agents adhere to their personas remains a significant challenge, particularly in free-form settings that demand consistency across diverse, persona-relevant environment…

    Submitted 5 September, 2025; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: EMNLP Findings 2025

  37. arXiv:2407.00869 [pdf, other]

    cs.CL cs.AI

    Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks

    Authors: Yue Zhou, Henry Peng Zou, Barbara Di Eugenio, Yang Zhang

    Abstract: We find that language models have difficulty generating fallacious and deceptive reasoning. When asked to generate deceptive outputs, language models tend to leak honest counterparts but believe them to be false. Exploiting this deficiency, we propose a jailbreak attack method that elicits malicious output from an aligned language model. Specifically, we query the model to generate a fallacious y…

    Submitted 22 May, 2025; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to the main conference of EMNLP 2024

  38. arXiv:2406.16253 [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo , et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 2 October, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by EMNLP 2024 main conference

  39. arXiv:2406.05392  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas: A Survey

    Authors: Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, Han Liu, Yichen Wang, Kuofeng Gao, Henry Peng Zou, Yiqiao Jin, Yijia Xiao, Shenghao Wu, Zongxing Xie, Weimin Lyu, Sihong He, Lu Cheng, Haohan Wang, Jun Zhuang

    Abstract: Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, an…

    Submitted 21 October, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  40. arXiv:2404.15954  [pdf, other]

    cs.IR cs.LG

    Mixed Supervised Graph Contrastive Learning for Recommendation

    Authors: Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu

    Abstract: Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss…

    Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  41. arXiv:2404.15592  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction

    Authors: Henry Peng Zou, Vinay Samuel, Yue Zhou, Weizhi Zhang, Liancheng Fang, Zihe Song, Philip S. Yu, Cornelia Caragea

    Abstract: Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting the implicit ones, lack product images, are often not publicly available, and lack an in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first publicly available multimodal dataset for implicit attribute value extraction.…

    Submitted 19 July, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by ACL 2024 (Findings) - Scores: Soundness - 4/4/4, Dataset - 4/4/4, Overall Assessment - 4/3.5/3.5, Meta - 4

  42. arXiv:2404.08886  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM

    Authors: Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea

    Abstract: In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted by NAACL 2024 Industry Track

  43. arXiv:2310.14627  [pdf, other]

    cs.CL cs.LG

    CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification

    Authors: Henry Peng Zou, Yue Zhou, Cornelia Caragea, Doina Caragea

    Abstract: The shared real-time information about natural disasters on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations. However, supervised learning models for monitoring disaster events require large amounts of annotated data, making them unrealistic for real-time use in disaster events. To address this challenge,…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by ISCRAM 2023

  44. arXiv:2310.14583  [pdf, other]

    cs.CL cs.LG

    JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification

    Authors: Henry Peng Zou, Cornelia Caragea

    Abstract: Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data. However, existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation. In this paper, we propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning an…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023 (Main)

  45. arXiv:2310.14577  [pdf, other]

    cs.CL cs.LG

    DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank

    Authors: Henry Peng Zou, Yue Zhou, Weizhi Zhang, Cornelia Caragea

    Abstract: During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support. Emergency relief organizations leverage such information to acquire timely crisis circumstances and expedite rescue operations. While existing works utilize such information to build models for crisis event analysis, fully-supervised approache…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023 (Findings)