Showing 1–50 of 424 results for author: Yao, X

Searching in archive cs.
  1. arXiv:2604.10971

    cs.CV cs.AI

    MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models

    Authors: Xincheng Yao, Zefeng Qian, Chao Shi, Jiayang Song, Chongyang Zhang

    Abstract: In the progress of industrial anomaly detection, general anomaly detection (GAD) is an emerging trend and also the ultimate goal. Unlike the conventional single- and multi-class AD, general AD aims to train a general AD model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning on the target data. Recently, Multimodal Large Language Models (MLLMs) have…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: Accepted by CVPR 2026

  2. arXiv:2604.09162

    cs.CL cs.AI cs.HC

    Persona-E$^2$: A Human-Grounded Dataset for Personality-Shaped Emotional Responses to Textual Events

    Authors: Yuqin Yang, Haowu Zhou, Haoran Tu, Zhiwen Hui, Shiqi Yan, HaoYang Li, Dong She, Xianrong Yao, Yang Gao, Zhanpeng Jin

    Abstract: Most affective computing research treats emotion as a static property of text, focusing on the writer's sentiment while overlooking the reader's perspective. This approach ignores how individual personalities lead to diverse emotional appraisals of the same event. Although role-playing Large Language Models (LLMs) attempt to simulate such nuanced reactions, they often suffer from "personality illu…

    Submitted 10 April, 2026; originally announced April 2026.

    Comments: Accepted by ACL 2026 Main

  3. arXiv:2604.08690

    cs.LG cs.CL

    Skip-Connected Policy Optimization for Implicit Advantage

    Authors: Fengwei Teng, Jinyi Bai, Xinhao Yao, Demi Ruohan Wang, Jiahao Zhao, Zhijiang Guo

    Abstract: Group Relative Policy Optimization (GRPO) has proven effective in RLVR by using outcome-based rewards. While fine-grained dense rewards can theoretically improve performance, we reveal that under practical sampling budgets, Monte Carlo estimation yields high-variance and sign-inconsistent advantages for early reasoning tokens, paradoxically underperforming outcome-only GRPO. We propose Skip-Connec…

    Submitted 9 April, 2026; originally announced April 2026.

  4. arXiv:2604.05900

    cs.CV

    AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis

    Authors: Dong She, Xianrong Yao, Liqun Chen, Jinghe Yu, Yang Gao, Zhanpeng Jin

    Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities in perception, yet holistic Affective Image Content Analysis (AICA), which integrates perception, reasoning, and generation into a unified framework, remains underexplored. To address this gap, we introduce AICA-Bench, a comprehensive benchmark with three core tasks: Emotion Understanding (EU), Emotion Reasoning (ER), and Emotion-…

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: Accepted by Findings of ACL 2026

  5. Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement Learning

    Authors: Leixin Chang, Xinchen Yao, Ben Liu, Liangjing Yang, Hua Chen

    Abstract: On-policy reinforcement learning (RL) algorithms have demonstrated great potential in robotic control, where effective exploration is crucial for efficient and high-quality policy learning. However, how to encourage the agent to explore better trajectories efficiently remains a challenge. Most existing methods incentivize exploration by maximizing the policy entropy or encouraging novel state…

    Submitted 1 April, 2026; v1 submitted 28 March, 2026; originally announced March 2026.

    Comments: 8 pages, 10 figures

  6. arXiv:2603.26078

    cs.CV cs.AI

    When Identities Collapse: A Stress-Test Benchmark for Multi-Subject Personalization

    Authors: Zhihan Chen, Yuhuan Zhao, Yijie Zhu, Xinyu Yao

    Abstract: Subject-driven text-to-image diffusion models have achieved remarkable success in preserving single identities, yet their ability to compose multiple interacting subjects remains largely unexplored and highly challenging. Existing evaluation protocols typically rely on global CLIP metrics, which are insensitive to local identity collapse and fail to capture the severity of multi-subject entangleme…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: 10 pages, 7 figures, accepted by CVPR 2026 Workshop P13N

  7. arXiv:2603.24059

    cs.CV

    AD-Reasoning: Multimodal Guideline-Guided Reasoning for Alzheimer's Disease Diagnosis

    Authors: Qiuhui Chen, Yushan Deng, Xuancheng Yao, Yi Hong

    Abstract: Alzheimer's disease (AD) diagnosis requires integrating neuroimaging with heterogeneous clinical evidence and reasoning under established criteria, yet most multimodal models remain opaque and weakly guideline-aligned. We present AD-Reasoning, a multimodal framework that couples structural MRI with six clinical modalities and a rule-based verifier to generate structured, NIA-AA-consistent diagnose…

    Submitted 25 March, 2026; originally announced March 2026.

    Comments: ICME 2026

  8. arXiv:2603.22317

    cs.LG cs.AI

    Geometric Mixture-of-Experts with Curvature-Guided Adaptive Routing for Graph Representation Learning

    Authors: Haifang Cao, Yu Wang, Timing Li, Xinjie Yao, Pengfei Zhu

    Abstract: Graph-structured data typically exhibits complex topological heterogeneity, making it difficult to model accurately within a single Riemannian manifold. While emerging mixed-curvature methods attempt to capture such diversity, they often rely on implicit, task-driven routing that lacks fundamental geometric grounding. To address this challenge, we propose a Geometric Mixture-of-Experts framework (…

    Submitted 20 March, 2026; originally announced March 2026.

  9. arXiv:2603.21017

    cs.RO

    Dreaming the Unseen: World Model-regularized Diffusion Policy for Out-of-Distribution Robustness

    Authors: Ziou Hu, Xiangtong Yao, Yuan Meng, Zhenshan Bing, Alois Knoll

    Abstract: Diffusion policies excel at visuomotor control but often fail catastrophically under severe out-of-distribution (OOD) disturbances, such as unexpected object displacements or visual corruptions. To address this vulnerability, we introduce the Dream Diffusion Policy (DDP), a framework that deeply integrates a diffusion world model into the policy's training objective via a shared 3D visual encoder.…

    Submitted 21 March, 2026; originally announced March 2026.

    Comments: Under review

  10. arXiv:2603.19938

    cs.IT

    Capacity-Achieving BBT Polar Codes with Interleaver-Assisted BP Decoding

    Authors: Xinyuanmeng Yao, Xiao Ma

    Abstract: In this paper, we introduce a binary balanced tree (BBT) channel transformation that extends Arıkan's channel transformation to arbitrary block lengths. We prove that the proposed transformation induces channel polarization, thereby establishing that BBT polar codes achieve the capacity of binary-input memoryless symmetric (BMS) channels. To characterize the finite-length performance of BBT polar…

    Submitted 20 March, 2026; originally announced March 2026.

    Comments: 33 pages, 13 figures

  11. Using Laplace Transform To Optimize the Hallucination of Generation Models

    Authors: Cheng Kang, Xinye Chen, Daniel Novak, Xujing Yao

    Abstract: To explore the feasibility of avoiding the confident error (or hallucination) of generation models (GMs), we formalise the system of GMs as a class of stochastic dynamical systems through the lens of control theory. Numerous factors can be attributed to the hallucination of the learning process of GMs; utilising knowledge of control theory allows us to analyse their system functions and system res…

    Submitted 5 March, 2026; originally announced March 2026.

    Comments: Corresponding author: Xujing Yao (xjyao@njtech.edu.cn)

    Journal ref: In 2024 18th International Conference on Control, Automation, Robotics and Vision (ICARCV) (pp. 447-453). IEEE

  12. arXiv:2603.14177

    cs.LG cs.AI

    Artificial intelligence-enabled single-lead ECG for non-invasive hyperkalemia detection: development, multicenter validation, and proof-of-concept deployment

    Authors: Gongzheng Tang, Qinghao Zhao, Guangkun Nie, Yujie Xiao, Shijia Geng, Donglin Xie, Shun Huang, Deyun Zhang, Xingchen Yao, Jinwei Wang, Kangyin Chen, Luxia Zhang, Shenda Hong

    Abstract: Hyperkalemia is a life-threatening electrolyte disorder that is common in patients with chronic kidney disease and heart failure, yet frequent monitoring remains difficult outside hospital settings. We developed and validated Pocket-K, a single-lead AI-ECG system initialized from the ECGFounder foundation model for non-invasive hyperkalemia screening and handheld deployment. In this multicentre ob…

    Submitted 17 March, 2026; v1 submitted 14 March, 2026; originally announced March 2026.

  13. arXiv:2603.13961

    cs.CV

    USIS-PGM: Photometric Gaussian Mixtures for Underwater Salient Instance Segmentation

    Authors: Lin Hong, Xiangtong Yao, Mürüvvet Bozkurt, Xin Wang, Fumin Zhang

    Abstract: Underwater salient instance segmentation (USIS) is crucial for marine robotic systems, as it enables both underwater salient object detection and instance-level mask prediction for visual scene understanding. Compared with its terrestrial counterpart, USIS is more challenging due to the underwater image degradation. To address this issue, this paper proposes USIS-PGM, a single-stage framework for…

    Submitted 16 March, 2026; v1 submitted 14 March, 2026; originally announced March 2026.

  14. arXiv:2603.08957

    cs.MS cs.AI cs.DB

    Automated Tensor-Relational Decomposition for Large-Scale Sparse Tensor Computation

    Authors: Yuxin Tang, Zhiyuan Xin, Zhimin Ding, Xinyu Yao, Daniel Bourgeois, Tirthak Patel, Chris Jermaine

    Abstract: A tensor-relational computation is a relational computation where individual tuples carry vectors, matrices, or higher-dimensional arrays. An advantage of tensor-relational computation is that the overall computation can be executed on top of a relational system, inheriting the system's ability to automatically handle very large inputs with high levels of sparsity while high-performance ker…

    Submitted 9 March, 2026; originally announced March 2026.

  15. arXiv:2603.07685

    cs.DC cs.CL cs.LG

    Scalable Training of Mixture-of-Experts Models with Megatron Core

    Authors: Zijie Yan, Hongxiao Bai, Xin Yao, Dennis Liu, Tong Liu, Hongbin Liu, Pingtian Li, Evan Wu, Shiqing Fan, Li Tao, Robin Zhang, Yuzhong Wang, Shifang Xu, Jack Chang, Xuwen Chen, Kunlun Li, Yan Bai, Gao Deng, Nan Zheng, Vijay Anand Korthikanti, Abhinav Khattar, Ethan He, Soham Govande, Sangkug Lym, Zhongbo Zhu, et al. (20 additional authors not shown)

    Abstract: Scaling Mixture-of-Experts (MoE) training introduces systems challenges absent in dense models. Because each token activates only a subset of experts, this sparsity allows total parameters to grow much faster than per-token computation, creating coupled constraints across memory, communication, and computation. Optimizing one dimension often shifts pressure to another, demanding co-design across t…

    Submitted 10 March, 2026; v1 submitted 8 March, 2026; originally announced March 2026.

    Comments: Technical Report. 88 pages. 42 figures

  16. Adversarial Batch Representation Augmentation for Batch Correction in High-Content Cellular Screening

    Authors: Lei Tong, Xujing Yao, Adam Corrigan, Long Chen, Navin Rathna Kumar, Kerry Hallbrook, Jonathan Orme, Yinhai Wang, Huiyu Zhou

    Abstract: High-Content Screening routinely generates massive volumes of cell painting images for phenotypic profiling. However, technical variations across experimental executions inevitably induce biological batch (bio-batch) effects. These cause covariate shifts and degrade the generalization of deep learning models on unseen data. Existing batch correction methods typically rely on additional prior knowl…

    Submitted 5 March, 2026; originally announced March 2026.

    Comments: Preprint

    Journal ref: Knowledge-based Systems, 2026

  17. arXiv:2603.03580

    cs.CV

    An Effective Data Augmentation Method by Asking Questions about Scene Text Images

    Authors: Xu Yao, Lei Kang

    Abstract: Scene text recognition (STR) and handwritten text recognition (HTR) face significant challenges in accurately transcribing textual content from images into machine-readable formats. Conventional OCR models often predict transcriptions directly, which limits detailed reasoning about text structure. We propose a VQA-inspired data augmentation framework that strengthens OCR training through structure…

    Submitted 3 March, 2026; originally announced March 2026.

    Comments: Accepted to ICASSP 2026

  18. arXiv:2603.00031

    cs.CL cs.LG

    GRIP: Geometric Refinement and Adaptive Information Potential for Data Efficiency

    Authors: Changhao Wang, Jiaolong Yang, Xinhao Yao, Yunfei Yu, Peng Jiao, Lu Yu, Junpeng Fang, Riccardo Cantoro, Qing Cui, Jun Zhou

    Abstract: The performance of Large Language Models (LLMs) is increasingly governed by data efficiency rather than raw scaling volume. However, existing selection methods often decouple global distribution balancing from local instance selection, compromising the hierarchical integrity of the training set. We introduce GRIP (Geometric Refinement and Adaptive Information Potential), a framework that…

    Submitted 4 February, 2026; originally announced March 2026.

  19. arXiv:2602.23978

    cs.IR

    Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer

    Authors: Huimu Wang, Xingzhi Yao, Yiming Qiu, Qinghong Zhang, Haotian Wang, Yufan Cui, Songlin Wang, Sulong Xu, Mingming Li

    Abstract: While semantic ID-based generative retrieval enables efficient end-to-end modeling in industrial applications, these methods face a persistent trade-off: head items are susceptible to ID collisions that negatively impact downstream tasks, whereas data-sparse tail items, including cold-start items, exhibit limited generalization. To address this issue, we propose the Anchored Curriculum with Sequen…

    Submitted 27 February, 2026; originally announced February 2026.

    ACM Class: E.3.3

  20. arXiv:2602.23964

    cs.IR

    RAD-DPO: Robust Adaptive Denoising Direct Preference Optimization for Generative Retrieval in E-commerce

    Authors: Zhiguo Chen, Guohao Sun, Yiming Qiu, Xingzhi Yao, Mingming Li, Huimu Wang, Yangqi Zhang, Songlin Wang, Sulong Xu

    Abstract: Generative Retrieval (GR) has emerged as a powerful paradigm in e-commerce search, retrieving items via autoregressive decoding of Semantic IDs (SIDs). However, aligning GR with complex user preferences remains challenging. While Direct Preference Optimization (DPO) offers an efficient alignment solution, its direct application to structured SIDs suffers from three limitations: (i) it penalizes sh…

    Submitted 27 February, 2026; originally announced February 2026.

    ACM Class: H.3.3

  21. arXiv:2602.19178

    cs.CV

    EMAD: Evidence-Centric Grounded Multimodal Diagnosis for Alzheimer's Disease

    Authors: Qiuhui Chen, Xuancheng Yao, Zhenglei Zhou, Xinyue Hu, Yi Hong

    Abstract: Deep learning models for medical image analysis often act as black boxes, seldom aligning with clinical guidelines or explicitly linking decisions to supporting evidence. This is especially critical in Alzheimer's disease (AD), where predictions should be grounded in both anatomical and clinical findings. We present EMAD, a vision-language framework that generates structured AD diagnostic reports…

    Submitted 22 February, 2026; originally announced February 2026.

    Comments: Accepted by CVPR 2026

  22. arXiv:2602.14965

    cs.CV cs.RO

    PAct: Part-Decomposed Single-View Articulated Object Generation

    Authors: Qingming Liu, Xinyue Yao, Shuyuan Zhang, Yueci Deng, Guiliang Liu, Zhen Liu, Kui Jia

    Abstract: Articulated objects are central to interactive 3D applications, including embodied AI, robotics, and VR/AR, where functional part decomposition and kinematic motion are essential. Yet producing high-fidelity articulated assets remains difficult to scale because it requires reliable part decomposition and kinematic rigging. Existing approaches largely fall into two paradigms: optimization-based rec…

    Submitted 16 February, 2026; originally announced February 2026.

    Comments: Technical Report (11 figures, 14 pages), Project Page: https://PAct-project.github.io

  23. arXiv:2602.07007

    cs.RO

    ARGOS: Automated Functional Safety Requirement Synthesis for Embodied AI via Attribute-Guided Combinatorial Reasoning

    Authors: Dongsheng Chen, Yuxuan Li, Yi Lin, Guanhua Chen, Jiaxin Zhang, Xiangyu Zhao, Lei Ma, Xin Yao, Xuetao Wei

    Abstract: Ensuring functional safety is essential for the deployment of Embodied AI in complex open-world environments. However, traditional Hazard Analysis and Risk Assessment (HARA) methods struggle to scale in this domain. While HARA relies on enumerating risks for finite and pre-defined function lists, Embodied AI operates on open-ended natural language instructions, creating a challenge of combinatoria…

    Submitted 30 January, 2026; originally announced February 2026.

  24. arXiv:2602.03772

    cs.LG cs.AI

    UniGeM: Unifying Data Mixing and Selection via Geometric Exploration and Mining

    Authors: Changhao Wang, Yunfei Yu, Xinhao Yao, Jiaolong Yang, Riccardo Cantoro, Chaobo Li, Qing Cui, Jun Zhou

    Abstract: The scaling of Large Language Models (LLMs) is increasingly limited by data quality. Most methods handle data mixing and sample selection separately, which can break the structure in code corpora. We introduce UniGeM, a framework that unifies mixing and selection by treating data curation as a manifold approximation problem without training proxy models or relying on external ref…

    Submitted 3 February, 2026; originally announced February 2026.

  25. arXiv:2602.03547

    cs.RO cs.CV

    AffordanceGrasp-R1: Leveraging Reasoning-Based Affordance Segmentation with Reinforcement Learning for Robotic Grasping

    Authors: Dingyi Zhou, Mu He, Zhuowei Fang, Xiangtong Yao, Yinlong Liu, Alois Knoll, Hu Cao

    Abstract: We introduce AffordanceGrasp-R1, a reasoning-driven affordance segmentation framework for robotic grasping that combines a chain-of-thought (CoT) cold-start strategy with reinforcement learning to enhance deduction and spatial grounding. In addition, we redesign the grasping pipeline to be more context-aware by generating grasp candidates from the global scene point cloud and subsequently filterin…

    Submitted 3 February, 2026; originally announced February 2026.

    Comments: Preprint version

  26. arXiv:2601.22509

    cs.LG cs.AI

    Keep Rehearsing and Refining: Lifelong Learning Vehicle Routing under Continually Drifting Tasks

    Authors: Jiyuan Pei, Yi Mei, Jialin Liu, Mengjie Zhang, Xin Yao

    Abstract: Existing neural solvers for vehicle routing problems (VRPs) are typically trained either in a one-off manner on a fixed set of pre-defined tasks or in a lifelong manner on several tasks arriving sequentially, assuming sufficient training on each task. Both settings overlook a common real-world property: problem patterns may drift continually over time, yielding massive tasks sequentially arising w…

    Submitted 29 January, 2026; originally announced January 2026.

  27. arXiv:2601.19481

    cs.NE cs.LG

    Posterior Distribution-assisted Evolutionary Dynamic Optimization as an Online Calibrator for Complex Social Simulations

    Authors: Peng Yang, Zhenhua Yang, Boquan Jiang, Chenkai Wang, Ke Tang, Xin Yao

    Abstract: The calibration of simulators for complex social systems aims to identify the optimal parameter that drives the output of the simulator best matching the target data observed from the system. As many social systems may change internally over time, calibration naturally becomes an online task, requiring parameters to be updated continuously to maintain the simulator's fidelity. In this work, the on…

    Submitted 27 January, 2026; originally announced January 2026.

  28. arXiv:2601.15657

    cs.LG cs.AI

    Integrating Knowledge Distillation Methods: A Sequential Multi-Stage Framework

    Authors: Yinxi Tian, Changwu Huang, Ke Tang, Xin Yao

    Abstract: Knowledge distillation (KD) transfers knowledge from large teacher models to compact student models, enabling efficient deployment on resource-constrained devices. While diverse KD methods, including response-based, feature-based, and relation-based approaches, capture different aspects of teacher knowledge, integrating multiple methods or knowledge sources is promising but often hampered by compl…

    Submitted 22 January, 2026; originally announced January 2026.

  29. arXiv:2601.13683

    cs.CV

    Dynamic Differential Linear Attention: Enhancing Linear Diffusion Transformer for High-Quality Image Generation

    Authors: Boyuan Cao, Xingbo Yao, Chenhui Wang, Jiaxin Ye, Yujie Wei, Hongming Shan

    Abstract: Diffusion transformers (DiTs) have emerged as a powerful architecture for high-fidelity image generation, yet the quadratic cost of self-attention poses a major scalability bottleneck. To address this, linear attention mechanisms have been adopted to reduce computational cost; unfortunately, the resulting linear diffusion transformers (LiTs) models often come at the expense of generative performan…

    Submitted 20 January, 2026; originally announced January 2026.

  30. arXiv:2601.13137

    cs.CL

    Adversarial Alignment: Ensuring Value Consistency in Large Language Models for Sensitive Domains

    Authors: Yuan Gao, Zhigang Liu, Xinyu Yao, Bo Chen, Xiaobing Zhao

    Abstract: With the wide application of large language models (LLMs), the problems of bias and value inconsistency in sensitive domains have gradually emerged, especially in terms of race, society and politics. In this paper, we propose an adversarial alignment framework, which enhances the value consistency of the model in sensitive domains through continued pre-training, instruction fine-tuning and adversa…

    Submitted 22 January, 2026; v1 submitted 19 January, 2026; originally announced January 2026.

    Comments: 13 pages, 5 figures

  31. arXiv:2601.10365

    cs.RO

    FastStair: Learning to Run Up Stairs with Humanoid Robots

    Authors: Yan Liu, Tao Yu, Haolin Song, Hongbo Zhu, Nianzong Hu, Yuzhi Hao, Xiuyong Yao, Xizhe Zang, Hua Chen, Jie Zhao

    Abstract: Running up stairs is effortless for humans but remains extremely challenging for humanoid robots due to the simultaneous requirements of high agility and strict stability. Model-free reinforcement learning (RL) can generate dynamic locomotion, yet implicit stability rewards and heavy reliance on task-specific reward shaping tend to result in unsafe behaviors, especially on stairs; conversely, mode…

    Submitted 15 January, 2026; originally announced January 2026.

  32. arXiv:2601.09377

    cs.RO

    ReflexDiffusion: Reflection-Enhanced Trajectory Planning for High-lateral-acceleration Scenarios in Autonomous Driving

    Authors: Xuemei Yao, Xiao Yang, Jianbin Sun, Liuwei Xie, Xuebin Shao, Xiyu Fang, Hang Su, Kewei Yang

    Abstract: Generating safe and reliable trajectories for autonomous vehicles in long-tail scenarios remains a significant challenge, particularly for high-lateral-acceleration maneuvers such as sharp turns, which represent critical safety situations. Existing trajectory planners exhibit systematic failures in these scenarios due to data imbalance. This results in insufficient modelling of vehicle dynamics, r…

    Submitted 14 January, 2026; originally announced January 2026.

    Comments: Accepted by AAAI 2026

  33. arXiv:2601.05039

    cs.MA

    FinDeepForecast: A Live Multi-Agent System for Benchmarking Deep Research Agents in Financial Forecasting

    Authors: Xiangyu Li, Xuan Yao, Guohao Qi, Fengbin Zhu, Kelvin J. L. Koa, Xiang Yao Ng, Ziyang Liu, Xingyu Ni, Chang Liu, Yonghui Yang, Yang Zhang, Wenjie Wang, Fuli Feng, Chao Wang, Huanbo Luan, Xiaofen Xing, Xiangmin Xu, Tat-Seng Chua, Ke-Wei Huang

    Abstract: Deep Research (DR) Agents powered by advanced Large Language Models (LLMs) have fundamentally shifted the paradigm for completing complex research tasks. Yet, a comprehensive and live evaluation of their forecasting performance on real-world, research-oriented tasks in high-stakes domains (e.g., finance) remains underexplored. We introduce FinDeepForecast, the first live, end-to-end multi-agent sy…

    Submitted 8 January, 2026; originally announced January 2026.

  34. arXiv:2601.02907

    cs.CL

    Beyond the Black Box: A Survey on the Theory and Mechanism of Large Language Models

    Authors: Zeyu Gan, Ruifeng Ren, Wei Yao, Xiaolin Hu, Gengze Xu, Chen Qian, Huayi Tang, Zixuan Gong, Xinhao Yao, Pengwei Tang, Zhenxing Dou, Yong Liu

    Abstract: The rapid emergence of Large Language Models (LLMs) has precipitated a profound paradigm shift in Artificial Intelligence, delivering monumental engineering successes that increasingly impact modern society. However, a critical paradox persists within the current field: despite the empirical efficacy, our theoretical understanding of LLMs remains disproportionately nascent, forcing these systems t…

    Submitted 12 March, 2026; v1 submitted 6 January, 2026; originally announced January 2026.

  35. arXiv:2601.02212

    cs.CV

    Prior-Guided DETR for Ultrasound Nodule Detection

    Authors: Jingjing Wang, Zhuo Xiao, Xinning Yao, Bo Liu, Lijuan Niu, Xiangzhi Bai, Fugen Zhou

    Abstract: Accurate detection of ultrasound nodules is essential for the early diagnosis and treatment of thyroid and breast cancers. However, this task remains challenging due to irregular nodule shapes, indistinct boundaries, substantial scale variations, and the presence of speckle noise that degrades structural visibility. To address these challenges, we propose a prior-guided DETR framework specifically…

    Submitted 5 January, 2026; originally announced January 2026.

  36. arXiv:2601.01908

    cs.CV cs.AI

    Nodule-DETR: A Novel DETR Architecture with Frequency-Channel Attention for Ultrasound Thyroid Nodule Detection

    Authors: Jingjing Wang, Qianglin Liu, Zhuo Xiao, Xinning Yao, Bo Liu, Lu Li, Lijuan Niu, Fugen Zhou

    Abstract: Thyroid cancer is the most common endocrine malignancy, and its incidence is rising globally. While ultrasound is the preferred imaging modality for detecting thyroid nodules, its diagnostic accuracy is often limited by challenges such as low image contrast and blurred nodule boundaries. To address these issues, we propose Nodule-DETR, a novel detection transformer (DETR) architecture designed for…

    Submitted 5 January, 2026; originally announced January 2026.

  37. arXiv:2512.21257

    cs.IR cs.CL

    ReaSeq: Unleashing World Knowledge via Reasoning for Sequential Modeling

    Authors: Jiakai Tang, Chuan Wang, Gaoming Yang, Han Wu, Jiahao Yu, Jian Wu, Jianwu Hu, Junjun Zheng, Longbin Li, Shuwen Xiao, Xiangheng Kong, Yeqiu Yang, Yuning Jiang, Ahjol Nurlanbek, Binbin Cao, Bo Zheng, Fangmei Zhu, Gaoming Zhou, Huimin Yi, Huiping Chu, Jin Huang, Jinzhe Shan, Kenan Cui, Longbin Li, Silu Zhou, et al. (10 additional authors not shown)

    Abstract: Industrial recommender systems face two fundamental limitations under the log-driven paradigm: (1) knowledge poverty in ID-based item representations that causes brittle interest modeling under data sparsity, and (2) systemic blindness to beyond-log user interests that constrains model performance within platform boundaries. These limitations stem from an over-reliance on shallow interaction stati…

    Submitted 29 December, 2025; v1 submitted 24 December, 2025; originally announced December 2025.

  38. arXiv:2512.21065

    cs.RO cs.CV

    Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation

    Authors: Zebin Jiang, Tianle Jin, Xiangtong Yao, Alois Knoll, Hu Cao

    Abstract: Grasping is one of the most fundamental and challenging capabilities in robotic manipulation, especially in unstructured, cluttered, and semantically diverse environments. Recent research has increasingly explored language-guided manipulation, where robots not only perceive the scene but also interpret task-relevant natural language instructions. However, existing language-conditioned grasping meth…

    Submitted 24 December, 2025; originally announced December 2025.

    Comments: Submitted to IEEE Journal

  39. arXiv:2512.21019

    cs.CV

    Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face

    Authors: Rui-qing Sun, Xingshan Yao, Tian Lan, Jia-Ling Shi, Chen-Hao Cui, Hui-Yang Zhao, Zhijing Wu, Chen Yang, Xian-Ling Mao

    Abstract: State-of-the-art 3D-field video-referenced Talking Face Generation (TFG) methods synthesize high-fidelity personalized talking-face videos in real time by modeling 3D geometry and appearance from reference portrait video. This capability raises significant privacy concerns regarding malicious misuse of personal portraits. However, no efficient defense framework exists to protect such videos agains…

    Submitted 6 January, 2026; v1 submitted 24 December, 2025; originally announced December 2025.

  40. arXiv:2512.16977

    cs.CV

    Endo-SemiS: Towards Robust Semi-Supervised Image Segmentation for Endoscopic Video

    Authors: Hao Li, Daiwei Lu, Xing Yao, Nicholas Kavoussi, Ipek Oguz

    Abstract: In this paper, we present Endo-SemiS, a semi-supervised segmentation framework for providing reliable segmentation of endoscopic video frames with limited annotation. Endo-SemiS uses four strategies to improve performance by effectively utilizing all available data, particularly unlabeled data: (1) Cross-supervision between two individual networks that supervise each other; (2) Uncertainty-guided pseu…

    Submitted 18 December, 2025; originally announced December 2025.

  41. Evaluating Large Language Models on Multimodal Chemistry Olympiad Exams

    Authors: Yiming Cui, Xin Yao, Yuxuan Qin, Xin Li, Shijin Wang, Guoping Hu

    Abstract: Multimodal scientific reasoning remains a significant challenge for large language models (LLMs), particularly in chemistry, where problem-solving relies on symbolic diagrams, molecular structures, and structured visual data. Here, we systematically evaluate 40 proprietary and open-source multimodal LLMs, including GPT-5, o3, Gemini-2.5-Pro, and Qwen2.5-VL, on a curated benchmark of Olympiad-style…

    Submitted 16 December, 2025; originally announced December 2025.

    Comments: Published at Communications Chemistry

    Journal ref: Commun. Chem. 8 (2025)

  42. arXiv:2512.13074  [pdf, ps, other

    cs.IR cs.AI

    A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval

    Authors: Huimu Wang, Yiming Qiu, Xingzhi Yao, Zhiguo Chen, Guoyu Tang, Songlin Wang, Sulong Xu, Mingming Li

    Abstract: Dense retrieval has become the industry standard in large-scale information retrieval systems due to its high efficiency and competitive accuracy. Its core relies on a coarse-to-fine hierarchical architecture that enables rapid candidate selection and precise semantic matching, achieving millisecond-level response over billion-scale corpora. This capability makes it essential not only in tradition… ▽ More

    Submitted 15 December, 2025; originally announced December 2025.

  43. arXiv:2512.02033  [pdf, ps, other

    q-bio.BM cs.AI

    CONFIDE: Hallucination Assessment for Reliable Biomolecular Structure Prediction and Design

    Authors: Zijun Gao, Mutian He, Shijia Sun, Hanqun Cao, Jingjie Zhang, Zihao Luo, Xiaorui Wang, Xiaojun Yao, Chang-Yu Hsieh, Chunbin Gu, Pheng Ann Heng

    Abstract: Reliable evaluation of protein structure predictions remains challenging, as metrics like pLDDT capture energetic stability but often miss subtle errors such as atomic clashes or conformational traps reflecting topological frustration within the protein folding energy landscape. We present CODE (Chain of Diffusion Embeddings), a self-evaluating metric empirically found to quantify topological frus… ▽ More

    Submitted 19 November, 2025; originally announced December 2025.

  44. arXiv:2511.18537  [pdf, ps, other

    cs.CV

    Zero-Shot Video Deraining with Video Diffusion Models

    Authors: Tuomas Varanka, Juan Luis Gonzalez, Hyeongwoo Kim, Pablo Garrido, Xu Yao

    Abstract: Existing video deraining methods are often trained on paired datasets, either synthetic, which limits their ability to generalize to real-world rain, or captured by static cameras, which restricts their effectiveness in dynamic scenes with background and camera motion. Furthermore, recent works in fine-tuning diffusion models have shown promising results, but the fine-tuning tends to weaken the ge… ▽ More

    Submitted 1 February, 2026; v1 submitted 23 November, 2025; originally announced November 2025.

    Comments: WACV 2026

  45. arXiv:2511.16665  [pdf, ps, other

    cs.LG cs.AI cs.DC

    Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

    Authors: Qinghao Hu, Shang Yang, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimovic, Song Han

    Abstract: The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these reasoning models, typically using Reinforcement Learning (RL), encounters critical efficiency bottlenecks: response generation during RL training exhibits a persistent long-tail distribution, where a few very lon… ▽ More

    Submitted 20 March, 2026; v1 submitted 20 November, 2025; originally announced November 2025.

  46. arXiv:2511.12396  [pdf, ps, other

    eess.IV cs.CV

    DEMIST: Decoupled Multi-stream latent diffusion for Quantitative Myelin Map Synthesis

    Authors: Jiacheng Wang, Hao Li, Xing Yao, Ahmad Toubasi, Taegan Vinarsky, Caroline Gheen, Joy Derwenskus, Chaoyang Jin, Richard Dortch, Junzhong Xu, Francesca Bagnato, Ipek Oguz

    Abstract: Quantitative magnetization transfer (qMT) imaging provides myelin-sensitive biomarkers, such as the pool size ratio (PSR), which is valuable for multiple sclerosis (MS) assessment. However, qMT requires specialized 20-30 minute scans. We propose DEMIST to synthesize PSR maps from standard T1w and FLAIR images using a 3D latent diffusion model with three complementary conditioning mechanisms. Our a… ▽ More

    Submitted 25 November, 2025; v1 submitted 15 November, 2025; originally announced November 2025.

  47. arXiv:2511.07940  [pdf

    cs.CV

    Is It Truly Necessary to Process and Fit Minutes-Long Reference Videos for Personalized Talking Face Generation?

    Authors: Rui-Qing Sun, Ang Li, Zhijing Wu, Tian Lan, Qianyu Lu, Xingshan Yao, Chen Xu, Xian-Ling Mao

    Abstract: Talking Face Generation (TFG) aims to produce realistic and dynamic talking portraits, with broad applications in fields such as digital education, film and television production, e-commerce live streaming, and other related areas. Currently, TFG methods based on Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS) have received widespread attention. They learn and store personalized featu… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

  48. arXiv:2511.06824  [pdf

    cs.DC cs.CE

    A GPU-boosted high-performance multi-working condition joint analysis framework for predicting dynamics of textured axial piston pump

    Authors: Xin Yao, Yang Liu, Jin Jiang, Yesen Chen, Zhilong Chen, Hongkang Dong, Xiaofeng Wei, Teng Zhang, Dongyun Wang

    Abstract: Accurate simulation of the dynamics of an axial piston pump (APP) is essential for its design, manufacture, and maintenance. However, limited by the computational capacity of CPUs and traditional solvers, conventional iterative methods are inefficient in complicated cases with textured surfaces requiring refined meshes, and cannot handle simulation over multiple periods. To accelerate Picard iteration fo… ▽ More

    Submitted 10 November, 2025; originally announced November 2025.

  49. arXiv:2511.05731  [pdf, ps, other

    cs.CV

    Towards Better Ultrasound Video Segmentation Foundation Model: An Empirical study on SAM2 Finetuning from Data Perspective

    Authors: Xing Yao, Ahana Gangopadhyay, Hsi-Ming Chang, Ravi Soni

    Abstract: Ultrasound (US) video segmentation remains a challenging problem due to strong inter- and intra-dataset variability, motion artifacts, and limited annotated data. Although foundation models such as Segment Anything Model 2 (SAM2) demonstrate strong zero-shot and prompt-guided segmentation capabilities, their performance deteriorates substantially when transferred to medical imaging domains. Curren… ▽ More

    Submitted 7 November, 2025; originally announced November 2025.

  50. arXiv:2511.05245  [pdf, ps, other

    cs.CV

    ADPretrain: Advancing Industrial Anomaly Detection via Anomaly Representation Pretraining

    Authors: Xincheng Yao, Yan Luo, Zefeng Qian, Chongyang Zhang

    Abstract: The current mainstream and state-of-the-art anomaly detection (AD) methods are substantially established on pretrained feature networks yielded by ImageNet pretraining. However, regardless of supervised or self-supervised pretraining, the pretraining process on ImageNet does not match the goal of anomaly detection (i.e., pretraining in natural images doesn't aim to distinguish between normal and a… ▽ More

    Submitted 7 November, 2025; originally announced November 2025.

    Comments: Accepted by NeurIPS 2025