
Showing 1–50 of 2,041 results for author: Ma, J

Searching in archive cs.
  1. arXiv:2512.21329  [pdf, ps, other]

    cs.CL

    Your Reasoning Benchmark May Not Test Reasoning: Revealing Perception Bottleneck in Abstract Reasoning Benchmarks

    Authors: Xinhe Wang, Jin Huang, Xingjian Zhang, Tianhao Wang, Jiaqi W. Ma

    Abstract: Reasoning benchmarks such as the Abstraction and Reasoning Corpus (ARC) and ARC-AGI are widely used to assess progress in artificial intelligence and are often interpreted as probes of core, so-called "fluid" reasoning abilities. Despite their apparent simplicity for humans, these tasks remain challenging for frontier vision-language models (VLMs), a gap commonly attributed to deficiencies in ma…

    Submitted 24 December, 2025; originally announced December 2025.

  2. arXiv:2512.21302  [pdf, ps, other]

    cs.CV

    AndroidLens: Long-latency Evaluation with Nested Sub-targets for Android GUI Agents

    Authors: Yue Cao, Yingyao Wang, Pi Bu, Jingxuan Xing, Wei Jiang, Zekun Zhu, Junpeng Ma, Sashuai Zhou, Tong Lu, Jun Song, Yu Cheng, Yuning Jiang, Bo Zheng

    Abstract: Graphical user interface (GUI) agents can substantially improve productivity by automating frequently executed long-latency tasks on mobile devices. However, existing evaluation benchmarks are still constrained to limited applications, simple tasks, and coarse-grained metrics. To address this, we introduce AndroidLens, a challenging evaluation framework for mobile GUI agents, comprising 571 long-l…

    Submitted 24 December, 2025; originally announced December 2025.

    Comments: 23 pages, 13 figures, 8 tables

  3. arXiv:2512.20632  [pdf]

    cs.AI

    Erkang-Diagnosis-1.1 Technical Report

    Authors: Jianbing Ma, Ao Feng, Zhenjie Gao, Xinyu Song, Li Su, Bin Chen, Wei Wang, Jiamin Wu

    Abstract: This report provides a detailed introduction to the Erkang-Diagnosis-1.1 model, our AI healthcare consulting assistant developed using the Alibaba Qwen-3 model. The Erkang model integrates approximately 500GB of high-quality structured medical knowledge, employing a hybrid approach combining enhanced pre-training and retrieval-enhanced generation to create a secure, reliable, and professional AI health ad…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: 9 pages; 4 figures

  4. arXiv:2512.20061  [pdf, ps, other]

    cs.AI

    Scaling Reinforcement Learning for Content Moderation with Large Language Models

    Authors: Hamed Firooz, Rui Liu, Yuchen Lu, Zhenyu Hou, Fangzhou Xiong, Xiaoyang Zhang, Changshu Jian, Zhicheng Zhu, Jiayuan Ma, Jacob Tao, Chaitali Gupta, Xiaochang Peng, Shike Mei, Hang Cui, Yang Qin, Shuo Tang, Jason Gaedtke, Arpit Mittal

    Abstract: Content moderation at scale remains one of the most pressing challenges in today's digital ecosystem, where billions of user- and AI-generated artifacts must be continuously evaluated for policy violations. Although recent advances in large language models (LLMs) have demonstrated strong potential for policy-grounded moderation, the practical challenges of training these systems to achieve expert-…

    Submitted 23 December, 2025; originally announced December 2025.

  5. arXiv:2512.19458  [pdf, ps, other]

    cs.AI cond-mat.mtrl-sci

    An Agentic Framework for Autonomous Materials Computation

    Authors: Zeyu Xia, Jinzhe Ma, Congjie Zheng, Shufei Zhang, Yuqiang Li, Hang Su, P. Hu, Changshui Zhang, Xingao Gong, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Mao Su

    Abstract: Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific workflows. Here, we present a domain-specialized agent designed for reliable automati…

    Submitted 22 December, 2025; originally announced December 2025.

  6. arXiv:2512.19334  [pdf, ps, other]

    cs.IT cs.LG math.ST

    Orthogonal Approximate Message Passing with Optimal Spectral Initializations for Rectangular Spiked Matrix Models

    Authors: Haohua Chen, Songbin Liu, Junjie Ma

    Abstract: We propose an orthogonal approximate message passing (OAMP) algorithm for signal estimation in the rectangular spiked matrix model with general rotationally invariant (RI) noise. We establish a rigorous state evolution that precisely characterizes the algorithm's high-dimensional dynamics and enables the construction of iteration-wise optimal denoisers. Within this framework, we accommodate spectr…

    Submitted 22 December, 2025; originally announced December 2025.

  7. arXiv:2512.18747  [pdf, ps, other]

    cs.CV cs.AI

    IPCV: Information-Preserving Compression for MLLM Visual Encoders

    Authors: Yuan Chen, Zichen Wen, Yuzhou Wu, Xuyang Liu, Shuang Chen, Junpeng Ma, Weijia Li, Conghui He, Linfeng Zhang

    Abstract: Multimodal Large Language Models (MLLMs) deliver strong vision-language performance but at high computational cost, driven by numerous visual tokens processed by the Vision Transformer (ViT) encoder. Existing token pruning strategies are inadequate: LLM-stage token pruning overlooks the ViT's overhead, while conventional ViT token pruning, without language guidance, risks discarding textually crit…

    Submitted 21 December, 2025; originally announced December 2025.

    Comments: 13 pages, 6 figures

  8. arXiv:2512.18352  [pdf, ps, other]

    cs.CL cs.AI

    LLM-based Few-Shot Early Rumor Detection with Imitation Agent

    Authors: Fengzhu Zeng, Qian Shao, Ling Cheng, Wei Gao, Shih-Fen Cheng, Jing Ma, Cheng Niu

    Abstract: Early Rumor Detection (EARD) aims to identify the earliest point at which a claim can be accurately classified based on a sequence of social media posts. This is especially challenging in data-scarce settings. While Large Language Models (LLMs) perform well in few-shot NLP tasks, they are not well-suited for time-series data and are computationally expensive for both training and inference. In thi…

    Submitted 20 December, 2025; originally announced December 2025.

  9. arXiv:2512.18168  [pdf, ps, other]

    stat.ME cs.IT math.PR math.ST

    Copula Entropy: Theory and Applications

    Authors: Jian Ma

    Abstract: This is the monograph on the theory and applications of copula entropy (CE). This book first introduces the theory of CE, including its background, definition, theorems, properties, and estimation methods. The theoretical applications of CE to structure learning, association discovery, variable selection, causal discovery, system identification, time lag estimation, domain adaptation, multivariate…

    Submitted 19 December, 2025; originally announced December 2025.

  10. arXiv:2512.17785  [pdf]

    cs.CE

    A Parametric Framework for Anticipatory Flashflood Warning: Integrating Landscape Vulnerability with Precipitation Forecasts

    Authors: Xiangpeng Li, Junwei Ma, Samuel D Brody, Ali Mostafavi

    Abstract: Flash flood warnings are largely reactive, providing limited advance notice for evacuation planning and resource prepositioning. This study presents and validates an anticipatory, parametric framework that converts landscape vulnerability and precipitation into transparent, zone-aware threat levels at neighborhood scales. We first derive an inherent hazard likelihood (IHL) surface using pluvial fl…

    Submitted 22 December, 2025; v1 submitted 19 December, 2025; originally announced December 2025.

  11. arXiv:2512.17733  [pdf, ps, other]

    cs.IR cs.AI

    Diversity Recommendation via Causal Deconfounding of Co-purchase Relations and Counterfactual Exposure

    Authors: Jingmao Zhang, Zhiting Zhao, Yunqi Lin, Jianghong Ma, Tianjun Wei, Haijun Zhang, Xiaofeng Zhang

    Abstract: Beyond user-item modeling, item-to-item relationships are increasingly used to enhance recommendation. However, common methods largely rely on co-occurrence, making them prone to item popularity bias and user attributes, which degrades embedding quality and performance. Meanwhile, although diversity is acknowledged as a key aspect of recommendation quality, existing research offers limited attenti…

    Submitted 19 December, 2025; originally announced December 2025.

  12. arXiv:2512.17726  [pdf, ps, other]

    cs.CV

    MambaMIL+: Modeling Long-Term Contextual Patterns for Gigapixel Whole Slide Image

    Authors: Qian Zeng, Yihui Wang, Shu Yang, Yingxue Xu, Fengtao Zhou, Jiabo Ma, Dejia Cai, Zhengyu Zhang, Lijuan Qu, Yu Wang, Li Liang, Hao Chen

    Abstract: Whole-slide images (WSIs) are an important data modality in computational pathology, yet their gigapixel resolution and lack of fine-grained annotations challenge conventional deep learning models. Multiple instance learning (MIL) offers a solution by treating each WSI as a bag of patch-level instances, but effectively modeling ultra-long sequences with rich spatial context remains difficult. Rece…

    Submitted 19 December, 2025; originally announced December 2025.

    Comments: 18 pages, 11 figures, 10 tables

  13. arXiv:2512.17495  [pdf, ps, other]

    cs.CV

    GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

    Authors: Rang Li, Lei Li, Shuhuai Ren, Hao Tian, Shuhao Gu, Shicheng Li, Zihao Yue, Yudong Wang, Wenhan Ma, Zhe Yang, Jingyuan Ma, Zhifang Sui, Fuli Luo

    Abstract: Visual grounding, localizing objects from natural language descriptions, represents a critical bridge between language and vision understanding. While multimodal large language models (MLLMs) achieve impressive scores on existing benchmarks, a fundamental question remains: can MLLMs truly ground language in vision with human-like sophistication, or are they merely pattern-matching on simplified da…

    Submitted 19 December, 2025; originally announced December 2025.

  14. arXiv:2512.16969  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

    Authors: Wanghan Xu, Yuhao Zhou, Yifan Zhou, Qinglong Cao, Shuo Li, Jia Bu, Bo Liu, Yixin Chen, Xuming He, Xiangyu Zhao, Xiang Zhuang, Fengxiang Wang, Zhiwang Zhou, Qiantai Feng, Wenxuan Huang, Jiaqi Wei, Hao Wu, Yuejin Yang, Guangshuai Wang, Sheng Xu, Ziyan Huang, Xinyao Liu, Jiyao Liu, Cheng Tang, Wei Li, et al. (82 additional authors not shown)

    Abstract: Despite advances in scientific AI, a coherent framework for Scientific General Intelligence (SGI)-the ability to autonomously conceive, investigate, and reason across scientific domains-remains lacking. We present an operational SGI definition grounded in the Practical Inquiry Model (PIM: Deliberation, Conception, Action, Perception) and operationalize it via four scientist-aligned tasks: deep res…

    Submitted 18 December, 2025; originally announced December 2025.

  15. arXiv:2512.16760  [pdf, ps, other]

    cs.RO

    Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

    Authors: Tianshuai Hu, Xiaolu Liu, Song Wang, Yiyao Zhu, Ao Liang, Lingdong Kong, Guoyang Zhao, Zeying Gong, Jun Cen, Zhiyu Huang, Xiaoshuai Hao, Linfeng Li, Hang Song, Xiangtai Li, Jun Ma, Shaojie Shen, Jianke Zhu, Dacheng Tao, Ziwei Liu, Junwei Liang

    Abstract: Autonomous driving has long relied on modular "Perception-Decision-Action" pipelines, where hand-crafted interfaces and rule-based components often break down in complex or long-tailed scenarios. Their cascaded design further propagates perception errors, degrading downstream planning and control. Vision-Action (VA) models address some limitations by learning direct mappings from visual inputs to…

    Submitted 18 December, 2025; originally announced December 2025.

    Comments: Preprint; 40 pages, 7 figures, 9 tables; GitHub at https://github.com/worldbench/awesome-vla-for-ad

  16. arXiv:2512.16228  [pdf]

    cs.CY

    Quantifying Functional Criticality of Lifelines Through Mobility-Derived Population-Facility Dependence for Human-Centered Resilience

    Authors: Junwei Ma, Bo Li, Xiangpeng Li, Chenyue Liu, Ali Mostafavi

    Abstract: Lifeline infrastructure underpins the continuity of daily life, yet conventional criticality assessments remain largely asset-centric, inferring importance from physical capacity or network topology rather than actual behavioral reliance. This disconnect frequently obscures the true societal cost of disruption, particularly in underserved communities where residents lack service alternatives. This…

    Submitted 18 December, 2025; originally announced December 2025.

  17. arXiv:2512.15793  [pdf, ps, other]

    cs.CY cs.AI cs.CL

    Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms

    Authors: Yuxi Sun, Wei Gao, Hongzhan Lin, Jing Ma, Wenxuan Zhang

    Abstract: Human behaviors are often guided or constrained by social norms, which are defined as shared, commonsense rules. For example, underlying an action "report a witnessed crime" are social norms that inform our conduct, such as "It is expected to be brave to report crimes". Current AI systems that assess valence (i.e., support or oppose) of human actions by leveraging large-scale…

    Submitted 16 December, 2025; originally announced December 2025.

    Comments: Accepted by Asia-Pacific Chapter of the Association for Computational Linguistics (2025)

  18. arXiv:2512.15405  [pdf, ps, other]

    cs.LG

    EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning

    Authors: Jianfei Ma, Wee Sun Lee

    Abstract: At the boundary between the known and the unknown, an agent inevitably confronts the dilemma of whether to explore or to exploit. Epistemic uncertainty reflects such boundaries, representing systematic uncertainty due to limited knowledge. In this paper, we propose a Bayesian reinforcement learning (RL) algorithm, $\texttt{EUBRL}$, which leverages epistemic guidance to achieve principled explorati…

    Submitted 17 December, 2025; originally announced December 2025.

  19. arXiv:2512.14429  [pdf, ps, other]

    cs.AI cs.SE

    Seismology modeling agent: A smart assistant for geophysical researchers

    Authors: Yukun Ren, Siwei Yu, Kai Chen, Jianwei Ma

    Abstract: To address the steep learning curve and reliance on complex manual file editing and command-line operations in the traditional workflow of the mainstream open-source seismic wave simulation software SPECFEM, this paper proposes an intelligent, interactive workflow powered by Large Language Models (LLMs). We introduce the first Model Context Protocol (MCP) server suite for SPECFEM (supporting 2D, 3…

    Submitted 16 December, 2025; originally announced December 2025.

    Comments: 26 pages, 15 figures. Code available at https://github.com/RenYukun1563/specfem-mcp

  20. arXiv:2512.14098  [pdf, ps, other]

    cs.LG cs.DC

    Cornserve: Efficiently Serving Any-to-Any Multimodal Models

    Authors: Jeff J. Ma, Jae-Won Chung, Jisang Ahn, Yizhuo Liang, Akshay Jajoo, Myungjin Lee, Mosharaf Chowdhury

    Abstract: We present Cornserve, an efficient online serving system for an emerging class of multimodal models called Any-to-Any models. Any-to-Any models accept combinations of text and multimodal data (e.g., image, video, audio) as input and also generate combinations of text and multimodal data as output, introducing request type, computation path, and computation scaling heterogeneity in model serving.…

    Submitted 18 December, 2025; v1 submitted 16 December, 2025; originally announced December 2025.

    Comments: Open-source at https://github.com/cornserve-ai/cornserve

  21. arXiv:2512.14069  [pdf, ps, other]

    cs.AI

    RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees

    Authors: Junjie Ma, Jinlong Li

    Abstract: Inference with modern Large Language Models (LLMs) is expensive and slow, and speculative sampling has emerged as an effective solution to this problem. However, the number of calls to the draft model for generating candidate tokens in speculative sampling is a preset hyperparameter, lacking flexibility. To generate and utilize the candidate tokens more effectively, we propose RADAR, a novel s…

    Submitted 15 December, 2025; originally announced December 2025.

    Comments: 5 pages, 2 figures

  22. Citation importance-aware document representation learning for large-scale science mapping

    Authors: Zhentao Liang, Nees Jan van Eck, Xuehua Wu, Jin Mao, Gang Li

    Abstract: Effective science mapping relies on high-quality representations of scientific documents. As an important task in scientometrics and information studies, science mapping is often challenged by the complex and heterogeneous nature of citations. While previous studies have attempted to improve document representations by integrating citation and semantic information, the heterogeneity of citations i…

    Submitted 15 December, 2025; originally announced December 2025.

  23. arXiv:2512.12070  [pdf, ps, other]

    cs.CR

    Towards Channel-Robust and Receiver-Independent Radio Frequency Fingerprint Identification

    Authors: Jie Ma, Junqing Zhang, Guanxiong Shen, Linning Peng, Alan Marshall

    Abstract: Radio frequency fingerprint identification (RFFI) is an emerging method for authenticating Internet of Things (IoT) devices. RFFI exploits the intrinsic and unique hardware imperfections for classifying IoT devices. Deep learning-based RFFI has shown excellent performance. However, there are still remaining research challenges, such as limited public training datasets as well as impacts of channel…

    Submitted 12 December, 2025; originally announced December 2025.

  24. arXiv:2512.12002  [pdf, ps, other]

    cs.CR cs.LG

    Adversarial Attacks Against Deep Learning-Based Radio Frequency Fingerprint Identification

    Authors: Jie Ma, Junqing Zhang, Guanxiong Shen, Alan Marshall, Chip-Hong Chang

    Abstract: Radio frequency fingerprint identification (RFFI) is an emerging technique for the lightweight authentication of wireless Internet of things (IoT) devices. RFFI exploits deep learning models to extract hardware impairments to uniquely identify wireless devices. Recent studies show deep learning-based RFFI is vulnerable to adversarial attacks. However, effective adversarial attacks against differen…

    Submitted 12 December, 2025; originally announced December 2025.

  25. arXiv:2512.11811  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.CY

    Enhancing Geo-localization for Crowdsourced Flood Imagery via LLM-Guided Attention

    Authors: Fengyi Xu, Jun Ma, Waishan Qiu, Cui Guo, Jack C. P. Cheng

    Abstract: Crowdsourced street-view imagery from social media provides real-time visual evidence of urban flooding and other crisis events, yet it often lacks reliable geographic metadata for emergency response. Existing image geo-localization approaches, also known as Visual Place Recognition (VPR) models, exhibit substantial performance degradation when applied to such imagery due to visual distortions and…

    Submitted 16 December, 2025; v1 submitted 24 November, 2025; originally announced December 2025.

    Comments: Updated author list to include additional contributor. Revised title and improved methodology section based on collaborative feedback

  26. arXiv:2512.10948  [pdf, ps, other]

    cs.CV

    ClusIR: Towards Cluster-Guided All-in-One Image Restoration

    Authors: Shengkai Hu, Jiaqi Ma, Jun Wan, Wenwen Min, Yongcheng Jing, Lefei Zhang, Dacheng Tao

    Abstract: All-in-One Image Restoration (AiOIR) aims to recover high-quality images from diverse degradations within a unified framework. However, existing methods often fail to explicitly model degradation types and struggle to adapt their restoration behavior to complex or mixed degradations. To address these issues, we propose ClusIR, a Cluster-Guided Image Restoration framework that explicitly models deg…

    Submitted 11 December, 2025; originally announced December 2025.

  27. arXiv:2512.09927  [pdf, ps, other]

    cs.RO

    Token Expand-Merge: Training-Free Token Compression for Vision-Language-Action Models

    Authors: Yifan Ye, Jiaqi Ma, Jun Cen, Zhihe Lu

    Abstract: Vision-Language-Action (VLA) models pretrained on large-scale multimodal datasets have emerged as powerful foundations for robotic perception and control. However, their massive scale, often billions of parameters, poses significant challenges for real-time deployment, as inference becomes computationally expensive and latency-sensitive in dynamic environments. To address this, we propose Token Ex…

    Submitted 10 December, 2025; originally announced December 2025.

    Comments: 8 pages, 5 figures

  28. arXiv:2512.07899  [pdf, ps, other]

    cs.SI math.AP math.CO

    Finding core subgraphs of directed graphs via discrete Ricci curvature flow

    Authors: Juan Zhao, Jicheng Ma, Yunyan Yang, Liang Zhao

    Abstract: Ricci curvature and its associated flow offer powerful geometric methods for analyzing complex networks. While existing research heavily focuses on applications for undirected graphs such as community detection and core extraction, there has been relatively less attention on directed graphs. In this paper, we introduce a definition of Ricci curvature and an accompanying curvature flow for direc…

    Submitted 5 December, 2025; originally announced December 2025.

    Comments: 21 pages

    MSC Class: 05C21; 35R02; 68Q06

  29. arXiv:2512.07828  [pdf, ps, other]

    cs.LG econ.GN

    The Adoption and Usage of AI Agents: Early Evidence from Perplexity

    Authors: Jeremy Yang, Noah Yonack, Kate Zyskowski, Denis Yarats, Johnny Ho, Jerry Ma

    Abstract: This paper presents the first large-scale field study of the adoption, usage intensity, and use cases of general-purpose AI agents operating in open-world web environments. Our analysis centers on Comet, an AI-powered browser developed by Perplexity, and its integrated agent, Comet Assistant. Drawing on hundreds of millions of anonymized user interactions, we address three fundamental questions: W…

    Submitted 10 December, 2025; v1 submitted 8 December, 2025; originally announced December 2025.

  30. arXiv:2512.07170  [pdf, ps, other]

    cs.CV cs.AI

    Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach

    Authors: Jiayang Li, Chengjie Jiang, Junjun Jiang, Pengwei Liang, Jiayi Ma, Liqiang Nie

    Abstract: Image fusion aims to blend complementary information from multiple sensing modalities, yet existing approaches remain limited in robustness, adaptability, and controllability. Most current fusion networks are tailored to specific tasks and lack the ability to flexibly incorporate user intent, especially in complex scenarios involving low-light degradation, color shifts, or exposure imbalance. More…

    Submitted 8 December, 2025; originally announced December 2025.

  31. arXiv:2512.06500  [pdf, ps, other]

    cs.CR

    PDRIMA: A Policy-Driven Runtime Integrity Measurement and Attestation Approach for ARM TrustZone-based TEE

    Authors: Jingkai Mao, Xiaolin Chang

    Abstract: Trusted Execution Environments (TEEs) such as ARM TrustZone are widely used in IoT and embedded devices to protect sensitive code and data. However, most existing defenses focus on secure boot or REE-side monitoring and provide little visibility into the runtime integrity of the TEE. This leaves TrustZone-based devices exposed to persistent TEE compromises. We propose Policy-Driven Runtime Integri…

    Submitted 6 December, 2025; originally announced December 2025.

  32. arXiv:2512.06227  [pdf, ps, other]

    cs.CL cs.LG

    Automated Data Enrichment using Confidence-Aware Fine-Grained Debate among Open-Source LLMs for Mental Health and Online Safety

    Authors: Junyu Mao, Anthony Hills, Talia Tseriotou, Maria Liakata, Aya Shamir, Dan Sayda, Dana Atzil-Slonim, Natalie Djohari, Arpan Mandal, Silke Roth, Pamela Ugwudike, Mahesan Niranjan, Stuart E. Middleton

    Abstract: Real-world indicators are important for improving natural language processing (NLP) tasks such as life events for mental health analysis and risky behaviour for online safety, yet labelling such information in NLP training datasets is often costly and/or difficult given the dynamic nature of such events. This paper compares several LLM-based data enrichment methods and introduces a novel Confidenc…

    Submitted 5 December, 2025; originally announced December 2025.

  33. arXiv:2512.05955  [pdf, ps, other]

    cs.RO cs.CV

    SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models

    Authors: Haowen Liu, Shaoxiong Yao, Haonan Chen, Jiawei Gao, Jiayuan Mao, Jia-Bin Huang, Yilun Du

    Abstract: Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynamics. This limitation arises from training VLMs on static internet-scale visual-language data that contain no causal interactions or action-conditioned changes. Consequently, it remains challenging to leverage VLMs for fine-grained robotic m…

    Submitted 5 December, 2025; originally announced December 2025.

  34. arXiv:2512.05104  [pdf, ps, other]

    cs.CV

    EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation

    Authors: Jiaqi Ma, Shengkai Hu, Xu Zhang, Jun Wan, Jiaxing Huang, Lefei Zhang, Salman Khan

    Abstract: All-in-One Image Restoration (AiOIR) tasks often involve diverse degradations that require robust and versatile strategies. However, most existing approaches typically lack explicit frequency modeling and rely on fixed or heuristic optimization schedules, which limit the generalization across heterogeneous degradation. To address these limitations, we propose EvoIR, an AiOIR-specific framework that…

    Submitted 11 December, 2025; v1 submitted 4 December, 2025; originally announced December 2025.

  35. arXiv:2512.04904  [pdf, ps, other]

    cs.CV cs.AI

    ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching

    Authors: Guanbo Huang, Jingjia Mao, Fanding Huang, Fengkai Liu, Xiangyang Luo, Yaoyuan Liang, Jiasheng Lu, Xiaoe Wang, Pei Liu, Ruiliu Fu, Shao-Lun Huang

    Abstract: Despite tremendous recent progress, Flow Matching methods still suffer from exposure bias due to discrepancies in training and inference. This paper investigates the root causes of exposure bias in Flow Matching, including: (1) the model lacks generalization to biased inputs during training, and (2) insufficient low-frequency content captured during early denoising, leading to accumulated bias. Ba…

    Submitted 4 December, 2025; originally announced December 2025.

  36. arXiv:2512.04448  [pdf, ps, other]

    cs.DL cs.CY

    Has ACL Lost Its Crown? A Decade-Long Quantitative Analysis of Scale and Impact Across Leading AI Conferences

    Authors: Jianglin Ma, Ben Yao, Xiang Li, Yazhou Zhang

    Abstract: The recent surge of language models (LMs) has rapidly expanded NLP/AI research, driving an exponential rise in submissions and acceptances at major conferences. Yet this growth has been shadowed by escalating concerns over conference quality, such as plagiarism, reviewer inexperience, and collusive bidding. However, existing studies rely largely on qualitative accounts, for example expert intervie…

    Submitted 24 December, 2025; v1 submitted 3 December, 2025; originally announced December 2025.

  37. arXiv:2512.02719  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.LG q-bio.NC

    Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs

    Authors: Julian Ma, Jun Wang, Zafeirios Fountas

    Abstract: Large language models (LLMs) excel at explicit reasoning, but their implicit computational strategies remain underexplored. Decades of psychophysics research show that humans intuitively process and integrate noisy signals using near-optimal Bayesian strategies in perceptual tasks. We ask whether LLMs exhibit similar behaviour and perform optimal multimodal integration without explicit training or…

    Submitted 2 December, 2025; originally announced December 2025.

  38. arXiv:2512.02665  [pdf, ps, other]

    cs.CL

    Input Order Shapes LLM Semantic Alignment in Multi-Document Summarization

    Authors: Jing Ma

    Abstract: Large language models (LLMs) are now used in settings such as Google's AI Overviews, where they summarize multiple long documents. However, it remains unclear whether they weight all inputs equally. Focusing on abortion-related news, we construct 40 pro-neutral-con article triplets, permute each triplet into six input orders, and prompt Gemini 2.5 Flash to generate a neutral overview. We evaluate e…

    Submitted 2 December, 2025; originally announced December 2025.

    Comments: 9 pages, 3 figures, 2 tables

  39. arXiv:2512.02038  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Deep Research: A Systematic Survey

    Authors: Zhengliang Shi, Yiqun Chen, Haitao Li, Weiwei Sun, Shiyu Ni, Yougang Lyu, Run-Ze Fan, Bowen Jin, Yixuan Weng, Minjun Zhu, Qiujie Xie, Xinyu Guo, Qu Yang, Jiayi Wu, Jujia Zhao, Xiaqiang Tang, Xinbei Ma, Cunxiang Wang, Jiaxin Mao, Qingyao Ai, Jen-Tse Huang, Wenxuan Wang, Yue Zhang, Yiming Yang, Zhaopeng Tu, et al. (1 additional author not shown)

    Abstract: Large language models (LLMs) have rapidly evolved from text generators into powerful problem solvers. Yet, many open tasks demand critical thinking, multi-source, and verifiable outputs, which are beyond single-shot prompting or standard retrieval-augmented generation. Recently, numerous studies have explored Deep Research (DR), which aims to combine the reasoning capabilities of LLMs with externa…

    Submitted 24 November, 2025; originally announced December 2025.

  40. arXiv:2512.01958  [pdf, ps, other]

    cs.AI

    Learned-Rule-Augmented Large Language Model Evaluators

    Authors: Jie Meng, Jin Mao

    Abstract: Large language models (LLMs) are predominantly used as evaluators for natural language generation (NLG) tasks, but their application to broader evaluation scenarios remains limited. In this work, we explore the potential of LLMs as general evaluators across diverse tasks. Although LLM-based evaluators have made progress in different areas, existing methods struggle to generalize due to their relia…

    Submitted 1 December, 2025; originally announced December 2025.

  41. arXiv:2512.00807  [pdf, ps, other]

    cs.AI

    BioPro: On Difference-Aware Gender Fairness for Vision-Language Models

    Authors: Yujie Lin, Jiayao Ma, Qingguo Hu, Derek F. Wong, Jinsong Su

    Abstract: Vision-Language Models (VLMs) inherit significant social biases from their training data, notably in gender representation. Current fairness interventions often adopt a difference-unaware perspective that enforces uniform treatment across demographic groups. These approaches, however, fail to distinguish between contexts where neutrality is required and those where group-specific attributes are le…

    Submitted 30 November, 2025; originally announced December 2025.

  42. arXiv:2512.00762  [pdf, ps, other]

    cs.CV

    Seeing the Wind from a Falling Leaf

    Authors: Zhiyuan Gao, Jiageng Mao, Hong-Xing Yu, Haozhe Lou, Emily Yue-Ting Jia, Jernej Barbic, Jiajun Wu, Yue Wang

    Abstract: A longstanding goal in computer vision is to model motions from videos, while the representations behind motions, i.e. the invisible physical interactions that cause objects to deform and move, remain largely unexplored. In this paper, we study how to recover the invisible forces from visual observations, e.g., estimating the wind field by observing a leaf falling to the ground. Our key innovation…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: Accepted at NeurIPS 2025

  43. arXiv:2511.22056  [pdf, ps, other]

    cs.HC

    EAST: Environment-Aware Stylized Transition Along the Reality-Virtuality Continuum

    Authors: Xiaohan Zhang, Kan Liu, Yangle Liu, Fengze Li, Jieming Ma, Yue Li

    Abstract: In the Virtual Reality (VR) gaming industry, maintaining immersion during real-world interruptions remains a challenge, particularly during transitions along the reality-virtuality continuum (RVC). Existing methods tend to rely on digital replicas or simple visual transitions, neglecting to address the aesthetic discontinuities between real and virtual environments, especially in highly stylized V…

    Submitted 1 December, 2025; v1 submitted 26 November, 2025; originally announced November 2025.

  44. arXiv:2511.21431  [pdf, ps, other]

    cs.DC

    MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training

    Authors: Lu Zhao, Rong Shi, Shaoqing Zhang, Yueqiang Chen, Baoguo He, Hongfeng Sun, Ziqing Yin, Shangchao Su, Zhiyan Cui, Liang Dong, Xiyuan Li, Lingbin Wang, Jianwei He, Jiesong Ma, Weikang Huang, Jianglei Tong, Dongdong Gao, Jian Zhang, Hong Tian

    Abstract: The training of large-scale Mixture of Experts (MoE) models faces a critical memory bottleneck due to severe load imbalance caused by dynamic token routing. This imbalance leads to memory overflow on GPUs with limited capacity, constraining model scalability. Existing load balancing methods, which cap expert capacity, compromise model accuracy and fail on memory-constrained hardware. To address th…

    Submitted 26 November, 2025; originally announced November 2025.

  45. arXiv:2511.21367  [pdf, ps, other]

    cs.CV

    Endo-G$^{2}$T: Geometry-Guided & Temporally Aware Time-Embedded 4DGS For Endoscopic Scenes

    Authors: Yangle Liu, Fengze Li, Kan Liu, Jieming Ma

    Abstract: Endoscopic (endo) video exhibits strong view-dependent effects such as specularities, wet reflections, and occlusions. Pure photometric supervision misaligns with geometry and triggers early geometric drift, where erroneous shapes are reinforced during densification and become hard to correct. We ask how to anchor geometry early for 4D Gaussian splatting (4DGS) while maintaining temporal consisten…

    Submitted 26 November, 2025; originally announced November 2025.

  46. arXiv:2511.20597  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents

    Authors: Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley, Jerry Ma, Denis Yarats, Ninghui Li

    Abstract: The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application threat models. Prior work has identified prompt injection as a new attack vector for web agents, yet the resulting impact within real-world environments remains insufficiently understood. In this work, we examine the landscape of prompt injection atta…

    Submitted 25 November, 2025; originally announced November 2025.

  47. arXiv:2511.20233  [pdf, ps, other]

    cs.CL

    REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

    Authors: Chuyi Kong, Gao Wei, Jing Ma, Hongzhan Lin, Yaxin Fan

    Abstract: The prevalence of misinformation on social media threatens public trust, demanding automated fact-checking systems that provide accurate verdicts with interpretable explanations. However, existing large language model-based (LLM-based) approaches often rely heavily on external knowledge sources, introducing substantial latency and even hallucinations that undermine reliability, interpretability, a…

    Submitted 28 November, 2025; v1 submitted 25 November, 2025; originally announced November 2025.

  48. arXiv:2511.18926  [pdf, ps, other]

    cs.AI cs.HC

    MoodBench 1.0: An Evaluation Benchmark for Emotional Companionship Dialogue Systems

    Authors: Haifeng Jing, Yujie Hou, Junfei Liu, Rui Xie, alan Xu, Jinlong Ma, Qichun Deng

    Abstract: With the rapid development of Large Language Models, dialogue systems are shifting from information tools to emotional companions, heralding the era of Emotional Companionship Dialogue Systems (ECDs) that provide personalized emotional support for users. However, the field lacks clear definitions and systematic evaluation standards for ECDs. To address this, we first propose a definition of ECDs w…

    Submitted 24 November, 2025; originally announced November 2025.

    Comments: 26 pages, 7 figures

    ACM Class: I.2.7

  49. arXiv:2511.17604  [pdf, ps, other]

    cs.LG cs.AI

    BrainHGT: A Hierarchical Graph Transformer for Interpretable Brain Network Analysis

    Authors: Jiajun Ma, Yongchao Zhang, Chao Zhang, Zhao Lv, Shengbing Pei

    Abstract: Graph Transformer shows remarkable potential in brain network analysis due to its ability to model graph structures and complex node relationships. Most existing methods typically model the brain as a flat network, ignoring its modular structure, and their attention mechanisms treat all brain region connections equally, ignoring distance-related node connection patterns. However, brain information…

    Submitted 18 November, 2025; originally announced November 2025.

  50. arXiv:2511.17496  [pdf, ps, other]

    cs.RO cs.MA

    MDG: Masked Denoising Generation for Multi-Agent Behavior Modeling in Traffic Environments

    Authors: Zhiyu Huang, Zewei Zhou, Tianhui Cai, Yun Zhang, Jiaqi Ma

    Abstract: Modeling realistic and interactive multi-agent behavior is critical to autonomous driving and traffic simulation. However, existing diffusion and autoregressive approaches are limited by iterative sampling, sequential decoding, or task-specific designs, which hinder efficiency and reuse. We propose Masked Denoising Generation (MDG), a unified generative framework that reformulates multi-agent beha…

    Submitted 21 November, 2025; originally announced November 2025.