
Showing 1–50 of 6,458 results for author: Chen, X

Searching in archive cs.
  1. arXiv:2604.04875  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    DIRECT: Video Mashup Creation via Hierarchical Multi-Agent Planning and Intent-Guided Editing

    Authors: Ke Li, Maoliang Li, Jialiang Chen, Jiayu Chen, Zihao Zheng, Shaoqi Wang, Xiang Chen

    Abstract: Video mashup creation represents a complex video editing paradigm that recomposes existing footage to craft engaging audio-visual experiences, demanding intricate orchestration across semantic, visual, and auditory dimensions and multiple levels. However, existing automated editing frameworks often overlook the cross-level multimodal orchestration to achieve professional-grade fluidity, resulting…

    Submitted 6 April, 2026; originally announced April 2026.

  2. arXiv:2604.04841  [pdf, ps, other]

    cs.SD eess.AS eess.SP

    Joint Fullband-Subband Modeling for High-Resolution SingFake Detection

    Authors: Xuanjun Chen, Chia-Yu Hu, Sung-Feng Huang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang

    Abstract: Rapid advances in singing voice synthesis have increased unauthorized imitation risks, creating an urgent need for better Singing Voice Deepfake (SingFake) Detection, also known as SVDD. Unlike speech, singing contains complex pitch, wide dynamic range, and timbral variations. Conventional 16 kHz-sampled detectors prove inadequate, as they discard vital high-frequency information. This study prese…

    Submitted 6 April, 2026; originally announced April 2026.

    Comments: Submitted to INTERSPEECH 2026

  3. arXiv:2604.04707  [pdf, ps, other]

    cs.CV

    OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

    Authors: DataFlow Team, Bohan Zeng, Daili Hua, Kaixin Zhu, Yifan Dai, Bozhou Li, Yuran Wang, Chengzhuo Tong, Yifan Yang, Mingkun Chang, Jianbin Zhao, Zhou Liu, Hao Liang, Xiaochen Ma, Ruichuan An, Junbo Niu, Zimo Meng, Tianyi Bai, Meiyi Qiang, Huanyao Zhang, Zhiyou Xiao, Tianyu Guo, Qinhan Yu, Runhao Zhao, Zhengpin Li, et al. (16 additional authors not shown)

    Abstract: World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remains lacking. In this paper, we introduce OpenWorldLib, a comprehensive and standardized inference framework for Advanced World Models. Drawing on the evolution of world models, we propose a clear definition: a world model is a model or framework cent…

    Submitted 6 April, 2026; originally announced April 2026.

    Comments: 28 pages, 6 figures

  4. arXiv:2604.04516  [pdf, ps, other]

    cs.LG cs.AI

    GAIN: Multiplicative Modulation for Domain Adaptation

    Authors: Hengshuai Yao, Xing Chen, Ahmed Murtadha, Guan Wang

    Abstract: Adapting LLMs to new domains causes forgetting because standard methods (full fine-tuning, LoRA) inject new directions into the weight space. We propose GAIN, which re-emphasizes existing features through multiplicative modulation W_new = S * W. The learned diagonal matrix S is applied to the attention output projection and optionally the FFN. The principle mirrors gain modulation in neuroscience,…

    Submitted 6 April, 2026; originally announced April 2026.

  5. arXiv:2604.04513  [pdf, ps, other]

    cs.CV cs.RO

    MPTF-Net: Multi-view Pyramid Transformer Fusion Network for LiDAR-based Place Recognition

    Authors: Shuyuan Li, Zihang Wang, Xieyuanli Chen, Wenkai Zhu, Xiaoteng Fang, Peizhou Ni, Junhao Yang, Dong Kong

    Abstract: LiDAR-based place recognition (LPR) is essential for global localization and loop-closure detection in large-scale SLAM systems. Existing methods typically construct global descriptors from Range Images or BEV representations for matching. BEV is widely adopted due to its explicit 2D spatial layout encoding and efficient retrieval. However, conventional BEV representations rely on simple statistic…

    Submitted 6 April, 2026; originally announced April 2026.

  6. arXiv:2604.04330  [pdf, ps, other]

    cs.ET

    Light-Bound Transformers: Hardware-Anchored Robustness for Silicon-Photonic Computer Vision Systems

    Authors: Xuming Chen, Deniz Najafi, Chengwei Zhou, Pietro Mercati, Arman Roohi, Mohsen Imani, Mahdi Nikdast, Shaahin Angizi, Gourav Datta

    Abstract: Deploying Vision Transformers (ViTs) on near-sensor analog accelerators demands training pipelines that are explicitly aligned with device-level noise and energy constraints. We introduce a compact framework for silicon-photonic execution of ViTs that integrates measured hardware noise, robust attention training, and an energy-aware processing flow. We first characterize bank-level noise in micror…

    Submitted 5 April, 2026; originally announced April 2026.

    Comments: Accepted at Design Automation Conference (DAC) 2026

  7. arXiv:2604.04247  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Combee: Scaling Prompt Learning for Self-Improving Language Model Agents

    Authors: Hanchen Li, Runyuan He, Qizheng Zhang, Changxiu Ji, Qiuyang Mang, Xiaokun Chen, Lakshya A Agrawal, Wei-Liang Liao, Eric Yang, Alvin Cheung, James Zou, Kunle Olukotun, Ion Stoica, Joseph E. Gonzalez

    Abstract: Recent advances in prompt learning allow large language model agents to acquire task-relevant knowledge from inference-time context without parameter changes. For example, existing methods (like ACE or GEPA) can learn system prompts to improve accuracy based on previous agent runs. However, these methods primarily focus on single-agent or low-parallelism settings. This fundamentally limits their a…

    Submitted 5 April, 2026; originally announced April 2026.

  8. arXiv:2604.04135  [pdf, ps, other]

    cs.CV

    NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results

    Authors: Shuhong Liu, Chenyu Bao, Ziteng Cui, Xuangeng Chu, Bin Ren, Lin Gu, Xiang Chen, Mingrui Li, Long Ma, Marcos V. Conde, Radu Timofte, Yun Liu, Ryo Umagami, Tomohiro Hashimoto, Zijian Hu, Yuan Gan, Tianhan Xu, Yusuke Kurose, Tatsuya Harada, Junwei Yuan, Gengjia Chang, Xining Ge, Mache You, Qida Cao, Zeliang Li, et al. (81 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, detailing the proposed methods and results. The challenge seeks to identify reconstruction pipelines that are robust under real-world adverse conditions, specifically extreme low-light and smoke-degraded environments, as captured by our RealX3D benchmark. A total of 279 participa…

    Submitted 5 April, 2026; originally announced April 2026.

  9. arXiv:2604.04120  [pdf, ps, other]

    cs.CL

    Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression

    Authors: Lingjie Zeng, Xiaofan Chen, Yanbo Wang, Xiuying Chen

    Abstract: Long chain-of-thought (Long-CoT) reasoning models have motivated a growing body of work on compressing reasoning traces to reduce inference cost, yet existing evaluations focus almost exclusively on task accuracy and token savings. Trustworthiness properties, whether acquired or reinforced through post-training, are encoded in the same parameter space that compression modifies. This means preservi…

    Submitted 5 April, 2026; originally announced April 2026.

  10. arXiv:2604.04044  [pdf, ps, other]

    cs.NI

    UAV Control and Communication Enabled Low-Altitude Economy: Challenges, Resilient Architecture and Co-design Strategies

    Authors: Tianhao Liang, Nanchi Su, Yuqi Ping, Guangyu Lei, Xinglin Chen, Longyu Zhou, Tingting Zhang, Qinyu Zhang, Tony Q. S. Quek

    Abstract: The emerging low-altitude economy has catalyzed the large-scale deployment of unmanned aerial vehicles (UAVs), driving a paradigm shift in environment monitoring, logistics, and emergency response. However, operating within these environments presents notable challenges such as pervasive coverage holes, unpredictable interference, and spectrum scarcity. To this end, this article presents a communication…

    Submitted 5 April, 2026; originally announced April 2026.

  11. arXiv:2604.03873  [pdf, ps, other]

    cs.LG cs.CL

    SODA: Semi On-Policy Black-Box Distillation for Large Language Models

    Authors: Xiwen Chen, Jingjing Wang, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hejian Sang, Zhipeng Wang, Alborz Geramifard, Feng Luo

    Abstract: Black-box knowledge distillation for large language models presents a strict trade-off. Simple off-policy methods (e.g., sequence-level knowledge distillation) struggle to correct the student's inherent errors. Fully on-policy methods (e.g., Generative Adversarial Distillation) solve this via adversarial training but introduce well-known training instability and crippling computational overhead. T…

    Submitted 4 April, 2026; originally announced April 2026.

  12. arXiv:2604.03870  [pdf, ps, other]

    cs.CL

    Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

    Authors: Wenhui Zhu, Xuanzhao Dong, Xiwen Chen, Rui Cai, Peijie Qiu, Zhipeng Wang, Oana Frunza, Shao Tang, Jindong Gu, Yalin Wang

    Abstract: The rapid deployment of open-source frameworks has significantly advanced the development of modern multi-agent systems. However, expanded action spaces, including uncontrolled privilege exposure and hidden inter-system interactions, pose severe security challenges. Specifically, Indirect Prompt Injections (IPI), which conceal malicious instructions within third-party content, can trigger unauthor…

    Submitted 4 April, 2026; originally announced April 2026.

  13. arXiv:2604.03806  [pdf, ps, other]

    cs.CV

    Bridging Restoration and Diagnosis: A Comprehensive Benchmark for Retinal Fundus Enhancement

    Authors: Xuanzhao Dong, Wenhui Zhu, Xiwen Chen, Hao Wang, Xin Li, Yujian Xiong, Jiajun Cheng, Zhipeng Wang, Shao Tang, Oana Dumitrascu, Yalin Wang

    Abstract: Over the past decade, generative models have demonstrated success in enhancing fundus images. However, the evaluation of these models remains a challenge. A benchmark for fundus image enhancement is needed for three main reasons: (1) Conventional denoising metrics such as PSNR and SSIM fail to capture clinically relevant features, such as lesion preservation and vessel morphology consistency, limit…

    Submitted 4 April, 2026; originally announced April 2026.

  14. arXiv:2604.03660  [pdf, ps, other]

    cs.AI

    TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables

    Authors: Xiaoyu Chen, Lu Dai, Hanqing Wang, Zhuoyu Li, Wenbin Dai, Yanzong Zheng, Zhenggang Xia, Junyong Lin, Hui Xiong

    Abstract: Structured tables are essential for conveying high-density information in professional domains such as finance, healthcare, and scientific research. Despite the progress in Multimodal Large Language Models (MLLMs), reasoning performance remains limited for complex tables with hierarchical layouts. In this paper, we identify a critical Perception Bottleneck through quantitative analysis. We find th…

    Submitted 4 April, 2026; originally announced April 2026.

  15. arXiv:2604.03657  [pdf, ps, other]

    cs.CV cs.IR cs.MM

    Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning

    Authors: Tianci Luo, Haohao Pan, Jinpeng Wang, Niu Lian, Xinrui Chen, Bin Chen, Shu-Tao Xia, Chun Yuan

    Abstract: Visual in-context learning (VICL) enables visual foundation models to handle multiple tasks by steering them with demonstrative prompts. The choice of such prompts largely influences VICL performance, standing out as a key challenge. Prior work has made substantial progress on prompt retrieval and reranking strategies, but mainly focuses on prompt images while overlooking labels. We reveal these a…

    Submitted 4 April, 2026; originally announced April 2026.

    Comments: Accepted to CVPR 2026. 10 pages, 5 figures, 3 tables

  16. arXiv:2604.03340  [pdf, ps, other]

    cs.CV cs.AI

    Learning Additively Compositional Latent Actions for Embodied AI

    Authors: Hangxing Wei, Xiaoyu Chen, Chuheng Zhang, Tim Pearce, Jianyu Chen, Alex Lamb, Li Zhao, Jiang Bian

    Abstract: Latent action learning infers pseudo-action labels from visual transitions, providing an approach to leverage internet-scale video for embodied AI. However, most methods learn latent actions without structural priors that encode the additive, compositional structure of physical motion. As a result, latents often entangle irrelevant scene details or information about future observations with true s…

    Submitted 3 April, 2026; originally announced April 2026.

  17. arXiv:2604.03272  [pdf, ps, other]

    q-fin.CP cs.AI cs.GT q-fin.GN

    Artificial Intelligence and Systemic Risk: A Unified Model of Performative Prediction, Algorithmic Herding, and Cognitive Dependency in Financial Markets

    Authors: Shuchen Meng, Xupeng Chen

    Abstract: We develop a unified model in which AI adoption in financial markets generates systemic risk through three mutually reinforcing channels: performative prediction, algorithmic herding, and cognitive dependency. Within an extended rational expectations framework with endogenous adoption, we derive an equilibrium systemic risk coupling $r(φ) = φρβ/λ'(φ)$, where $φ$ is the AI adoption share, $ρ$ the a…

    Submitted 23 March, 2026; originally announced April 2026.

  18. arXiv:2604.03260  [pdf, ps, other]

    cs.CL cs.AI

    Why Attend to Everything? Focus is the Key

    Authors: Hengshuai Yao, Xing Chen, Ahmed Murtadha, Jin Li, Shuai Shao, Yasin Abbasi Yadkori, Guan Wang, Mingli Yuan, William Chen, Sen Song

    Abstract: We introduce Focus, a method that learns which token pairs matter rather than approximating all of them. Learnable centroids assign tokens to groups; distant attention is restricted to same-group pairs while local attention operates at full resolution. Because all model weights stay frozen, Focus is purely additive: centroid-only training (as few as 148K parameters) improves domain perplexity with…

    Submitted 12 March, 2026; originally announced April 2026.

  19. arXiv:2604.03168  [pdf, ps, other]

    cs.IT

    An Algebraic Method for Full-Rank Characterization in Binary Linear Coding

    Authors: Mingyang Zhu, Laigang Guo, Zhenyu Huang, Xingbing Chen, Jue Wang, Tao Guo, Xiao-Shan Gao

    Abstract: In this paper, we develop a characteristic set (CS)-based method for deriving full-rank equivalence conditions of symbolic matrices over the binary field. Such full-rank conditions are of fundamental importance for many linear coding problems in communication and information theory. Building on the developed CS-based method, we present an algorithm called Binary Characteristic Set for Full Rank (B…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: Submitted to IEEE for possible publication

  20. arXiv:2604.03117  [pdf, ps, other]

    cs.CV

    Revealing Physical-World Semantic Vulnerabilities: Universal Adversarial Patches for Infrared Vision-Language Models

    Authors: Chengyin Hu, Yuxian Dong, Yikun Guo, Xiang Chen, Junqi Wu, Jiahuan Long, Yiwei Wei, Tingsong Jiang, Wen Yao

    Abstract: Infrared vision-language models (IR-VLMs) have emerged as a promising paradigm for multimodal perception in low-visibility environments, yet their robustness to adversarial attacks remains largely unexplored. Existing adversarial patch methods are mainly designed for RGB-based models in closed-set settings and are not readily applicable to the open-ended semantic understanding and physical deploym…

    Submitted 3 April, 2026; originally announced April 2026.

  21. arXiv:2604.02781  [pdf, ps, other]

    cs.SD

    DynFOA: Generating First-Order Ambisonics with Conditional Diffusion for Dynamic and Acoustically Complex 360-Degree Videos

    Authors: Ziyu Luo, Lin Chen, Qiang Qu, Xiaoming Chen, Yiran Shen

    Abstract: Spatial audio is crucial for immersive 360-degree video experiences, yet most 360-degree videos lack it due to the difficulty of capturing spatial audio during recording. Automatically generating spatial audio such as first-order ambisonics (FOA) from video therefore remains an important but challenging problem. In complex scenes, sound perception depends not only on sound source locations but als…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: arXiv admin note: text overlap with arXiv:2602.06846

  22. arXiv:2604.02486  [pdf, ps, other]

    cs.CV cs.CL

    VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors

    Authors: Haz Sameen Shahgir, Xiaofu Chen, Yu Fu, Erfan Shayegani, Nael Abu-Ghazaleh, Yova Kementchedjhieva, Yue Dong

    Abstract: Vision Language Models (VLMs) achieve impressive performance across a wide range of multimodal tasks. However, on some tasks that demand fine-grained visual perception, they often fail even when the required information is present in their internal representations. In this work, we demonstrate that this gap arises from their narrow training pipeline which focuses on moving visual information to th…

    Submitted 2 April, 2026; originally announced April 2026.

  23. arXiv:2604.02368  [pdf, ps, other]

    cs.AI cs.CL

    Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation

    Authors: Xue Liu, Xin Ma, Yuxin Ma, Yongchang Peng, Duo Wang, Zhoufutu Wen, Ge Zhang, Kaiyuan Zhang, Xinyu Chen, Tianci He, Jiani Hou, Liang Hu, Ziyun Huang, Yongzhe Hui, Jianpeng Jiao, Chennan Ju, Yingru Kong, Yiran Li, Mengyun Liu, Luyao Ma, Fei Ni, Yiqing Ni, Yueyan Qiu, Yanle Ren, Zilin Shi, et al. (9 additional authors not shown)

    Abstract: As Large Language Models (LLMs) exhibit plateauing performance on conventional benchmarks, a pivotal challenge persists: evaluating their proficiency in complex, open-ended tasks characterizing genuine expert-level cognition. Existing frameworks suffer from narrow domain coverage, reliance on generalist tasks, or self-evaluation biases. To bridge this gap, we present XpertBench, a high-fidelity be…

    Submitted 6 April, 2026; v1 submitted 27 March, 2026; originally announced April 2026.

  24. arXiv:2604.02345  [pdf, ps, other]

    cs.LG cs.AI

    UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

    Authors: Mengzhou Wu, Yuzhe Guo, Yuan Cao, Haochuan Lu, Songhe Zhu, Pingzhe Qu, Xin Chen, Kang Qin, Zhongpu Wang, Xiaode Zhang, Xinyi Wang, Wei Dai, Gang Cao, Yuetang Deng, Zhi Gong, Dezhi Ran, Linyi Li, Wei Yang, Tao Xie

    Abstract: Scaling generalist GUI agents is hindered by the data scalability bottleneck of expensive human demonstrations and the "distillation ceiling" of synthetic teacher supervision. To transcend these limitations, we propose UI-Oceanus, a framework that shifts the learning focus from mimicking high-level trajectories to mastering interaction physics via ground-truth environmental feedback. Through a sys…

    Submitted 11 February, 2026; originally announced April 2026.

  25. arXiv:2604.02235  [pdf, ps, other]

    cs.DS math.PR

    Subquadratic Counting via Perfect Marginal Sampling

    Authors: Xiaoyu Chen, Zongchen Chen, Kuikui Liu, Xinyuan Zhang

    Abstract: We study the computational complexity of approximately computing the partition function of a spin system. Techniques based on standard counting-to-sampling reductions yield $\tilde{O}(n^2)$-time algorithms, where $n$ is the size of the input graph. We present new counting algorithms that break the quadratic-time barrier in a wide range of settings. For example, for the hardcore model of $λ$-weight…

    Submitted 2 April, 2026; originally announced April 2026.

  26. arXiv:2604.02060  [pdf, ps, other]

    cs.CV cs.RO

    CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects

    Authors: Jingliang Li, Jindou Jia, Tuo An, Chuhao Zhou, Xiangyu Chen, Shilin Shan, Boyu Ma, Bofan Lyu, Gen Li, Jianfei Yang

    Abstract: When told to "cut the apple," a robot must choose the knife over nearby scissors, despite both objects affording the same cutting function. In real-world scenes, multiple objects may share identical affordances, yet only one is appropriate under the given task context. We call such cases confusing pairs. However, existing 3D affordance methods largely sidestep this challenge by evaluating isolated…

    Submitted 2 April, 2026; originally announced April 2026.

    Comments: Code available at: github.com/Lorenzo-0-0/CompassAD

  27. arXiv:2604.02006  [pdf, ps, other]

    cs.AI

    ProCeedRL: Process Critic with Exploratory Demonstration Reinforcement Learning for LLM Agentic Reasoning

    Authors: Jingyue Gao, Yanjiang Guo, Xiaoshuai Chen, Jianyu Chen

    Abstract: Reinforcement Learning (RL) significantly enhances the reasoning abilities of large language models (LLMs), yet applying it to multi-turn agentic tasks remains challenging due to the long-horizon nature of interactions and the stochasticity of environmental feedback. We identify a structural failure mode in agentic exploration: suboptimal actions elicit noisy observations into misleading contexts,…

    Submitted 2 April, 2026; originally announced April 2026.

  28. Light-ResKAN: A Parameter-Sharing Lightweight KAN with Gram Polynomials for Efficient SAR Image Recognition

    Authors: Pan Yi, Weijie Li, Xiaodong Chen, Jiehua Zhang, Li Liu, Yongxiang Liu

    Abstract: Synthetic Aperture Radar (SAR) image recognition is vital for disaster monitoring, military reconnaissance, and ocean observation. However, large SAR image sizes hinder deep learning deployment on resource-constrained edge devices, and existing lightweight models struggle to balance high-precision feature extraction with low computational requirements. The emerging Kolmogorov-Arnold Network (KAN)…

    Submitted 2 April, 2026; v1 submitted 2 April, 2026; originally announced April 2026.

    Comments: 16 pages, 8 figures, accepted by JSTARS

  29. arXiv:2604.01725  [pdf]

    cs.AI cs.LG

    LiteInception: A Lightweight and Interpretable Deep Learning Framework for General Aviation Fault Diagnosis

    Authors: Zhihuan Wei, Xinhang Chen, Danyang Han, Yang Hu, Jie Liu, Xuewen Miao, Guijiang Li

    Abstract: General aviation fault diagnosis and efficient maintenance are critical to flight safety; however, deploying deep learning models on resource-constrained edge devices poses dual challenges in computational capacity and interpretability. This paper proposes LiteInception--a lightweight interpretable fault diagnosis framework designed for edge deployment. The framework adopts a two-stage cascaded ar…

    Submitted 2 April, 2026; originally announced April 2026.

  30. arXiv:2604.01644  [pdf, ps, other]

    cs.CV cs.MM

    TOL: Textual Localization with OpenStreetMap

    Authors: Youqi Liao, Shuhao Kang, Jingyu Xu, Olaf Wysocki, Yan Xia, Jianping Li, Zhen Dong, Bisheng Yang, Xieyuanli Chen

    Abstract: Natural language provides an intuitive way to express spatial intent in geospatial applications. While existing localization methods often rely on dense point cloud maps or high-resolution imagery, OpenStreetMap (OSM) offers a compact and freely available map representation that encodes rich semantic and structural information, making it well suited for large-scale localization. However, text-to-O…

    Submitted 2 April, 2026; originally announced April 2026.

    Comments: Tech repo

  31. arXiv:2604.01621  [pdf, ps, other]

    cs.DC cs.AI

    DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72

    Authors: Wanqian Li, Jintao Peng, Zongfei Jing, Tianyu Zhang, Ze Long, Xianjie Qiao, Xiaoming Chen, Dongxu Yang, Kefeng Duan, June Yang

    Abstract: Large language model (LLM) inference increasingly depends on multi-GPU execution, yet existing inference parallelization strategies require layer-wise inter-rank synchronization, making end-to-end performance sensitive to workload imbalance. We present DWDP (Distributed Weight Data Parallelism), an inference parallelization strategy that preserves data-parallel execution while offloading MoE weigh…

    Submitted 2 April, 2026; originally announced April 2026.

    Comments: Technical Report. 17 pages. 8 figures

  32. arXiv:2604.01557  [pdf, ps, other]

    cs.DS cs.CC

    Sublinear-query relative-error testing of halfspaces

    Authors: Xi Chen, Anindya De, Yizhi Huang, Shivam Nadimpalli, Rocco A. Servedio, Tianqi Yang

    Abstract: The relative-error property testing model was introduced in [CDHLNSY24] to facilitate the study of property testing for "sparse" Boolean-valued functions, i.e. ones for which only a small fraction of all input assignments satisfy the function. In this framework, the distance from the unknown target function $f$ that is being tested to a function $g$ is defined as…

    Submitted 1 April, 2026; originally announced April 2026.

  33. arXiv:2604.01523  [pdf, ps, other]

    cs.RO

    Robust Autonomous Control of a Magnetic Millirobot in In Vitro Cardiac Flow

    Authors: Anuruddha Bhattacharjee, Xinhao Chen, Lamar O. Mair, Suraj Raval, Yancy Diaz-Mercado, Axel Krieger

    Abstract: Untethered magnetic millirobots offer significant potential for minimally invasive cardiac therapies; however, achieving reliable autonomous control in pulsatile cardiac flow remains challenging. This work presents a vision-guided control framework enabling precise autonomous navigation of a magnetic millirobot in an in vitro heart phantom under physiologically relevant flow conditions. The system…

    Submitted 1 April, 2026; originally announced April 2026.

  34. arXiv:2604.01520  [pdf, ps, other]

    cs.AI

    LLM Agents as Social Scientists: A Human-AI Collaborative Platform for Social Science Automation

    Authors: Lei Wang, Yuanzi Li, Jinchao Wu, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen

    Abstract: Traditional social science research often requires designing complex experiments across vast methodological spaces and depends on real human participants, making it labor-intensive, costly, and difficult to scale. Here we present S-Researcher, an LLM-agent-based platform that assists researchers in conducting social science research more efficiently and at greater scale by "siliconizing" both the…

    Submitted 1 April, 2026; originally announced April 2026.

  35. arXiv:2604.01155  [pdf, ps, other]

    cs.SD

    FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pretraining

    Authors: Xiquan Li, Xuenan Xu, Ziyang Ma, Wenxi Chen, Haolin He, Qiuqiang Kong, Xie Chen

    Abstract: Contrastively pretrained audio-language models (e.g., CLAP) excel at clip-level understanding but struggle with frame-level tasks. Existing extensions fail to exploit the varying granularity of real-world audio-text data, where massive clip-level textual descriptions coexist with limited frame-level annotations. This paper proposes Fine-grained Language-Audio Pretraining (FineLAP), a novel trainin…

    Submitted 1 April, 2026; originally announced April 2026.

  36. arXiv:2604.01130  [pdf, ps, other]

    cs.LG cs.CV

    Toward Personalized Darts Training: A Data-Driven Framework Based on Skeleton-Based Biomechanical Analysis and Motion Modeling

    Authors: Zhantao Chen, Dongyi He, Jin Fang, Xi Chen, Yishuo Liu, Xiaozhen Zhong, Xuejun Hu

    Abstract: As sports training becomes more data-driven, traditional dart coaching based mainly on experience and visual observation is increasingly inadequate for high-precision, goal-oriented movements. Although prior studies have highlighted the importance of release parameters, joint motion, and coordination in dart throwing, most quantitative methods still focus on local variables, single-release metrics…

    Submitted 2 April, 2026; v1 submitted 1 April, 2026; originally announced April 2026.

  37. arXiv:2604.00983  [pdf, ps, other]

    cs.CV

    ACT Now: Preempting LVLM Hallucinations via Adaptive Context Integration

    Authors: Bei Yan, Yuecong Min, Jie Zhang, Shiguang Shan, Xilin Chen

    Abstract: Large Vision-Language Models (LVLMs) frequently suffer from severe hallucination issues. Existing mitigation strategies predominantly rely on isolated, single-step states to enhance visual focus or suppress strong linguistic priors. However, these static approaches neglect dynamic context changes across the generation process and struggle to correct inherited information loss. To address this lim…

    Submitted 1 April, 2026; originally announced April 2026.

  38. arXiv:2604.00818  [pdf, ps, other]

    cs.CY

    Misconception Acquisition Dynamics in Large Language Models

    Authors: Naiming Liu, Xinghe Chen, Richard Baraniuk, Mrinmaya Sachan, Shashank Sonkar

    Abstract: Effective educational AI depends on modeling student misconceptions. Such models enable realistic learner simulation and diagnostic, adaptive tutoring. However, instruction-tuning large language models on student responses containing misconception errors can degrade reasoning abilities, creating a tension between faithful misconception modeling and preserving correct reasoning in other contexts. T…

    Submitted 1 April, 2026; originally announced April 2026.

  39. arXiv:2604.00701  [pdf, ps, other]

    cs.NI

    Birdcast: Interest-aware BEV Multicasting for Infrastructure-assisted Collaborative Perception

    Authors: Yanan Ma, Zhengru Fang, Yihang Tao, Yu Guo, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Vehicle-to-infrastructure collaborative perception (V2I-CP) leverages a high-vantage node to transmit supplementary information, i.e., bird's-eye-view (BEV) feature maps, to vehicles, effectively overcoming line-of-sight limitations. However, the downlink V2I transmission introduces a significant communication bottleneck. Moreover, vehicles in V2I-CP require \textit{heterogeneous yet overlapping}…

    Submitted 1 April, 2026; originally announced April 2026.

  40. arXiv:2604.00643  [pdf, ps, other]

    cs.HC

    In the Middle, Not on Top: AI-Mediated Communication for Patient-Provider Care Relationships

    Authors: Ut Gong, Yibo Meng, Qihan Zhang, Xin Chen, Yan Guan

    Abstract: Relationship-centered care relies on trust and meaningful connection. As AI enters clinical settings, we must ask not just what it can do, but how it should be positioned to support these values. We examine a "middle, not top" approach where AI mediates communication without usurping human judgment. Through studies of CLEAR, an asynchronous messaging system, we show how this configuration addresse…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: 5 pages, 1 figure, Toward Relationship-Centered Care with AI: Designing for Human Connections in Healthcare workshop at CHI 2026

  41. arXiv:2604.00621  [pdf, ps, other]

    cs.GT

    Heterogeneous Mean Field Game Framework for LEO Satellite-Assisted V2X Networks

    Authors: Kangkang Sun, Jianhua Li, Xiuzhen Chen, Mingzhe Chen, Minyi Guo

    Abstract: Coordinating mixed fleets of massive vehicles under stringent delay constraints is a central scalability bottleneck in next-generation mobile computing networks, especially when passenger cars, freight trucks, and autonomous vehicles share the same radio and multi-access edge computing (MEC) infrastructure. Heterogeneous mean field games (HMFG) are a principled framework for this setting, but a fu…

    Submitted 6 April, 2026; v1 submitted 1 April, 2026; originally announced April 2026.

    Comments: 18 pages, 7 figures, has been submitted to IEEE Transactions on Mobile Computing

    ACM Class: C.2.1; C.2.2; C.2.4

  42. arXiv:2604.00372  [pdf, ps, other]

    cs.CV

    Dynamic Graph Neural Network with Adaptive Features Selection for RGB-D Based Indoor Scene Recognition

    Authors: Qiong Liu, Ruofei Xiong, Xingzhen Chen, Muyao Peng, You Yang

    Abstract: Multi-modality of color and depth, i.e., RGB-D, is of great importance in recent research on indoor scene recognition. In this kind of data representation, the depth map is able to describe the 3D structure of scenes and geometric relations among objects. Previous works showed that local features of both modalities are vital for improving recognition accuracy. However, the problem of adaptive selec…

    Submitted 31 March, 2026; originally announced April 2026.

  43. arXiv:2604.00268  [pdf, ps, other]

    cs.CC cs.DS

    The Mystery Deepens: On the Query Complexity of Tarski Fixed Points

    Authors: Xi Chen, Yuhao Li, Mihalis Yannakakis

    Abstract: We give an $O(\log^2 n)$-query algorithm for finding a Tarski fixed point over the $4$-dimensional lattice $[n]^4$, matching the $\Omega(\log^2 n)$ lower bound of [EPRY20]. Additionally, our algorithm yields an ${O(\log^{\lceil (k-1)/3\rceil+1} n)}$-query algorithm for any constant $k$, improving the previous best upper bound ${O(\log^{\lceil (k-1)/2\rceil+1} n)}$ of [CL22]. Our algorithm uses a new…

    Submitted 31 March, 2026; originally announced April 2026.

  44. arXiv:2604.00161  [pdf, ps, other]

    cs.CV

    Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models

    Authors: Longwei Xu, Feng Feng, Shaojie Zhang, Xin Chen, Hang Li, Anan Du, Hailong Yu, Pei Fu, Zhenbo Luo, Jian Luan

    Abstract: Optical Character Recognition (OCR) is increasingly regarded as a foundational capability for modern vision-language models (VLMs), enabling them not only to read text in images but also to support downstream reasoning in real-world visual question answering (VQA). However, practical applications further require reliable text anchors, i.e., accurately grounding queried text to its corresponding sp…

    Submitted 31 March, 2026; originally announced April 2026.

  45. arXiv:2603.29506  [pdf, ps, other]

    cs.GT

    Hierarchical Battery-Aware Game Algorithm for ISL Power Allocation in LEO Mega-Constellations

    Authors: Kangkang Sun, Jianhua Li, Xiuzhen Chen, Minyi Guo

    Abstract: Sustaining high inter-satellite link (ISL) throughput under intermittent solar harvesting is a fundamental challenge for LEO mega-constellations. Existing frameworks impose static power ceilings that ignore real-time battery state and comprehensive onboard power budgets, causing eclipse-period energy crises. Learning-based approaches capture battery dynamics but lack equilibrium guarantees and do…

    Submitted 31 March, 2026; originally announced March 2026.

    Comments: 19 pages, 4 figures, has been submitted to IEEE Transactions on Mobile Computing

    ACM Class: C.2.1; C.2.2; C.2.4

  46. arXiv:2603.29252  [pdf, ps, other]

    cs.CV cs.AI

    Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism

    Authors: Tao Chen, Kun Zhang, Qiong Wu, Xiao Chen, Chao Chang, Xiaoshuai Sun, Yiyi Zhou, Rongrong Ji

    Abstract: Long video understanding is a key challenge that plagues the advancement of \emph{Multimodal Large Language Models} (MLLMs). In this paper, we study this problem from the perspective of a visual memory mechanism and propose a novel, training-free approach, termed \emph{Flexible Memory} (\textbf{FlexMem}). In principle, FlexMem aims to mimic human behavior of video watching, \emph{i.e.}, continu…

    Submitted 31 March, 2026; originally announced March 2026.

    Comments: CVPR 2026

  47. arXiv:2603.28776  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    DF-ACBlurGAN: Structure-Aware Conditional Generation of Internally Repeated Patterns for Biomaterial Microtopography Design

    Authors: Rongjun Dong, Xin Chen, Morgan R Alexander, Karthikeyan Sivakumar, Reza Omdivar, David A Winkler, Grazziela Figueredo

    Abstract: Learning to generate images with internally repeated and periodic structures poses a fundamental challenge for machine learning and computer vision models, which are typically optimised for local texture statistics and semantic realism rather than global structural consistency. This limitation is particularly pronounced in applications requiring strict control over repetition scale, spacing, and b…

    Submitted 4 February, 2026; originally announced March 2026.

  48. arXiv:2603.28757  [pdf, ps, other]

    cs.CV cs.MM cs.SD

    SonoWorld: From One Image to a 3D Audio-Visual Scene

    Authors: Derong Jin, Xiyi Chen, Ming C. Lin, Ruohan Gao

    Abstract: Tremendous progress in visual scene generation now turns a single image into an explorable 3D world, yet immersion remains incomplete without sound. We introduce Image2AVScene, the task of generating a 3D audio-visual scene from a single image, and present SonoWorld, the first framework to tackle this challenge. From one image, our pipeline outpaints a 360° panorama, lifts it into a navigable 3D s…

    Submitted 30 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026, project page: https://humathe.github.io/sonoworld/

  49. arXiv:2603.28568  [pdf, ps, other]

    cs.CV

    XSPA: Crafting Imperceptible X-Shaped Sparse Adversarial Perturbations for Transferable Attacks on VLMs

    Authors: Chengyin Hu, Jiaju Han, Xuemeng Sun, Qike Zhang, Yiwei Wei, Ang Li, Chunlei Meng, Xiang Chen, Jiahuan Long

    Abstract: Vision-language models (VLMs) rely on a shared visual-textual representation space to perform tasks such as zero-shot classification, image captioning, and visual question answering (VQA). While this shared space enables strong cross-task generalization, it may also introduce a common vulnerability: small visual perturbations can propagate through the shared embedding space and cause correlated se…

    Submitted 30 March, 2026; originally announced March 2026.

  50. arXiv:2603.28493  [pdf, ps, other]

    cs.CV

    ConceptWeaver: Weaving Disentangled Concepts with Flow

    Authors: Jintao Chen, Aiming Hao, Xiaoqing Chen, Chengyu Bai, Chubin Chen, Yanxun Li, Jiahong Wu, Xiangxiang Chu, Shanghang Zhang

    Abstract: Pre-trained flow-based models excel at synthesizing complex scenes yet lack a direct mechanism for disentangling and customizing their underlying concepts from one-shot real-world sources. To demystify this process, we first introduce a novel differential probing technique to isolate and analyze the influence of individual concept tokens on the velocity field over time. This investigation yields a…

    Submitted 30 March, 2026; originally announced March 2026.