Showing 1–50 of 1,202 results for author: Jin, Z

  1. arXiv:2604.11721  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Evaluating Cooperation in LLM Social Groups through Elected Leadership

    Authors: Ryan Faulkner, Anushka Deshpande, David Guzman Piedrahita, Joel Z. Leibo, Zhijing Jin

    Abstract: Governing common-pool resources requires agents to develop enduring strategies through cooperation and self-governance to avoid collective failure. While foundation models have shown potential for cooperation in these settings, existing multi-agent research provides little insight into whether structured leadership and election mechanisms can improve collective decision making. The lack of such a…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: Main text: 11 pages, 4 figures, 4 tables

  2. arXiv:2604.10158  [pdf, ps, other]

    cs.LG

    Tracing the Thought of a Grandmaster-level Chess-Playing Transformer

    Authors: Rui Lin, Zhenyu Jin, Guancheng Zhou, Xuyang Ge, Wentao Shu, Jiaxing Wu, Junxuan Wang, Zhengfu He, Junping Zhang, Xipeng Qiu

    Abstract: While modern transformer neural networks achieve grandmaster-level performance in chess and other reasoning tasks, their internal computation process remains largely opaque. Focusing on Leela Chess Zero (LC0), we introduce a sparse decomposition framework to interpret its internal computation by decomposing its MLP and attention modules with sparse replacement layers, which capture the primary com…

    Submitted 11 April, 2026; originally announced April 2026.

  3. arXiv:2604.09162  [pdf, ps, other]

    cs.CL cs.AI cs.HC

    Persona-E$^2$: A Human-Grounded Dataset for Personality-Shaped Emotional Responses to Textual Events

    Authors: Yuqin Yang, Haowu Zhou, Haoran Tu, Zhiwen Hui, Shiqi Yan, HaoYang Li, Dong She, Xianrong Yao, Yang Gao, Zhanpeng Jin

    Abstract: Most affective computing research treats emotion as a static property of text, focusing on the writer's sentiment while overlooking the reader's perspective. This approach ignores how individual personalities lead to diverse emotional appraisals of the same event. Although role-playing Large Language Models (LLMs) attempt to simulate such nuanced reactions, they often suffer from "personality illu…

    Submitted 10 April, 2026; originally announced April 2026.

    Comments: Accepted by ACL 2026 Main

  4. arXiv:2604.07789  [pdf, ps, other]

    cs.MA cs.CL cs.SE

    ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents

    Authors: Kenan Li, Qirui Jin, Liao Zhu, Xiaosong Huang, Yijia Wu, Yikai Zhang, Xin Zhang, Zijian Jin, Yufan Huang, Elsie Nallipogu, Chaoyun Zhang, Yu Kang, Saravan Rajmohan, Qingwei Lin, Wenke Lee, Dongmei Zhang

    Abstract: Recent advances in language model (LM) agents have significantly improved automated software engineering (SWE). Prior work has proposed various agentic workflows and training strategies as well as analyzed failure modes of agentic systems on SWE tasks, focusing on several contextual information signals: Reproduction Test, Regression Test, Edit Location, Execution Context, and API Usage. However, t…

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: Under peer review; 37 pages, 10 figures, 5 tables

    ACM Class: I.2.7; I.2.5

  5. arXiv:2604.07767  [pdf, ps, other]

    cs.DC

    Administrative Decentralization in Edge-Cloud Multi-Agent for Mobile Automation

    Authors: Senyao Li, Zhigang Zuo, Haozhao Wang, Junyu Chen, Zhanbo Jin, Ruixuan Li

    Abstract: Collaborative edge-cloud frameworks have emerged as the mainstream paradigm for mobile automation, mitigating the latency and privacy risks inherent to monolithic cloud agents. However, existing approaches centralize administration in the cloud while relegating the device to passive execution, inducing a cognitive lag regarding real-time UI dynamics. To tackle this, we introduce AdecPilot by a…

    Submitted 8 April, 2026; originally announced April 2026.

  6. arXiv:2604.07086  [pdf, ps, other]

    eess.SP

    Radio-Frequency Inverse Rendering for Wireless Environment Modeling

    Authors: Fuhai Wang, Zihan Jin, Lehang Wang, Xuehui Dong, Tiebin Mi, Robert Caiming Qiu, Zenan Ling

    Abstract: Neural rendering paradigms have recently emerged as powerful tools for radio frequency (RF). However, by entangling RF sources with scene geometry and material properties, existing approaches limit downstream manipulation of scene geometry, wireless system configuration, and RF reasoning. To address this, we propose a physically grounded RF inverse rendering (RFIR) framework that explicitly decoup…

    Submitted 8 April, 2026; originally announced April 2026.

  7. arXiv:2604.07023  [pdf, ps, other]

    cs.CL

    MARS: Enabling Autoregressive Models Multi-Token Generation

    Authors: Ziqi Jin, Lei Wang, Ziwei Luo, Aixin Sun

    Abstract: Autoregressive (AR) language models generate text one token at a time, even when consecutive tokens are highly predictable given earlier context. We introduce MARS (Mask AutoRegreSsion), a lightweight fine-tuning method that teaches an instruction-tuned AR model to predict multiple tokens per forward pass. MARS adds no architectural modifications, no extra parameters, and produces a single model t…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: 15 pages, 4 figures

  8. arXiv:2604.06782  [pdf, ps, other]

    cs.CV

    EventFace: Event-Based Face Recognition via Structure-Driven Spatiotemporal Modeling

    Authors: Qingguo Meng, Xingbo Dong, Zhe Jin, Massimo Tistarelli

    Abstract: Event cameras offer a promising sensing modality for face recognition due to their inherent advantages in illumination robustness and privacy-friendliness. However, because event streams lack the stable photometric appearance relied upon by conventional RGB-based face recognition systems, we argue that event-based face recognition should model structure-driven spatiotemporal identity representatio…

    Submitted 8 April, 2026; originally announced April 2026.

  9. arXiv:2604.06772  [pdf, ps, other]

    astro-ph.HE

    Neutron Star Merger Rates from Multi-messenger Observations: Clues to the Physical Origin of the Short and Long-short Gamma-ray Bursts

    Authors: Zhi-Ping Jin, Yuan-Zhu Wang, Yin-Jie Li, Yun Wang, Hao Wang, Shao-Peng Tang, Da-Ming Wei

    Abstract: Short and long-short gamma-ray bursts (GRBs) are widely believed to be powered by neutron star mergers. In this work, we calculate the local rate of such GRBs and find a relatively high value of $\sim 786-2468~{\rm Gpc^{-3}~yr^{-1}}$ when including the very narrow collimation event GRB 061201. Considering that its redshift is not very reliable, after excluding this event, the rate is…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: 8 pages, 2 figures, 2 tables

  10. arXiv:2604.05900  [pdf, ps, other]

    cs.CV

    AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis

    Authors: Dong She, Xianrong Yao, Liqun Chen, Jinghe Yu, Yang Gao, Zhanpeng Jin

    Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities in perception, yet holistic Affective Image Content Analysis (AICA), which integrates perception, reasoning, and generation into a unified framework, remains underexplored. To address this gap, we introduce AICA-Bench, a comprehensive benchmark with three core tasks: Emotion Understanding (EU), Emotion Reasoning (ER), and Emotion-…

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: Accepted by Findings of ACL 2026

  11. arXiv:2604.04788  [pdf, ps, other]

    cs.CY

    From Hallucination to Scheming: A Unified Taxonomy and Benchmark Analysis for LLM Deception

    Authors: Jerick Shi, Terry Jingcheng Zhang, Zhijing Jin, Vincent Conitzer

    Abstract: Large language models (LLMs) produce systematically misleading outputs, from hallucinated citations to strategic deception of evaluators, yet these phenomena are studied by separate communities with incompatible terminology. We propose a unified taxonomy organized along three complementary dimensions: degree of goal-directedness (behavioral to strategic deception), object of deception, and mechani…

    Submitted 6 April, 2026; originally announced April 2026.

    Comments: Accepted to ICLR Agents in the Wild: Safety, Security, and Beyond Workshop

  12. arXiv:2604.04782  [pdf, ps, other]

    cs.CY

    Cheap Talk, Empty Promise: Frontier LLMs easily break public promises for self-interest

    Authors: Jerick Shi, Terry Jingcheng Zhang, Zhijing Jin, Vincent Conitzer

    Abstract: Large language models are increasingly deployed as autonomous agents in multi-agent settings where they communicate intentions and take consequential actions with limited human oversight. A critical safety question is whether agents that publicly commit to actions break those promises when they can privately deviate, and what the consequences are for both themselves and the collective. We study de…

    Submitted 6 April, 2026; originally announced April 2026.

    Comments: Accepted to ICLR AI for Mechanism Design and Strategic Decision Making Workshop

  13. arXiv:2604.04771  [pdf, ps, other]

    cs.CV cs.CL

    MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

    Authors: Bin Wang, Tianyao He, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Tao Chu, Yuan Qu, Zhenjiang Jin, Weijun Zeng, Ziyang Miao, Bangrui Xu, Junbo Niu, Mengzhang Cai, Jiantao Qiu, Qintong Zhang, Dongsheng Ma, Yuefeng Sun, Hejun Dong, Wenzheng Zhang, Jutao Xiao, Jiayong Shi, Pengyu Liao, Xiaomeng Zhao, Huaping Zhong, Liqun Wei, et al. (18 additional authors not shown)

    Abstract: Current document parsing methods advance primarily through model architecture innovation, while systematic engineering of training data remains underexplored. Yet state-of-the-art models spanning diverse architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training…

    Submitted 9 April, 2026; v1 submitted 6 April, 2026; originally announced April 2026.

    Comments: Technical Report

  14. arXiv:2604.03633  [pdf, ps, other]

    quant-ph gr-qc

    Nonlocal advantage of quantum imaginarity in Schwarzschild spacetime

    Authors: Bing Yu, Xiao-Yong Yang, Xiaoli Hu, Zhi-Xiang Jin, Xiaofen Huang

    Abstract: Black hole spacetimes provide a natural setting for quantum systems in curved spacetime, where effects such as Hawking radiation arise from event horizons. In this work, we investigate the impact of the Hawking effect on quantum imaginarity in Schwarzschild spacetime, focusing on nonlocal advantage of quantum imaginarity (NAQI) and assisted imaginarity distillation. For NAQI, it is significantly a…

    Submitted 14 April, 2026; v1 submitted 4 April, 2026; originally announced April 2026.

    Comments: 8 pages, 24 figures

  15. arXiv:2604.02778  [pdf, ps, other]

    cs.CL

    When Modalities Remember: Continual Learning for Multimodal Knowledge Graphs

    Authors: Linyu Li, Zhi Jin, Yichi Zhang, Dongming Jin, Yuanpeng He, Haoran Duan, Gadeng Luosang, Nyima Tashi

    Abstract: Real-world multimodal knowledge graphs (MMKGs) are dynamic, with new entities, relations, and multimodal knowledge emerging over time. Existing continual knowledge graph reasoning (CKGR) methods focus on structural triples and cannot fully exploit multimodal signals from new entities. Existing multimodal knowledge graph reasoning (MMKGR) methods, however, usually assume static graphs and suffer ca…

    Submitted 3 April, 2026; originally announced April 2026.

  16. arXiv:2604.02764  [pdf, ps, other]

    cs.CV

    InverseDraping: Recovering Sewing Patterns from 3D Garment Surfaces via BoxMesh Bridging

    Authors: Leyang Jin, Zirong Jin, Zisheng Ye, Haokai Pang, Xiaoguang Han, Yujian Zheng, Hao Li

    Abstract: Recovering sewing patterns from draped 3D garments is a challenging problem in human digitization research. In contrast to the well-studied forward process of draping designed sewing patterns using mature physical simulation engines, the inverse process of recovering parametric 2D patterns from deformed garment geometry remains fundamentally ill-posed for existing methods. We propose a two-stage f…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: 13 pages, 13 figures

  17. arXiv:2604.02709  [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.SE

    Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy

    Authors: Yihong Dong, Jianha Xiao, Xue Jiang, Xuyuan Guo, Zhiyuan Fan, Jiaru Qian, Kechi Zhang, Jia Li, Zhi Jin, Ge Li

    Abstract: The formal reasoning capabilities of LLMs are crucial for advancing automated software engineering. However, existing benchmarks for LLMs lack systematic evaluation based on computation and complexity, leaving a critical gap in understanding their formal reasoning capabilities. Therefore, it is still unknown whether SOTA LLMs can grasp the structured, hierarchical complexity of formal languages as…

    Submitted 15 April, 2026; v1 submitted 3 April, 2026; originally announced April 2026.

    Comments: Work in progress

  18. arXiv:2604.02635  [pdf, ps, other]

    quant-ph

    From Liouville equation to universal quantum control: A study of generating ultra highly squeezed states

    Authors: Zhu-yao Jin, J. Q. You, Jun Jing

    Abstract: Within a unified framework, we reveal that the seemingly disparate control approaches for classical and quantum continuous-variable systems are interconnected via differential manifolds of the ancillary representations. For classical systems, the ancillary representation is defined by the time-dependent ancillary canonical variables resulting from a symplectic transformation over the original cano…

    Submitted 2 April, 2026; originally announced April 2026.

    Comments: 5+17 pages, 2 figures

  19. arXiv:2604.02452  [pdf, ps, other]

    physics.space-ph astro-ph.EP astro-ph.SR physics.geo-ph physics.plasm-ph

    Proton Temperature Anisotropy Across Interplanetary Shocks: A Statistical Analysis with WIND observations

    Authors: Zeping Jin, Lingling Zhao, Xingyu Zhu, Vladimir Flosinski, Gary P. Zank, Jakobus Le Roux, Yiming Jiao, Ashok Silwal, Nibuna S. M. Subashchandar

    Abstract: Interplanetary (IP) shocks efficiently modify the proton temperature anisotropy of the solar wind. Analyzing ~800 IP shocks observed by the Wind spacecraft from 1997-2024, we present a statistical study of upstream and downstream proton temperature anisotropy and its dependence on shock geometry, compression, and distance from the shock. We find that (1) quasi-perpendicular shocks produce a pronou…

    Submitted 2 April, 2026; originally announced April 2026.

  20. arXiv:2604.00886  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding

    Authors: Nan Wang, Zhiwei Jin, Chen Chen, Haonan Lu

    Abstract: Document understanding and GUI interaction are among the highest-value applications of Vision-Language Models (VLMs), yet they impose exceptionally heavy computational burden: fine-grained text and small UI elements demand high-resolution inputs that produce tens of thousands of visual tokens. We observe that this cost is largely wasteful: across document and GUI benchmarks, only 22–71% of ima…

    Submitted 1 April, 2026; originally announced April 2026.

  21. arXiv:2604.00754  [pdf, ps, other]

    cs.CL cs.LG

    Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention

    Authors: Zehao Jin, Yanan Sui

    Abstract: The whole-brain connectome of a fruit fly comprises over 130K neurons connected with a probability of merely 0.02%, yet achieves an average shortest path of only 4.4 hops. Despite being highly structured at the circuit level, the network's long-range connections are broadly distributed across brain regions, functioning as stochastic shortcuts that enable efficient global communication. Inspired by…

    Submitted 1 April, 2026; originally announced April 2026.

  22. arXiv:2604.00057  [pdf, ps, other]

    cs.MM cs.AI

    Towards Automatic Soccer Commentary Generation with Knowledge-Enhanced Visual Reasoning

    Authors: Zeyu Jin, Xiaoyu Qin, Songtao Zhou, Kaifeng Yun, Jia Jia

    Abstract: Soccer commentary plays a crucial role in enhancing the soccer game viewing experience for audiences. Previous studies in automatic soccer commentary generation typically adopt an end-to-end method to generate anonymous live text commentary. Such generated commentary is insufficient in the context of real-world live televised commentary, as it contains anonymous entities, context-dependent errors…

    Submitted 30 March, 2026; originally announced April 2026.

    Comments: Accepted by ICME 2026

  23. arXiv:2603.29957  [pdf, ps, other]

    cs.SE cs.LG

    Think Anywhere in Code Generation

    Authors: Xue Jiang, Tianyu Zhang, Ge Li, Mengyang Liu, Taozhi Chen, Zhenhua Xu, Binhua Li, Wenpin Jiao, Zhi Jin, Yongbin Li, Yihong Dong

    Abstract: Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer. However, this approach suffers from critical limitations in code generation, where upfront thinking is often insufficient as problems' full complexity only reveals itself during code implementation. Moreover, it cannot adaptively allocate reasoning effort…

    Submitted 2 April, 2026; v1 submitted 31 March, 2026; originally announced March 2026.

  24. arXiv:2603.29162  [pdf, ps, other]

    cs.MM

    From Natural Alignment to Conditional Controllability in Multimodal Dialogue

    Authors: Zeyu Jin, Songtao Zhou, Haoyu Wang, Minghao Tian, Kaifeng Yun, Zhuo Chen, Xiaoyu Qin, Jia Jia

    Abstract: The recent advancement of Artificial Intelligence Generated Content (AIGC) has led to significant strides in modeling human interaction, particularly in the context of multimodal dialogue. While current methods impressively generate realistic dialogue in isolated modalities like speech or vision, challenges remain in controllable Multimodal Dialogue Generation (MDG). This paper focuses on the natu…

    Submitted 30 March, 2026; originally announced March 2026.

    Comments: Accepted by ICLR 2026

  25. arXiv:2603.28699  [pdf, ps, other]

    astro-ph.HE

    An Intertwined Short and Long GRB with 4-minute Separation

    Authors: Liang Li, Yu Wang, Bing Zhang, Ye Li, Shu-Rui Zhang, Jochen Greiner, Zhi-Ping Jin, Jin-Jun Geng, Hou-Jun Lv, Asaf Peer, Maria Dainotti, Tong Liu, Yi-Zhong Fan, Yong-Feng Huang, Zi-Gao Dai, Melin Kole, Wei-Hua Lei, Ye-Fei Yuan, Shuang-Nan Zhang, Felix Ryde, She-Sheng Xue, Rong-Gen Cai

    Abstract: Gamma-ray bursts (GRBs), the most energetic transients in the Universe, are traditionally classified into long-duration ($T_{90}>2$ s) and short-duration ($T_{90}<2$ s) events, associated with the core collapse of massive stars (Type II) and the merger of compact binary systems (Type I), respectively. The two classes exhibit distinct observational properties that serve as key diagnostic criteria f…

    Submitted 3 April, 2026; v1 submitted 30 March, 2026; originally announced March 2026.

    Comments: 56 pages, 10 figures (including 43 panels), 9 tables

  26. arXiv:2603.28407  [pdf, ps, other]

    cs.AI cs.CL

    MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

    Authors: Fangda Ye, Yuxin Hu, Pengxiang Zhu, Yibo Li, Ziqi Jin, Yao Xiao, Yibo Wang, Lei Wang, Zhen Zhang, Lu Wang, Yue Deng, Bin Wang, Yifan Zhang, Liangcai Su, Xinyu Wang, He Zhao, Chen Wei, Qiang Ren, Bryan Hooi, An Bo, Shuicheng Yan, Lidong Bing

    Abstract: Recent progress in deep research systems has been impressive, but evaluation still lags behind real user needs. Existing benchmarks predominantly assess final reports using fixed rubrics, failing to evaluate the underlying research process. Most also offer limited multimodal coverage, rely on synthetic tasks that do not reflect real-world query complexity, and cannot be refreshed as knowledge evol…

    Submitted 30 March, 2026; originally announced March 2026.

    Comments: GitHub: https://github.com/MiroMindAI/MiroEval

  27. arXiv:2603.25607  [pdf]

    cs.CV cs.AI

    DeepFAN, a transformer-based deep learning model for human-artificial intelligence collaborative assessment of incidental pulmonary nodules in CT scans: a multi-reader, multi-case trial

    Authors: Zhenchen Zhu, Ge Hu, Weixiong Tan, Kai Gao, Chao Sun, Zhen Zhou, Kepei Xu, Wei Han, Meixia Shang, Xiaoming Qiu, Yiqing Tan, Jinhua Wang, Zhoumeng Ying, Li Peng, Wei Song, Lan Song, Zhengyu Jin, Nan Hong, Yizhou Yu

    Abstract: The widespread adoption of CT has notably increased the number of detected lung nodules. However, current deep learning methods for classifying benign and malignant nodules often fail to comprehensively integrate global and local features, and most of them have not been validated through clinical trials. To address this, we developed DeepFAN, a transformer-based model trained on over 10K pathology…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: 28 pages for main text and 37 pages for supplementary information, 7 figures in main text and 9 figures in supplementary information

  28. arXiv:2603.25040  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

    Authors: Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang, Chao Zhang, et al. (152 additional authors not shown)

    Abstract: We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertis…

    Submitted 2 April, 2026; v1 submitted 26 March, 2026; originally announced March 2026.

  29. arXiv:2603.25020  [pdf, ps, other]

    cs.CV

    GDPO-Listener: Expressive Interactive Head Generation via Auto-Regressive Flow Matching and Group reward-Decoupled Policy Optimization

    Authors: Zhangyu Jin, Maksim Siniukov, Deuksin Kwon, Ashutosh Chaubey, Mohammad Soleymani

    Abstract: Generating realistic 3D head motion for dyadic interactions is a significant challenge in virtual human synthesis. While recent methods achieve impressive results with speaking heads, they frequently suffer from the 'Regression-to-the-Mean' problem in listener motions, collapsing into static faces, and lack the parameter space for complex nonverbal motions. In this paper, we propose GDPO-Listener,…

    Submitted 26 March, 2026; originally announced March 2026.

  30. arXiv:2603.21014  [pdf, ps, other]

    cs.LG cs.CL

    CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs

    Authors: Florent Draye, Abir Harrasse, Vedant Palit, Tung-Yu Wu, Jiarui Liu, Punya Syon Pandey, Roderick Wu, Terry Jingchen Zhang, Zhijing Jin, Bernhard Schölkopf

    Abstract: Mechanistic interpretability seeks to understand how Large Language Models (LLMs) represent and process information. Recent approaches based on dictionary learning and transcoders enable representing model computation in terms of sparse, interpretable features and their interactions, giving rise to feature attribution graphs. However, these graphs are often large and redundant, limiting their inte…

    Submitted 21 March, 2026; originally announced March 2026.

    Comments: 9 pages, 2 figures, code: https://github.com/LLM-Interp/CLT-Forge

  31. arXiv:2603.19169  [pdf, ps, other]

    cs.CV cs.AI

    ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis

    Authors: Zhan Jin, Yu Luo, Yizhou Zhang, Ziyang Cui, Yuqing Wei, Xianchao Liu, Xueying Zeng, Qing Zhang

    Abstract: Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: 28 pages, 5 figures. arXiv:submit/7385738 [cs.AI]

  32. arXiv:2603.18815  [pdf, ps, other]

    cs.AI

    ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

    Authors: Hao Zhang, Mingjie Liu, Shaokun Zhang, Songyang Han, Jian Hu, Zhenghui Jin, Yuchi Zhang, Shizhe Diao, Ximing Lu, Binfeng Xu, Zhiding Yu, Jan Kautz, Yi Dong

    Abstract: Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL training requires generating large numbers of sandboxed rollout trajectories, and existing infrastructures often couple rollout orchestration with the training loop, making systems hard to migrate and mai…

    Submitted 19 March, 2026; originally announced March 2026.

  33. arXiv:2603.17593  [pdf]

    eess.SY physics.acc-ph

    An Extended T-A Formulation Based on Potential-Chain Recursion for Electromagnetic Modeling of Parallel-Wound No-Insulation HTS Coils

    Authors: Zhe Pan, Qi Xu, Ruixiang Wang, Zhenghao Jin, Jianzhao Geng

    Abstract: Parallel-wound no-insulation (PW-NI) high-temperature superconducting (HTS) coils significantly reduce charging delay while maintaining excellent self-protection capability, demonstrating great potential for high-field applications. Existing models that couple the T-A formulation with equivalent circuits have demonstrated high accuracy in electromagnetic analysis of PW-NI coils. However, eliminati…

    Submitted 18 March, 2026; originally announced March 2026.

  34. arXiv:2603.16936  [pdf, ps, other]

    cs.CV cs.AI

    TDMM-LM: Bridging Facial Understanding and Animation via Language Models

    Authors: Luchuan Song, Pinxin Liu, Haiyang Liu, Zhenchao Jin, Yolo Yunlong Tang, Zichong Xu, Susan Liang, Jing Bi, Jason J Corso, Chenliang Xu

    Abstract: Text-guided human body animation has advanced rapidly, yet facial animation lags due to the scarcity of well-annotated, text-paired facial corpora. To close this gap, we leverage foundation generative models to synthesize a large, balanced corpus of facial behavior. We design a prompt suite covering emotions and head motions, generate about 80 hours of facial videos with multiple generators, and fi…

    Submitted 14 March, 2026; originally announced March 2026.

    Comments: 12 pages, 13 figures

  35. arXiv:2603.16710  [pdf, ps, other]

    math.OC

    Design of Transit Networks: Global Optimization of Continuous Approximation Models via Geometric Programming

    Authors: Haoyang Mao, Weihua Gu, Wenbo Fan, Zhicheng Jin, Xiaokuan Zhao

    Abstract: Continuous approximation (CA) models have been widely adopted in transit network design studies due to their strong analytical tractability and high computational efficiency. However, such models are typically formulated as nonconvex optimization problems, and existing solution approaches mainly rely on iterative algorithms that exploit first-order optimality information or nonlinear programming s…

    Submitted 17 March, 2026; originally announced March 2026.

  36. arXiv:2603.15853  [pdf, ps, other]

    cond-mat.supr-con cond-mat.dis-nn quant-ph

    Taming the expressiveness of neural-network wave functions for robust convergence to quantum many-body states

    Authors: Dezhe Z. Jin

    Abstract: Neural networks are emerging as a powerful tool for determining the quantum states of interacting many-body fermionic systems. The standard approach trains a neural-network ansatz by minimizing the mean local energy estimated from Monte Carlo samples. However, this typically results in large sample-to-sample fluctuations in the estimated mean energy and thus slow convergence of the energy minimiza…

    Submitted 31 March, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

  37. arXiv:2603.13995  [pdf, ps, other]

    physics.chem-ph cond-mat.mtrl-sci

    Systematically Improvable Numerical Atomic Orbital Basis Using Contracted Truncated Spherical Waves

    Authors: Yike Huang, Zuxin Jin, Linfeng Zhang, Mohan Chen, Rui Chen, Ling Li

    Abstract: To solve the Kohn-Sham equation within the framework of density functional theory, we develop a scheme to construct numerical atomic orbital (NAO) basis sets by contracting truncated spherical waves (TSWs). The contraction minimizes the trace of the kinetic operator in the residual space, generalizing the spillage minimizing scheme [M. Chen et al., J. Phys. Condens. Matter 22, 445501 (2010); P. Li…

    Submitted 9 April, 2026; v1 submitted 14 March, 2026; originally announced March 2026.

  38. arXiv:2603.13126  [pdf, ps, other]

    q-bio.NC cs.AI

    Developing the PsyCogMetrics AI Lab to Evaluate Large Language Models and Advance Cognitive Science -- A Three-Cycle Action Design Science Study

    Authors: Zhiye Jin, Yibai Li, K. D. Joshi, Xuefei Deng, Xiaobing Li

    Abstract: This study presents the development of the PsyCogMetrics AI Lab (psycogmetrics.ai), an integrated, cloud-based platform that operationalizes psychometric and cognitive-science methodologies for Large Language Model (LLM) evaluation. Framed as a three-cycle Action Design Science study, the Relevance Cycle identifies key limitations in current evaluation methods and unfulfilled stakeholder needs. Th…

    Submitted 13 March, 2026; originally announced March 2026.

    Comments: 10 pages. Prepared: April 2025; submitted: June 15, 2025; accepted: August 2025. In: Proceedings of the 59th Hawaii International Conference on System Sciences (HICSS 2026), January 2026

    ACM Class: H.1.2; I.2.7; K.4.2

    Journal ref: Proceedings of the 59th Hawaii International Conference on System Sciences (HICSS), January 2026, pp. 6952-6961

  39. arXiv:2603.11896  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models

    Authors: Lu Wang, Zhuoran Jin, Yupu Hao, Yubo Chen, Kang Liu, Yulong Ao, Jun Zhao

    Abstract: Multimodal large language models (MLLMs) have shown strong performance on offline video understanding, but most are limited to offline inference or have weak online reasoning, making multi-turn interaction over continuously arriving video streams difficult. Existing streaming methods typically use an interleaved perception-generation paradigm, which prevents concurrent perception and generation an…

    Submitted 12 March, 2026; originally announced March 2026.

  40. Counterweights and Complementarities: The Convergence of AI and Blockchain Powering a Decentralized Future

    Authors: Yibai Li, Zhiye Jin, Xiaobing Li, K. D. Joshi, Xuefei Deng

    Abstract: This editorial addresses the critical intersection of artificial intelligence (AI) and blockchain technologies, highlighting their contrasting tendencies toward centralization and decentralization, respectively. While AI, particularly with the rise of large language models (LLMs), exhibits a strong centralizing force due to data and resource monopolization by large corporations, blockchain offers…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: 7 pages, Editorial, published in ACM SIGMIS Database Vol. 56, Iss. 2

    ACM Class: H.4.m; K.4.4; I.2.11; C.2.4

    Journal ref: ACM SIGMIS Database: the DATABASE for Advances in Information Systems, Vol. 56, No. 2, pp. 6-12, April 2025

  41. AI Psychometrics: Evaluating the Psychological Reasoning of Large Language Models with Psychometric Validities

    Authors: Yibai Li, Xiaolin Lin, Zhenghui Sha, Zhiye Jin, Xiaobing Li

    Abstract: The immense number of parameters and deep neural networks make large language models (LLMs) rival the complexity of human brains, which also makes them opaque "black box" systems that are challenging to evaluate and interpret. AI Psychometrics is an emerging field that aims to tackle these challenges by applying psychometric methodologies to evaluate and interpret the psychological traits and pr…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: Accepted for publication in the Proceedings of the 58th Hawaii International Conference on System Sciences (HICSS), 2025

    MSC Class: 68T50; 62P15; 91E45 ACM Class: I.2.7; I.2.0; J.4; H.1.2

    Journal ref: Proceedings of the 58th Hawaii International Conference on System Sciences (HICSS), January 2025, pp. 5189-5197

  42. arXiv:2603.08254  [pdf, ps, other]

    cs.CV

    DynamicVGGT: Learning Dynamic Point Maps for 4D Scene Reconstruction in Autonomous Driving

    Authors: Zhuolin He, Jing Li, Guanghao Li, Xiaolei Chen, Jiacheng Tang, Siyang Zhang, Zhounan Jin, Feipeng Cai, Bin Li, Jian Pu, Jia Cai, Xiangyang Xue

    Abstract: Dynamic scene reconstruction in autonomous driving remains a fundamental challenge due to significant temporal variations, moving objects, and complex scene dynamics. Existing feed-forward 3D models have demonstrated strong performance in static reconstruction but still struggle to capture dynamic motion. To address these limitations, we propose DynamicVGGT, a unified feed-forward framework that e…

    Submitted 9 March, 2026; originally announced March 2026.

  43. arXiv:2603.07889  [pdf, ps, other]

    cs.CV

    Structure and Progress Aware Diffusion for Medical Image Segmentation

    Authors: Siyuan Song, Guyue Hu, Chenglong Li, Dengdi Sun, Zhe Jin, Jin Tang

    Abstract: Medical image segmentation is crucial for computer-aided diagnosis, which necessitates understanding both coarse morphological and semantic structures, as well as carving fine boundaries. The morphological and semantic structures in medical images are beneficial and stable clues for target understanding. While the fine boundaries of medical targets (like tumors and lesions) are usually ambiguous a…

    Submitted 8 March, 2026; originally announced March 2026.

  44. arXiv:2603.07696  [pdf, ps, other]

    eess.AS

    Multi-View Based Audio Visual Target Speaker Extraction

    Authors: Peijun Yang, Zhan Jin, Juan Liu, Ming Li

    Abstract: Audio-Visual Target Speaker Extraction (AVTSE) aims to separate a target speaker's voice from a mixed audio signal using the corresponding visual cues. While most existing AVTSE methods rely exclusively on frontal-view videos, this limitation restricts their robustness in real-world scenarios where non-frontal views are prevalent. Such visual perspectives often contain complementary articulatory i…

    Submitted 10 March, 2026; v1 submitted 8 March, 2026; originally announced March 2026.

    Comments: submitted to Interspeech 2026

  45. arXiv:2603.06280  [pdf, ps, other]

    cs.RO

    SuperSuit: An Isomorphic Bimodal Interface for Scalable Mobile Manipulation

    Authors: Tongqing Chen, Hang Wu, Jiasen Wang, Xiaotao Li, Zhu Jin, Lu Fang

    Abstract: High-quality, long-horizon demonstrations are essential for embodied AI, yet acquiring such data for tightly coupled wheeled mobile manipulators remains a fundamental bottleneck. Unlike fixed-base systems, mobile manipulators require continuous coordination between $SE(2)$ locomotion and precise manipulation, exposing limitations in existing teleoperation and wearable interfaces. We present \textb…

    Submitted 6 March, 2026; originally announced March 2026.

  46. arXiv:2603.05026  [pdf, ps, other]

    cs.SE cs.LG cs.MA

    RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform

    Authors: Kenan Li, Rongzhi Li, Linghao Zhang, Qirui Jin, Liao Zhu, Xiaosong Huang, Geng Zhang, Yikai Zhang, Shilin He, Chengxing Xie, Xin Zhang, Zijian Jin, Bowen Li, Chaoyun Zhang, Yu Kang, Yufan Huang, Elsie Nallipogu, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

    Abstract: Building software repositories typically requires significant manual effort. Recent advances in large language model (LLM) agents have accelerated automation in software engineering (SWE). We introduce RepoLaunch, the first agent capable of automatically resolving dependencies, compiling source code, and extracting test results for repositories across arbitrary programming languages and operating…

    Submitted 5 March, 2026; originally announced March 2026.

    Comments: Under peer review. 16 pages, 4 figures, 5 tables

    ACM Class: I.2.5; I.2.7

  47. arXiv:2603.04217  [pdf, ps, other]

    cs.CL

    When Do Language Models Endorse Limitations on Human Rights Principles?

    Authors: Keenan Samway, Nicole Miu Takagi, Rada Mihalcea, Bernhard Schölkopf, Ilias Chalkidis, Daniel Hershcovich, Zhijing Jin

    Abstract: As Large Language Models (LLMs) increasingly mediate global information access with the potential to shape public discourse, their alignment with universal human rights principles becomes important to ensure that these rights are abided by in high stakes AI-mediated interactions. In this paper, we evaluate how LLMs navigate trade-offs involving the Universal Declaration of Human Rights (UDHR), lev…

    Submitted 4 March, 2026; originally announced March 2026.

    Comments: EACL Findings 2026

  48. arXiv:2603.03180  [pdf]

    cs.SE cs.AI cs.CL

    Type-Aware Retrieval-Augmented Generation with Dependency Closure for Solver-Executable Industrial Optimization Modeling

    Authors: Y. Zhong, R. Huang, M. Wang, Z. Guo, YC. Li, M. Yu, Z. Jin

    Abstract: Automated industrial optimization modeling requires reliable translation of natural-language requirements into solver-executable code. However, large language models often generate non-compilable models due to missing declarations, type inconsistencies, and incomplete dependency contexts. We propose a type-aware retrieval-augmented generation (RAG) method that enforces modeling entity types and mi…

    Submitted 3 March, 2026; originally announced March 2026.

  49. arXiv:2603.02275  [pdf, ps, other]

    cs.LG stat.AP stat.ML

    A Comparative Study of UMAP and Other Dimensionality Reduction Methods

    Authors: Guanzhe Zhang, Shanshan Ding, Zhezhen Jin

    Abstract: Uniform Manifold Approximation and Projection (UMAP) is a widely used manifold learning technique for dimensionality reduction. This paper studies UMAP, supervised UMAP, and several competing dimensionality reduction methods, including Principal Component Analysis (PCA), Kernel PCA, Sliced Inverse Regression (SIR), Kernel SIR, and t-distributed Stochastic Neighbor Embedding, through a comprehensiv…

    Submitted 1 March, 2026; originally announced March 2026.

    Comments: 31 pages, 4 figures

    MSC Class: 62H25; 62J05; 68T09

  50. arXiv:2603.02024  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning

    Authors: Jiachun Li, Shaoping Huang, Zhuoran Jin, Chenlong Zhang, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Recent progress in the reasoning capabilities of multimodal large language models (MLLMs) has empowered them to address more complex tasks such as scientific analysis and mathematical reasoning. Despite their promise, MLLMs' reasoning abilities across different scenarios in real life remain largely unexplored and lack standardized benchmarks for evaluation. To address this gap, we introduce MMR-Li…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: Accepted by ICLR 2026, 78 pages, 60 figures