Showing 1–50 of 2,032 results for author: Lin, J

Searching in archive cs.
  1. arXiv:2604.13579  [pdf, ps, other]

    cs.CL

    MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

    Authors: Jiahang Lin, Kai Hu, Binghai Wang, Yuhao Zhou, Zhiheng Xi, Honglin Guo, Shichun Liu, Junzhe Wang, Shihan Dou, Enyu Zhou, Hang Yan, Zhenhua Han, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Conventional Retrieval-Augmented Generation (RAG) systems often struggle with complex multi-hop queries over long documents due to their single-pass retrieval. We introduce MM-Doc-R1, a novel framework that employs an agentic, vision-aware workflow to address long document visual question answering through iterative information discovery and synthesis. To incentivize the information seeking capabi…

    Submitted 15 April, 2026; originally announced April 2026.

  2. arXiv:2604.11804  [pdf, ps, other]

    cs.CV

    OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

    Authors: Donghao Zhou, Guisheng Liu, Hao Yang, Jiatong Li, Jingyu Lin, Xiaohu Huang, Yichen Liu, Xin Gao, Cunjian Chen, Shilei Wen, Chi-Wing Fu, Pheng-Ann Heng

    Abstract: In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference images, audio, and pose. This task holds significant practical value for automating content creation in real-world applications, such as e-commerce demonstrations, short video production, and interactive entertainment. Howeve…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: Project page: https://correr-zhou.github.io/OmniShow/

  3. arXiv:2604.11789  [pdf, ps, other]

    cs.CV

    LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

    Authors: Yuqian Yuan, Wenqiao Zhang, Juekai Lin, Yu Zhong, Mingjian Gao, Binhe Yu, Yunqi Cao, Wentong Li, Yueting Zhuang, Beng Chin Ooi

    Abstract: Large Multimodal Models (LMMs) have achieved remarkable progress in general-purpose vision-language understanding, yet they remain limited in tasks requiring precise object-level grounding, fine-grained spatial reasoning, and controllable visual manipulation. In particular, existing systems often struggle to identify the correct instance, preserve object identity across interactions, and localize…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: 38 pages, 6 figures

  4. arXiv:2604.10910  [pdf, ps, other]

    cs.CV

    STGV: Spatio-Temporal Hash Encoding for Gaussian-based Video Representation

    Authors: Jierun Lin, Jiacong Chen, Qingyu Mao, Shuai Liu, Xiandong Meng, Fanyang Meng, Yongsheng Liang

    Abstract: 2D Gaussian Splatting (2DGS) has recently become a promising paradigm for high-quality video representation. However, existing methods employ content-agnostic or spatio-temporal feature overlapping embeddings to predict canonical Gaussian primitive deformations, which entangles static and dynamic components in videos and prevents modeling their distinct properties effectively. These result in inac…

    Submitted 13 April, 2026; v1 submitted 12 April, 2026; originally announced April 2026.

  5. arXiv:2604.10755  [pdf, ps, other]

    cs.CV

    MMRareBench: A Rare-Disease Multimodal and Multi-Image Medical Benchmark

    Authors: Junzhi Ning, Jiashi Lin, Yingying Fang, Wei Li, Jiyao Liu, Cheng Tang, Chenglong Ma, Wenhao Tang, Tianbin Li, Ziyan Huang, Guang Yang, Junjun He

    Abstract: Multimodal large language models (MLLMs) have advanced clinical tasks for common conditions, but their performance on rare diseases remains largely untested. In rare-disease scenarios, clinicians often lack prior clinical knowledge, forcing them to rely strictly on case-level evidence for clinical judgments. Existing benchmarks predominantly evaluate common-condition, single-image settings, leavin…

    Submitted 12 April, 2026; originally announced April 2026.

  6. arXiv:2604.10724  [pdf, ps, other]

    cs.CL

    Expect the Unexpected? Testing the Surprisal of Salient Entities

    Authors: Jessica Lin, Amir Zeldes

    Abstract: Previous work examining the Uniform Information Density (UID) hypothesis has shown that while information as measured by surprisal metrics is distributed more or less evenly across documents overall, local discrepancies can arise due to functional pressures corresponding to syntactic and discourse structural constraints. However, work thus far has largely disregarded the relative salience of disco…

    Submitted 12 April, 2026; originally announced April 2026.

    Comments: Accepted to ACL 2026 (main, long); camera-ready version

  7. arXiv:2604.10221  [pdf, ps, other]

    cs.HC

    Building Regulation Capacity in Human-AI Collaborative Learning: A Human-Centred GenAI System

    Authors: Yujing Zhang, Jionghao Lin

    Abstract: Collaborative learning works when groups regulate together by setting shared goals, coordinating participation, monitoring progress, and responding to breakdowns through co-regulation (CoRL) and socially shared regulation (SSRL). As generative AI (GenAI) enters group work, however, it remains unclear whether and how it supports these socially distributed regulation processes. This doctoral project…

    Submitted 11 April, 2026; originally announced April 2026.

    Comments: 7 pages, 2 figures. Accepted at AIED 2026

  8. arXiv:2604.10107  [pdf]

    cs.HC

    The Double-Edged Sword of Open-Ended Interaction: How LLM-Driven NPCs Affect Players' Cognitive Load and Gaming Experience

    Authors: Ting-Chen Hsu, Wenran Chen, Jiangxu Lin, Fei Qin, Zheyuan Zhang

    Abstract: This study examines how large language model-driven non-player characters (LLM-NPCs) affect players' cognitive load and gaming experience, with a particular focus on the underlying psychological mechanisms, differences across task scenarios, and the role of individual traits. Conducting a randomized between-subject experiment (N=130) in a self-developed game prototype "Campus Culture Week", we com…

    Submitted 11 April, 2026; originally announced April 2026.

  9. arXiv:2604.09574  [pdf, ps, other]

    cs.AI cs.LG

    Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

    Authors: Jiachen Zhu, Lingyu Yang, Rong Shan, Congmin Zheng, Zeyu Zheng, Weiwen Liu, Yong Yu, Weinan Zhang, Jianghao Lin

    Abstract: The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robustness over the critical dimension of anti-detection. We argue that for agents to survive in human-centric ecosystems, they must evolve Humanization capabilities. We introduce the "Turing Test on Screen," formally modeling the interaction as a MinM…

    Submitted 23 February, 2026; originally announced April 2026.

  10. arXiv:2604.09037  [pdf, ps, other]

    cs.CV cs.CL cs.HC

    SiMing-Bench: Evaluating Procedural Correctness from Continuous Interactions in Clinical Skill Videos

    Authors: Xiyang Huang, Jiawei Lin, Keying Wu, Jiaxin Huang, Kailai Yang, Renxiong Wei, Cheng Zeng, Jiayi Xiang, Ziyan Kuang, Min Peng, Qianqian Xie, Sophia Ananiadou

    Abstract: Current video benchmarks for multimodal large language models (MLLMs) focus on event recognition, temporal ordering, and long-context recall, but overlook a harder capability required for expert procedural judgment: tracking how ongoing interactions update the procedural state and thereby determine the correctness of later actions. We introduce SiMing-Bench, the first benchmark for evaluating this…

    Submitted 10 April, 2026; originally announced April 2026.

  11. arXiv:2604.09030  [pdf, ps, other]

    cs.CV

    NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Multi-Exposure Image Fusion in Dynamic Scenes (Track 2)

    Authors: Lishen Qu, Yao Liu, Jie Liang, Hui Zeng, Wen Dai, Guanyi Qin, Ya-nan Guan, Shihao Zhou, Jufeng Yang, Lei Zhang, Radu Timofte, Xiyuan Yuan, Wanjie Sun, Shihang Li, Bo Zhang, Bin Chen, Jiannan Lin, Yuxu Chen, Qinquan Gao, Tong Tong, Song Gao, Jiacong Tang, Tao Hu, Xiaowen Ma, Qingsen Yan, et al. (10 additional authors not shown)

    Abstract: This paper presents NTIRE 2026, the 3rd Restore Any Image Model (RAIM) challenge on multi-exposure image fusion in dynamic scenes. We introduce a benchmark that targets a practical yet difficult HDR imaging setting, where exposure bracketing must be fused under scene motion, illumination variation, and handheld camera jitter. The challenge data contains 100 training sequences with 7 exposure level…

    Submitted 10 April, 2026; originally announced April 2026.

    Comments: Accepted by CVPRW 2026

  12. arXiv:2604.08410  [pdf, ps, other]

    cs.CV cs.RO

    BLaDA: Bridging Language to Functional Dexterous Actions within 3DGS Fields

    Authors: Fan Yang, Wenrui Chen, Guorun Yan, Ruize Liao, Wanjun Jia, Dongsheng Luo, Jiacheng Lin, Kailun Yang, Zhiyong Li, Yaonan Wang

    Abstract: In unstructured environments, functional dexterous grasping calls for the tight integration of semantic understanding, precise 3D functional localization, and physically interpretable execution. Modular hierarchical methods are more controllable and interpretable than end-to-end VLA approaches, but existing ones still rely on predefined affordance labels and lack the tight semantic-pose coupling…

    Submitted 14 April, 2026; v1 submitted 9 April, 2026; originally announced April 2026.

    Comments: Code will be publicly available at https://github.com/PopeyePxx/BLaDA

  13. arXiv:2604.08344  [pdf, ps, other]

    cs.AI cs.HC

    Human-AI Collaboration Reconfigures Group Regulation from Socially Shared to Hybrid Co-Regulation

    Authors: Yujing Zhang, Xianghui Meng, Shihui Feng, Jionghao Lin

    Abstract: Generative AI (GenAI) is increasingly used in collaborative learning, yet its effects on how groups regulate collaboration remain unclear. Effective collaboration depends not only on what groups discuss, but on how they jointly manage goals, participation, strategy use, monitoring, and repair through co-regulation and socially shared regulation. We compared collaborative regulation between Human-A…

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: 9 pages, 2 figures. Accepted at AIED 2026. Camera-ready version with updated references

  14. arXiv:2604.08342  [pdf, ps, other]

    cs.LG

    EgoEverything: A Benchmark for Human Behavior Inspired Long Context Egocentric Video Understanding in AR Environment

    Authors: Qiance Tang, Ziqi Wang, Jieyu Lin, Ziyun Li, Barbara De Salvo, Sai Qian Zhang

    Abstract: Long context egocentric video understanding has recently attracted significant research attention, with augmented reality (AR) highlighted as one of its most important application domains. Nevertheless, the task remains highly challenging due to the need for reasoning over extended temporal contexts and diverse, unstructured activities. Although several benchmarks exist, most egocentric datasets r…

    Submitted 9 April, 2026; originally announced April 2026.

  15. arXiv:2604.08224  [pdf, ps, other]

    cs.SE cs.MA

    Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

    Authors: Chenyu Zhou, Huacan Chai, Wenteng Chen, Zihan Guo, Rong Shan, Yuanyi Song, Tianyi Xu, Yingxuan Yang, Aofan Yu, Weiming Zhang, Congming Zheng, Jiachen Zhu, Zeyu Zheng, Zhuosheng Zhang, Xingyu Lou, Changwang Zhang, Zhihui Fu, Jun Wang, Weiwen Liu, Jianghao Lin, Weinan Zhang

    Abstract: Large language model (LLM) agents are increasingly built less by changing model weights than by reorganizing the runtime around them. Capabilities that earlier systems expected the model to recover internally are now externalized into memory stores, reusable skills, interaction protocols, and the surrounding harness that makes these modules reliable in practice. This paper reviews that shift throu…

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: 54 pages, tech report on Externalization in LLM Agents

  16. arXiv:2604.08033  [pdf, ps, other]

    cs.AI cs.MA cs.NI

    IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling

    Authors: Zhaomeng Zhou, Lan Zhang, Junyang Wang, Mu Yuan, Junda Lin, Jinke Song

    Abstract: Intelligent systems powered by large-scale sensor networks are shifting from predefined monitoring to intent-driven operation, revealing a critical Semantic-to-Physical Mapping Gap. While large language models (LLMs) excel at semantic understanding, existing perception-centric pipelines operate retrospectively, overlooking the fundamental decision of what to sense and when. We formalize this proac…

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: To appear in ACM MobiCom 2026; 13 pages, 12 figures

  17. arXiv:2604.07469  [pdf, ps, other]

    cs.HC

    Assessing the Impact and Underlying Pathways of Sequenced AI feedback on Student Learning

    Authors: Jie Cao, Chloe Qianhui Zhao, Christian Schunn, Elizabeth A. McLaughlin, Jionghao Lin, Kenneth R. Koedinger

    Abstract: Feedback is essential for learning, but its effectiveness relies heavily on how well it engages students in the educational process. Generative AI offers novel opportunities to efficiently produce rich, formative feedback, ranging from direct explanations to incrementally sequenced scaffolding designed to promote learner autonomy. Despite these capabilities, it is still unclear whether sequenced (…

    Submitted 14 April, 2026; v1 submitted 8 April, 2026; originally announced April 2026.

  18. arXiv:2604.07397  [pdf, ps, other]

    cs.LG cs.AI

    Data Warmup: Complexity-Aware Curricula for Efficient Diffusion Training

    Authors: Jinhong Lin, Pan Wang, Zitong Zhan, Lin Zhang, Pedro Morgado

    Abstract: A key inefficiency in diffusion training occurs when a randomly initialized network, lacking visual priors, encounters gradients from the full complexity spectrum, most of which it lacks the capacity to resolve. We propose Data Warmup, a curriculum strategy that schedules training images from simple to complex without modifying the model or loss. Each image is scored offline by a semantic-aware co…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: CVPRW in the proceedings of CVPR 2026

  19. arXiv:2604.07258  [pdf, ps, other]

    cs.LG

    A comparative analysis of machine learning models in SHAP analysis

    Authors: Justin Lin, Julia Fukuyama

    Abstract: In this growing age of data and technology, large black-box models are becoming the norm due to their ability to handle vast amounts of data and learn incredibly complex data patterns. The deficiency of these methods, however, is their inability to explain the prediction process, making them untrustworthy and their use precarious in high-stakes situations. SHapley Additive exPlanations (SHAP) anal…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: 17 pages, 16 figures, 4 tables

  20. arXiv:2604.06562  [pdf, ps, other]

    cs.AI

    On Emotion-Sensitive Decision Making of Small Language Model Agents

    Authors: Jiaju Lin, Xingjian Du, Qingyun Wu, Ellen Wenting Zou, Jindong Wang

    Abstract: Small language models (SLM) are increasingly used as interactive decision-making agents, yet most decision-oriented evaluations ignore emotion as a causal factor influencing behavior. We study emotion-sensitive decision making by combining representation-level emotion induction with a structured game-theoretic evaluation. Emotional states are induced using activation steering derived from crowd-va…

    Submitted 7 April, 2026; originally announced April 2026.

  21. arXiv:2604.06079  [pdf, ps, other]

    cs.CV cs.AI

    Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

    Authors: Juekai Lin, Yun Zhu, Honglin Lin, Sijing Li, Tianwei Lin, Zheng Liu, Xiaoyang Wang, Wenqiao Zhang, Lijun Wu

    Abstract: Graphics Program Synthesis is pivotal for interpreting and editing visual data, effectively facilitating the reverse-engineering of static visuals into editable TikZ code. While TikZ is the de facto standard for scientific schematics due to its programmatic flexibility, its requirement for rigorous spatial precision presents a significant challenge for Multimodal Large Language Models. Progress is…

    Submitted 7 April, 2026; originally announced April 2026.

  22. arXiv:2604.05828  [pdf, ps, other]

    cs.RO

    Precise Aggressive Aerial Maneuvers with Sensorimotor Policies

    Authors: Tianyue Wu, Guangtong Xu, Zihan Wang, Junxiao Lin, Tianyang Chen, Yuze Wu, Zhichao Han, Zhiyang Liu, Fei Gao

    Abstract: Precise aggressive maneuvers with lightweight onboard sensors remain a key bottleneck in fully exploiting the maneuverability of drones. Such maneuvers are critical for expanding the systems' accessible area by navigating through narrow openings in the environment. Among the most relevant problems, a representative one is aggressive traversal through narrow gaps with quadrotors under SE(3) constr…

    Submitted 7 April, 2026; v1 submitted 7 April, 2026; originally announced April 2026.

    Comments: This manuscript was submitted in June 2025. The first revision was submitted in November 2025. The second revision was submitted in February 2026. The first two authors contributed equally to this work

  23. arXiv:2604.04166  [pdf, ps, other]

    cs.RO

    Primitive-based Truncated Diffusion for Efficient Trajectory Generation of Differential Drive Mobile Manipulators

    Authors: Long Xu, Choilam Wong, Yuhang Zhong, Junxiao Lin, Jialiang Hou, Fei Gao

    Abstract: We present a learning-enhanced motion planner for differential drive mobile manipulators to improve efficiency, success rate, and optimality. For task representation encoder, we propose a keypoint sequence extraction module that maps boundary states to 3D space via differentiable forward kinematics. Point clouds and keypoints are encoded separately and fused with attention, enabling effective inte…

    Submitted 5 April, 2026; originally announced April 2026.

    Comments: 9 pages, 6 figures

  24. arXiv:2604.04036  [pdf, ps, other]

    cs.IR cs.CL

    MisEdu-RAG: A Misconception-Aware Dual-Hypergraph RAG for Novice Math Teachers

    Authors: Zhihan Guo, Rundong Xue, Yuting Lu, Jionghao Lin

    Abstract: Novice math teachers often encounter students' mistakes that are difficult to diagnose and remediate. Misconceptions are especially challenging because teachers must explain what went wrong and how to solve them. Although many existing large language model (LLM) platforms can assist in generating instructional feedback, these LLMs loosely connect pedagogical knowledge and student mistakes, which m…

    Submitted 5 April, 2026; originally announced April 2026.

    ACM Class: I.2.4; H.3.3; K.3.1

  25. arXiv:2604.03660  [pdf, ps, other]

    cs.AI

    TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables

    Authors: Xiaoyu Chen, Lu Dai, Hanqing Wang, Zhuoyu Li, Wenbin Dai, Yanzong Zheng, Zhenggang Xia, Junyong Lin, Hui Xiong

    Abstract: Structured tables are essential for conveying high-density information in professional domains such as finance, healthcare, and scientific research. Despite the progress in Multimodal Large Language Models (MLLMs), reasoning performance remains limited for complex tables with hierarchical layouts. In this paper, we identify a critical Perception Bottleneck through quantitative analysis. We find th…

    Submitted 4 April, 2026; originally announced April 2026.

  26. arXiv:2604.03448  [pdf, ps, other]

    cs.CV cs.AI cs.HC cs.LG

    ExpressEdit: Fast Editing of Stylized Facial Expressions with Diffusion Models in Photoshop

    Authors: Kenan Tang, Jiasheng Guo, Jeffrey Lin, Yao Qin

    Abstract: Facial expressions of characters are a vital component of visual storytelling. While current AI image editing models hold promise for assisting artists in the task of stylized expression editing, these models introduce global noise and pixel drift into the edited image, preventing the integration of these models into professional image editing software and workflows. To bridge this gap, we introdu…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: Accepted to CVPR 2026 Workshop on Generative AI for Storytelling (AISTORY)

  27. arXiv:2604.02911  [pdf, ps, other]

    cs.RO

    Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots

    Authors: Junyang Liang, Yuxuan Liu, Yabin Chang, Junfan Lin, Junkai Ji, Hui Li, Changxin Huang, Jianqiang Li

    Abstract: Achieving quadruped robot locomotion across diverse and dynamic terrains presents significant challenges, primarily due to the discrepancies between simulation environments and real-world conditions. Traditional sim-to-real transfer methods often rely on manual feature design or costly real-world fine-tuning. To address these limitations, this paper proposes the DreamTIP framework, which incorpora…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: Accepted by IEEE International Conference on Robotics and Automation (ICRA) 2026

  28. arXiv:2604.02334  [pdf, ps, other]

    cs.AI cs.MA

    Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web

    Authors: Xiaohang Nie, Zihan Guo, Zicai Cui, Jiachi Yang, Zeyi Chen, Leheyi De, Yu Zhang, Junwei Liao, Bo Huang, Yingxuan Yang, Zhi Han, Zimian Peng, Linyao Chen, Wenzheng Tom Tang, Zongkai Liu, Tao Zhou, Botao Amber Hu, Shuyang Tang, Jianghao Lin, Weiwen Liu, Muning Wen, Yuanjian Zhou, Weinan Zhang

    Abstract: As large language models (LLM)-driven agents transition from isolated task solvers to persistent digital entities, the emergence of the Agentic Web, an ecosystem where heterogeneous agents autonomously interact and co-evolve, marks a pivotal shift toward Artificial General Intelligence (AGI). However, LLM-based multi-agent systems (LaMAS) are hindered by open-world issues such as scaling friction,…

    Submitted 18 January, 2026; originally announced April 2026.

    Comments: 38 pages, 8 figures, and 4 tables

  29. arXiv:2604.01881  [pdf, ps, other]

    cs.CV cs.CL

    HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models

    Authors: Yansong Guo, Chaoyang Zhu, Jiayi Ji, Jianghang Lin, Liujuan Cao

    Abstract: Video Large Language Models (VideoLLMs) have demonstrated impressive capabilities in video understanding, yet the massive number of input video tokens incurs a significant computational burden for deployment. Existing methods mainly prune video tokens at input level while neglecting the inherent information structure embedded in videos and large language models (LLMs). To address this, we propose…

    Submitted 2 April, 2026; originally announced April 2026.

  30. arXiv:2604.01542  [pdf, ps, other]

    cs.CV physics.optics

    Universal computational thermal imaging overcoming the ghosting effect

    Authors: Hongyi Xu, Du Wang, Chenjun Zhao, Jiashuo Chen, Jiale Lin, Liqin Cao, Yanfei Zhong, Yiyuan She, Fanglin Bao

    Abstract: Thermal imaging is crucial for night vision but fundamentally hampered by the ghosting effect, a loss of detailed texture in cluttered photon streams. While conventional ghosting mitigation has relied on data post-processing, the recent breakthrough in heat-assisted detection and ranging (HADAR) opens a promising frontier for hyperspectral computational thermal imaging that produces night vision w…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: 9 pages, 6 figures

  31. arXiv:2604.01195  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget

    Authors: Nandan Thakur, Zijian Chen, Xueguang Ma, Jimmy Lin

    Abstract: Search agents, which integrate language models (LMs) with web search, are becoming crucial for answering complex user queries. Constructing training datasets for deep research tasks, involving multi-step retrieval and reasoning, remains challenging due to expensive human annotation, or cumbersome prerequisites. In this work, we introduce ORBIT, a training dataset with 20K reasoning-intensive queri…

    Submitted 2 April, 2026; v1 submitted 1 April, 2026; originally announced April 2026.

    Comments: Preprint

  32. arXiv:2604.00531  [pdf, ps, other]

    cs.LG

    Learning Shared Representations for Multi-Task Linear Bandits

    Authors: Jiabin Lin, Shana Moothedath

    Abstract: Multi-task representation learning is an approach that learns shared latent representations across related tasks, facilitating knowledge transfer and improving sample efficiency. This paper introduces a novel approach to multi-task representation learning in linear bandits. We consider a setting with T concurrent linear bandit tasks, each with feature dimension d, that share a common latent repres…

    Submitted 1 April, 2026; originally announced April 2026.

  33. arXiv:2604.00142  [pdf, ps, other]

    cs.HC

    Practice Less, Explain More: LLM-Supported Self-Explanation Improves Explanation Quality on Transfer Problems in Calculus

    Authors: Eason Chen, Xinyi Tang, Yvonne Zhao, Meiyi Chen, Meryam Elmir, Elizabeth McLaughlin, Mingyu Yuan, Yumo Wang, Shyam Agarwal, Jared Cochrane, Jionghao Lin, Tongshuang Wu, Ken Koedinger

    Abstract: We conducted a between-subjects experiment (N=92) comparing three conditions in a calculus learning environment: no self-explanation (control), menu-based self-explanation, and open-ended self-explanation with LLM-generated feedback. All conditions showed positive learning gains within a fixed 60-minute practice session, with no significant between-condition differences in post-test performance. O…

    Submitted 31 March, 2026; originally announced April 2026.

    Comments: 9 pages, 2 figures. Accepted at AIED 2026. Camera-ready version with updated references

  34. arXiv:2603.30014  [pdf, ps, other]

    cs.DC cs.AI

    Scalable AI-assisted Workflow Management for Detector Design Optimization Using Distributed Computing

    Authors: Derek Anderson, Amit Bashyal, Markus Diefenthaler, Cristiano Fanelli, Wen Guan, Tanja Horn, Alex Jentsch, Meifeng Lin, Tadashi Maeno, Kei Nagai, Hemalata Nayak, Connor Pecar, Karthik Suresh, Fang-Ying Tsai, Anselm Vossen, Tianle Wang, Torre Wenaus

    Abstract: The Production and Distributed Analysis (PanDA) system, originally developed for the ATLAS experiment at the CERN Large Hadron Collider (LHC), has evolved into a robust platform for orchestrating large-scale workflows across distributed computing resources. Coupled with its intelligent Distributed Dispatch and Scheduling (iDDS) component, PanDA supports AI/ML-driven workflows through a scalable an…

    Submitted 31 March, 2026; originally announced March 2026.

  35. arXiv:2603.28116  [pdf, ps, other]

    cs.RO cs.CV

    $AutoDrive\text{-}P^3$: Unified Chain of Perception-Prediction-Planning Thought via Reinforcement Fine-Tuning

    Authors: Yuqi Ye, Zijian Zhang, Junhong Lin, Shangkun Sun, Changhao Peng, Wei Gao

    Abstract: Vision-language models (VLMs) are increasingly being adopted for end-to-end autonomous driving systems due to their exceptional performance in handling long-tail scenarios. However, current VLM-based approaches suffer from two major limitations: 1) Some VLMs directly output planning results without chain-of-thought (CoT) reasoning, bypassing crucial perception and prediction stages which creates a…

    Submitted 30 March, 2026; originally announced March 2026.

    Comments: Accepted at ICLR 2026 (International Conference on Learning Representations)

  36. arXiv:2603.27538  [pdf, ps, other]

    cs.CV cs.CL

    LongCat-Next: Lexicalizing Modalities as Discrete Tokens

    Authors: Meituan LongCat Team, Bin Xiao, Chao Wang, Chengjiang Li, Chi Zhang, Chong Peng, Hang Yu, Hao Yang, Haonan Yan, Haoze Sun, Haozhe Zhao, Hong Liu, Hui Su, Jiaqi Zhang, Jiawei Wang, Jing Li, Kefeng Zhang, Manyuan Zhang, Minhao Jing, Peng Pei, Quan Chen, Taofeng Xue, Tongxin Pan, Xiaotong Li, Xiaoyang Li, et al. (64 additional authors not shown)

    Abstract: The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented architectures and suboptimal integration. To transcend this limitation, we introduce Discrete Native Aut…

    Submitted 29 March, 2026; originally announced March 2026.

    Comments: LongCat-Next Technical Report

  37. arXiv:2603.27460  [pdf, ps, other]

    cs.CV cs.AI

    Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

    Authors: Zhongying Deng, Cheng Tang, Ziyan Huang, Jiashi Lin, Ying Chen, Junzhi Ning, Chenglong Ma, Jiyao Liu, Wei Li, Yinghao Zhu, Shujian Gao, Yanyan Huang, Sibo Ju, Yanzhou Su, Pengcheng Chen, Wenhao Tang, Tianbin Li, Haoyu Wang, Yuanfeng Ji, Hui Sun, Shaobo Min, Liang Peng, Feilong Tang, Haochen Xue, Rulin Zhou, et al. (102 additional authors not shown)

    Abstract: Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the thrive of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, the curation and assembling of such medical datasets are highly challenging due to the reliance on clinical expertise and strict ethical and privacy constraints, resulting in a scarcity of…

    Submitted 28 March, 2026; originally announced March 2026.

    Comments: 157 pages, 19 figures, 26 tables. Project repo: https://github.com/uni-medical/Project-Imaging-X

  38. arXiv:2603.26142  [pdf, ps, other]

    cs.HC

    Simulating Novice Students Using Machine Unlearning and Relearning in Large Language Models

    Authors: Jiajia Song, Zhihan Guo, Jionghao Lin

    Abstract: Student simulation can support learning-by-teaching pedagogy where human students (as tutors) teach AI-simulated novice students (as tutees). Recent research often relies on prompt engineering with large language models (LLMs) to simulate novice student behaviour, but it is difficult to keep the AI-simulated student at a stable novice knowledge level. A key reason is that many LLMs are trained to…

    Submitted 30 March, 2026; v1 submitted 27 March, 2026; originally announced March 2026.

  39. arXiv:2603.25340  [pdf, ps, other]

    cs.CL

    Large Language Model as Token Compressor and Decompressor

    Authors: Wenbing Li, Zikai Song, Jielei Zhang, Tianhao Zhao, Junkai Lin, Yiran Wang, Wei Yang

    Abstract: In this paper, we establish the novel insight that an off-the-shelf LLM can function as an excellent token compressor and decompressor. To demonstrate, we design a self-expressive autoencoding learning framework that fine-tunes a pretrained LLM to translate long texts into a compact internal language of discrete, variable-length latent codes, termed Z-tokens, and to reconstruct the original text exactl…

    Submitted 26 March, 2026; originally announced March 2026.

  40. arXiv:2603.25118  [pdf, ps, other]

    cs.CV

    AnyDoc: Enhancing Document Generation via Large-Scale HTML/CSS Data Synthesis and Height-Aware Reinforcement Optimization

    Authors: Jiawei Lin, Wanrong Zhu, Vlad I Morariu, Christopher Tensmeyer

    Abstract: Document generation has gained growing attention in the field of AI-driven content creation. In this work, we push its boundaries by introducing AnyDoc, a framework capable of handling multiple generation tasks across a wide spectrum of document categories, all represented in a unified HTML/CSS format. To overcome the limited coverage and scale of existing human-crafted document datasets, AnyDoc f…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: CVPR 2026 Main Conference

  41. arXiv:2603.25040  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

    Authors: Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang, Chao Zhang, et al. (152 additional authors not shown)

    Abstract: We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertis…

    Submitted 2 April, 2026; v1 submitted 26 March, 2026; originally announced March 2026.

  42. arXiv:2603.24278  [pdf, ps, other]

    cs.CV

    TopoMesh: High-Fidelity Mesh Autoencoding via Topological Unification

    Authors: Guan Luo, Xiu Li, Rui Chen, Xuanyu Yi, Jing Lin, Chia-Hao Chen, Jiahang Liu, Song-Hai Zhang, Jianfeng Zhang

    Abstract: The dominant paradigm for high-fidelity 3D generation relies on a VAE-Diffusion pipeline, where the VAE's reconstruction capability sets a firm upper bound on generation quality. A fundamental challenge limiting existing VAEs is the representation mismatch between ground-truth meshes and network predictions: GT meshes have arbitrary, variable topology, while VAEs typically predict fixed-structure…

    Submitted 26 March, 2026; v1 submitted 25 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026. Project page: https://logan0601.github.io/projects/topomesh/index.html

  43. arXiv:2603.23906  [pdf, ps, other]

    cs.CV

    GenMask: Adapting DiT for Segmentation via Direct Mask Generation

    Authors: Yuhuan Yang, Xianwei Zhuang, Yuxuan Cai, Chaofan Ma, Shuai Bai, Jiangchao Yao, Ya Zhang, Junyang Lin, Yanfeng Wang

    Abstract: Recent approaches for segmentation have leveraged pretrained generative models as feature extractors, treating segmentation as a downstream adaptation task via indirect feature retrieval. This implicit use suffers from a fundamental misalignment in representation. It also depends heavily on indirect feature extraction pipelines, which complicate the workflow and limit adaptation. In this paper, we…

    Submitted 26 March, 2026; v1 submitted 24 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  44. arXiv:2603.23478  [pdf, ps, other]

    cs.CV

    UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation

    Authors: Jiaying Lin, Dan Xu

    Abstract: Functionality segmentation in 3D scenes requires an agent to ground implicit natural-language instructions into precise masks of fine-grained interactive elements. Existing methods rely on fragmented pipelines that suffer from visual blindness during initial task parsing. We observe that these methods are limited by single-scale, passive and heuristic frame selection. We present UniFunc3D, a unifi…

    Submitted 24 March, 2026; originally announced March 2026.

  45. arXiv:2603.23231  [pdf, ps, other]

    cs.AI

    PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

    Authors: Shuochen Liu, Junyi Zhu, Long Shu, Junda Lin, Yuhao Chen, Haotian Zhang, Chao Zhang, Derong Xu, Jia Li, Bo Tang, Zhiyu Li, Feiyu Xiong, Enhong Chen, Tong Xu

    Abstract: Empowering large language models with long-term memory is crucial for building agents that adapt to users' evolving needs. However, prior evaluations typically interleave preference-related dialogues with irrelevant conversations, reducing the task to needle-in-a-haystack retrieval while ignoring relationships between events that drive the evolution of user preferences. Such settings overlook a fu…

    Submitted 24 March, 2026; originally announced March 2026.

  46. arXiv:2603.23085  [pdf, ps, other]

    cs.AI

    MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models

    Authors: Jianxin Lin, Chunzheng Zhu, Peter J. Kneuertz, Yunfei Bai, Yuan Xue

    Abstract: Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet, existing medical chain-of-thought (CoT) models lack explicit mechanisms to represent and enforce causal reasoning, leaving them vulnerable to spurious correlations and limiting their clinical reliability. We pinpoint three core challenges in medical CoT reason…

    Submitted 24 March, 2026; originally announced March 2026.

  47. arXiv:2603.23005  [pdf, ps, other]

    cs.CR

    Multi-User Multi-Key Image Steganography with Key Isolation

    Authors: Tzu-Ti Wei, Yu-Han Tseng, Jun-Yi Lin, Yu-Chee Tseng, Jen-Jee Chen

    Abstract: Steganography conceals secret information within innocuous carriers while preserving visual fidelity and enabling reliable recovery. Recent unified networks operate normally under untriggered conditions but switch to hidden steganographic tasks when triggered. PUSNet follows this paradigm by performing image purification during normal operation and steganographic embedding when activated. However,…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: 6 pages, 5 figures

    MSC Class: 68U10; 94A62 ACM Class: I.4.9; K.6.5

  48. arXiv:2603.21082  [pdf, ps, other]

    cs.NI

    AnyPro: Preference-Preserving Anycast Optimization based on Strategic AS-Path Prepending

    Authors: Minyuan Zhou, Yuning Chen, Jiaqi Zheng, Yifei Xu, Pan Hu, Yongping Tang, Wendong Yin, Jie Lin, Qingyan Yu, Yuanchao Su, Guihai Chen, Wanchun Dou, Songwu Lu, Wan Du

    Abstract: Operating large-scale anycast networks is challenging because client-to-site mappings often misalign with operators' expectations due to opaque inter-domain routing. We present AnyPro, the first system to unlock the full potential of AS-path prepending (ASPP), efficiently deriving globally optimal configurations to steer clients toward performance-optimal sites at scale. AnyPro first employs an eff…

    Submitted 22 March, 2026; originally announced March 2026.

    Comments: NSDI 2026

  49. arXiv:2603.21019  [pdf, ps, other]

    cs.CR cs.SE

    SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration

    Authors: Zihan Guo, Zhiyu Chen, Xiaohang Nie, Jianghao Lin, Yuanjian Zhou, Weinan Zhang

    Abstract: With the rapid evolution of Large Language Model (LLM) agent ecosystems, centralized skill marketplaces have emerged as pivotal infrastructure for augmenting agent capabilities. However, these marketplaces face unprecedented security challenges, primarily stemming from semantic-behavioral inconsistency and inter-skill combinatorial risks, where individually benign skills induce malicious behaviors…

    Submitted 21 March, 2026; originally announced March 2026.

    Comments: 16 pages, 6 figures

  50. arXiv:2603.20185  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking

    Authors: Jingyang Lin, Jialian Wu, Jiang Liu, Ximeng Sun, Ze Wang, Xiaodong Yu, Jiebo Luo, Zicheng Liu, Emad Barsoum

    Abstract: Video agentic models have advanced challenging video-language tasks. However, most agentic approaches still heavily rely on greedy parsing over densely sampled video frames, resulting in high computational cost. We present VideoSeek, a long-horizon video agent that leverages video logic flow to actively seek answer-critical evidence instead of exhaustively parsing the full video. This insight allo…

    Submitted 20 March, 2026; originally announced March 2026.

    Comments: Accepted at CVPR 2026