Showing 1–50 of 1,640 results for author: Feng, Y

Searching in archive cs. Search in all archives.
  1. arXiv:2604.13611  [pdf, ps, other]

    cs.SE

    V2E: Validating Smart Contract Vulnerabilities through Profit-driven Exploit Generation and Execution

    Authors: Jingwen Zhang, Yuhong Nan, Kaiwen Ning, Mingxi Ye, Wei Li, Yuming Xiao, Yuming Feng, Weizhe Zhang, Zibin Zheng

    Abstract: Smart contracts are a critical component of blockchain systems. Due to the large amount of digital assets carried by smart contracts, their security is of critical importance. Although numerous tools have been developed for detecting smart contract vulnerability, their effectiveness remains limited, particularly due to the high false positives included in the reported results. Therefore, developer…

    Submitted 15 April, 2026; originally announced April 2026.

    Comments: Accepted by FSE 2026

  2. arXiv:2604.13244  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview

    Authors: Benjamin Kiefer, Jan Lukas Augustin, Jon Muhovič, Mingi Jeong, Arnold Wiliem, Janez Pers, Matej Kristan, Alberto Quattrini Li, Matija Teršek, Josip Šarić, Arpita Vats, Dominik Hildebrand, Rafia Rahim, Mahmut Karaaslan, Arpit Vaishya, Steve Xie, Ersin Kaya, Akib Mashrur, Tze-Hsiang Tang, Chun-Ming Tsai, Jun-Wei Hsieh, Ming-Ching Chang, Wonwoo Jo, Doyeon Lee, Yusi Cao, et al. (30 additional authors not shown)

    Abstract: The 4th Workshop on Maritime Computer Vision (MaCVi) is organized as part of CVPR 2026. This edition features five benchmark challenges with emphasis on both predictive accuracy and embedded real-time feasibility. This report summarizes the MaCVi 2026 challenge setup, evaluation protocols, datasets, and benchmark tracks, and presents quantitative results, qualitative comparisons, and cross-challen…

    Submitted 14 April, 2026; originally announced April 2026.

    Comments: Accepted to CVPR 2026 Workshop Proceedings; Maritime Computer Vision Workshop

  3. arXiv:2604.13060  [pdf, ps, other]

    cs.CL cs.LG cs.MM

    Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage

    Authors: Ziyi He, Yushi Feng, Shuangyu Yang, Yinghao Zhu, Xichen Zhang, Pak Chuen Patrick Tai, Hei Yuet Lo, Songying Wu, Weifa Yang, Lequan Yu

    Abstract: Dental triage is a safety-critical clinical routing task that requires integrating multimodal clinical information (e.g., patient complaints and radiographic evidence) to determine complete referral plans. We present Dental-TriageBench, the first expert-annotated benchmark for reasoning-driven multimodal dental triage. Built from authentic outpatient workflows, it contains 246 de-identified cases…

    Submitted 18 March, 2026; originally announced April 2026.

  4. arXiv:2604.12618  [pdf, ps, other]

    cs.AR

    CODO: An Automated Compiler for Comprehensive Dataflow Optimization

    Authors: Weichuang Zhang, Yiquan Wang, Xinzhou Zhang, Chi Zhang, Yu Feng, Xiaofeng Hou, Chao Li, Jieru Zhao, Minyi Guo

    Abstract: FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications. However, manually implementing an efficient dataflow architecture for large-scale applications is still challenging, even for specialists who use high-level synthesis (HLS) to simplify FPGA programming. T…

    Submitted 14 April, 2026; originally announced April 2026.

    Comments: Accepted by ISCA 2026

  5. arXiv:2604.12508  [pdf, ps, other]

    cs.CV

    From Attenuation to Attention: Variational Information Flow Manipulation for Fine-Grained Visual Perception

    Authors: Jilong Zhu, Yang Feng

    Abstract: While Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in general visual understanding, they frequently falter in fine-grained perception tasks that require identifying tiny objects or discerning subtle visual relationships. We attribute this limitation to Visual Attenuation: a phenomenon where sparse fine-grained visual signals are prematurely suppressed or dilut…

    Submitted 14 April, 2026; originally announced April 2026.

  6. arXiv:2604.12247  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration

    Authors: Zhuofan Wen, Yang Feng

    Abstract: Speculative decoding has emerged as a promising approach to accelerate autoregressive inference in large language models (LLMs). Self-draft methods, which leverage the base LLM itself for speculation, avoid the overhead of auxiliary draft models but face limitations: shallow layers often produce overconfident yet incorrect token predictions, and the presence of difficult tokens in a draft sequence…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: ACL 2026 Findings

  7. arXiv:2604.11096  [pdf, ps, other]

    cs.CL cs.AI cs.SD

    Efficient Training for Cross-lingual Speech Language Models

    Authors: Yan Zhou, Qingkai Fang, Yun Hong, Yang Feng

    Abstract: Currently, large language models (LLMs) predominantly focus on the text modality. To enable more natural human-AI interaction, speech LLMs are emerging, but building effective end-to-end speech LLMs remains challenging due to limited data and the difficulty in expanding to more languages. In this paper, we introduce Cross-lingual Speech Language Model (CSLM), an efficient training method for cross…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: Accepted to Findings of ACL 2026

  8. arXiv:2604.10470  [pdf, ps, other]

    cs.CL cs.AI

    From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation

    Authors: Mingfei Lu, Yi Zhang, Mengjia Wu, Yue Feng

    Abstract: Legal consultation question answering (Legal CQA) presents unique challenges compared to traditional legal QA tasks, including the scarcity of high-quality training data, complex task composition, and strong contextual dependencies. To address these, we construct JurisCQAD, a large-scale dataset of over 43,000 real-world Chinese legal queries annotated with expert-validated positive and negative r…

    Submitted 12 April, 2026; originally announced April 2026.

    Comments: Accepted by ACL 2026 Main conference

  9. arXiv:2604.10095  [pdf, ps, other]

    cs.CV

    Mining Attribute Subspaces for Efficient Fine-tuning of 3D Foundation Models

    Authors: Yu Jiang, Hanwen Jiang, Ahmed Abdelkader, Wen-Sheng Chu, Brandon Y. Feng, Zhangyang Wang, Qixing Huang

    Abstract: With the emergence of 3D foundation models, there is growing interest in fine-tuning them for downstream tasks, where LoRA is the dominant fine-tuning paradigm. As 3D datasets exhibit distinct variations in texture, geometry, camera motion, and lighting, there are interesting fundamental questions: 1) Are there LoRA subspaces associated with each type of variation? 2) Are these subspaces disentang…

    Submitted 11 April, 2026; originally announced April 2026.

    Comments: 10 pages, 8 figures

  10. arXiv:2604.09587  [pdf, ps, other]

    cs.AI cs.LG cs.SE

    MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion

    Authors: Yunfei Feng, Xi Zhao, Cheng Zhang, Dahu Feng, Daolin Cheng, Jianqi Yu, Yubin Xia, Erhu Feng

    Abstract: Mobile agents can autonomously complete user-assigned tasks through GUI interactions. However, existing mainstream evaluation benchmarks, such as AndroidWorld, operate by connecting to a system-level Android emulator and provide evaluation signals based on the state of system resources. In real-world mobile-agent scenarios, however, many third-party applications do not expose system-level APIs to…

    Submitted 28 February, 2026; originally announced April 2026.

  11. arXiv:2604.09568  [pdf, ps, other]

    cs.HC cs.CL cs.CV

    EvoDiagram: Agentic Editable Diagram Creation via Design Expertise Evolution

    Authors: Tianfu Wang, Leilei Ding, Ziyang Tao, Yi Zhan, Zhiyuan Ma, Wei Wu, Yuxuan Lei, Yuan Feng, Junyang Wang, Yin Wu, Yizhao Xu, Hongyuan Zhu, Qi Liu, Nicholas Jing Yuan, Yanyong Zhang, Hui Xiong

    Abstract: High-fidelity diagram creation requires the complex orchestration of semantic topology, visual styling, and spatial layout, posing a significant challenge for automated systems. Existing methods also suffer from a representation gap: pixel-based models often lack precise control, while code-based synthesis limits intuitive flexibility. To bridge this gap, we introduce EvoDiagram, an agentic framew…

    Submitted 20 February, 2026; originally announced April 2026.

  12. arXiv:2604.09155  [pdf, ps, other]

    cs.LG cs.AI

    CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

    Authors: Yushi Feng, Junye Du, Qifan Wang, Zizhan Ma, Qian Niu, Yutaka Matsuo, Long Feng, Lequan Yu

    Abstract: Graphical user interface (GUI) agents powered by vision language models (VLMs) are rapidly moving from passive assistance to autonomous operation. However, this unrestricted action space exposes users to severe and irreversible financial, privacy or social harm. Existing safeguards, which rely on prompt engineering, brittle heuristics and VLM-as-critic, lack formal verification and user-tunable guarantees…

    Submitted 10 April, 2026; originally announced April 2026.

  13. arXiv:2604.08787  [pdf]

    cs.RO

    One Interface, Many Robots: Unified Real-Time Low-Level Motion Planning for Collaborative Arms

    Authors: Yue Feng, Weicheng Huang, I-Ming Chen

    Abstract: This paper proposes a common interface for real-time low-level motion planning of collaborative robotic arms, aimed at enabling broader applicability and improved portability across heterogeneous hardware platforms. In previous work, we introduced WinGs Operating Studio (WOS), a middleware solution that abstracts diverse robotic components into uniform software resources and provides a broad suite…

    Submitted 9 April, 2026; originally announced April 2026.

    MSC Class: 70B15; 70Q05; 93C85
    ACM Class: I.2.9; I.2.10; J.2

  14. arXiv:2604.08407  [pdf, ps, other]

    cs.CR

    Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

    Authors: Hanzhi Liu, Chaofan Shou, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, Yu Feng

    Abstract: Large language model (LLM) agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack sur…

    Submitted 9 April, 2026; originally announced April 2026.

  15. arXiv:2604.06956  [pdf, ps, other]

    cs.DC cs.LG

    NestPipe: Large-Scale Recommendation Training on 1,500+ Accelerators via Nested Pipelining

    Authors: Zhida Jiang, Zhaolong Xing, Huichao Chai, Tianxing Sun, Qiang Peng, Baopeng Yuan, Jiaxing Wang, Hua Du, Zhixin Wu, Xuemiao Li, Yikui Cao, Xinyu Liu, Yongxiang Feng, Zhen Chen, Ke Zhang

    Abstract: Modern recommendation models have increased to trillions of parameters. As cluster scales expand to O(1k), distributed training bottlenecks shift from computation and memory to data movement, especially lookup and communication latency associated with embeddings. Existing solutions either optimize only one bottleneck or improve throughput by sacrificing training consistency. This paper presents Ne…

    Submitted 8 April, 2026; originally announced April 2026.

  16. arXiv:2604.06863  [pdf, ps, other]

    cs.SI cs.AI cs.CL cs.HC

    Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings

    Authors: Mingchen Li, Wajdi Aljedaani, Yingjie Liu, Navyasri Meka, Xuan Lu, Xinyue Ye, Junhua Ding, Yunhe Feng

    Abstract: Skin-toned emojis are crucial for fostering personal identity and social inclusion in online communication. As AI models, particularly Large Language Models (LLMs), increasingly mediate interactions on web platforms, the risk that these systems perpetuate societal biases through their representation of such symbols is a significant concern. This paper presents the first large-scale comparative stu…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: Accepted at WWW'26

    ACM Class: I.2.7; H.0; J.4

  17. arXiv:2604.06811  [pdf, ps, other]

    cs.CR cs.AI

    SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

    Authors: Yunhao Feng, Yifan Ding, Yingshui Tan, Boren Zheng, Yanming Guo, Xiaolong Li, Kun Zhai, Yishan Li, Wenke Huang

    Abstract: Skill-based agent systems tackle complex tasks by composing reusable skills, improving modularity and scalability while introducing a largely unexamined security attack surface. We propose SkillTrojan, a backdoor attack that targets skill implementations rather than model parameters or training data. SkillTrojan embeds malicious logic inside otherwise plausible skills and leverages standard skill…

    Submitted 8 April, 2026; originally announced April 2026.

  18. arXiv:2604.06150  [pdf]

    cs.RO

    Delta6: A Low-Cost, 6-DOF Force-Sensing Flexible End-Effector

    Authors: Yue Feng, Weicheng Huang, Chen Qiu, Huixu Dong, I-Ming Chen

    Abstract: This paper presents Delta6, a low-cost, six-degree-of-freedom (6-DOF) force/torque end-effector that combines antagonistic springs with magnetic encoders to deliver accurate wrench sensing while remaining as simple to assemble as flat-pack furniture. A fully 3D-printed prototype, assembled entirely from off-the-shelf parts, withstands peak forces above ±14.4 N and torques of ±0.33 N·m per axis…

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: This work has been submitted to the IEEE for possible publication

    MSC Class: 70B15; 93C85; 74M25
    ACM Class: I.2.9; B.7.1; B.8.2

  19. arXiv:2604.04656  [pdf, ps, other]

    cs.DS cs.CC

    Subset Balancing and Generalized Subset Sum via Lattices

    Authors: Yiming Gao, Yansong Feng, Honggang Hu, Yanbin Pan

    Abstract: We study the Subset Balancing problem: given $\mathbf{x} \in \mathbb{Z}^n$ and a coefficient set $C \subseteq \mathbb{Z}$, find a nonzero vector $\mathbf{c} \in C^n$ such that $\mathbf{c}\cdot\mathbf{x} = 0$. The standard meet-in-the-middle algorithm runs in time $\tilde{O}(|C|^{n/2})=\tilde{O}(2^{n\log |C|/2})$, and recent improvements (SODA 2022, Chen, Jin, Randolph, and Servedio; STOC 20…

    Submitted 6 April, 2026; originally announced April 2026.

  20. arXiv:2604.04530  [pdf, ps, other]

    cs.IR cs.LG

    SLSREC: Self-Supervised Contrastive Learning for Adaptive Fusion of Long- and Short-Term User Interests

    Authors: Wei Zhou, Yue Shen, Junkai Ji, Yinglan Feng, Xing Tang, Xiuqiang He, Liang Feng, Zexuan Zhu

    Abstract: User interests typically encompass both long-term preferences and short-term intentions, reflecting the dynamic nature of user behaviors across different timeframes. The uneven temporal distribution of user interactions highlights the evolving patterns of interests, making it challenging to accurately capture shifts in interests using comprehensive historical behaviors. To address this, we propose…

    Submitted 6 April, 2026; originally announced April 2026.

  21. arXiv:2604.03998  [pdf, ps, other]

    cs.RO

    VA-FastNavi-MARL: Real-Time Robot Control with Multimedia-Driven Meta-Reinforcement Learning

    Authors: Yang Zhang, Shengxi Jing, Fengxiang Wang, Yuan Feng, Hong Wang

    Abstract: Interpreting dynamic, heterogeneous multimedia commands with real-time responsiveness is critical for Human-Robot Interaction. We present VA-FastNavi-MARL, a framework that aligns asynchronous audio-visual inputs into a unified latent representation. By treating diverse instructions as a distribution of navigable goals via Meta-Reinforcement Learning, our method enables rapid adaptation to unseen…

    Submitted 5 April, 2026; originally announced April 2026.

    Comments: Accepted to the 2026 IEEE International Conference on Multimedia and Expo (ICME 2026)

    Journal ref: 2026 IEEE International Conference on Multimedia and Expo (ICME)

  22. arXiv:2604.02947  [pdf, ps, other]

    cs.AI

    AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

    Authors: Yunhao Feng, Yifan Ding, Yingshui Tan, Xingjun Ma, Yige Li, Yutao Wu, Yifeng Gao, Kun Zhai, Yanming Guo

    Abstract: Computer-use agents extend language models from text generation to persistent action over tools, files, and execution environments. Unlike chat systems, they maintain state across interactions and translate intermediate outputs into concrete actions. This creates a distinct safety challenge in that harmful behavior may emerge through sequences of individually plausible steps, including intermediat…

    Submitted 3 April, 2026; originally announced April 2026.

  23. arXiv:2604.02923  [pdf, ps, other]

    cs.CL cs.AI

    Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus

    Authors: Shuai Wu, Xue Li, Yanna Feng, Yufang Li, Zhijun Wang

    Abstract: Large Language Models (LLMs), particularly those employing Mixture-of-Experts (MoE) architectures, have achieved remarkable capabilities across diverse natural language processing tasks. However, these models frequently suffer from hallucinations -- generating plausible but factually incorrect content -- and exhibit systematic biases that are amplified by uneven expert activation during inference.…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: 13 pages, 8 figures, technical report

  24. arXiv:2604.00945  [pdf, ps, other]

    cs.CY

    A Visionary Look at Vibe Researching

    Authors: Yebo Feng, Yang Liu

    Abstract: Vibe researching is an emerging paradigm in which human researchers provide high-level direction and critical judgment while LLM-based agents handle the labor-intensive execution of literature review, experimentation, data analysis, and manuscript drafting. Inspired by the "vibe coding" movement in software engineering, it occupies a middle ground between traditional manual research and fully auto…

    Submitted 1 April, 2026; originally announced April 2026.

  25. arXiv:2604.00715  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining

    Authors: Karan Singh, Michael Yu, Varun Gangal, Zhuofu Tao, Sachin Kumar, Emmy Liu, Steven Y. Feng

    Abstract: Retrieval-augmented generation (RAG) improves language model (LM) performance by providing relevant context at test time for knowledge-intensive situations. However, the relationship between parametric knowledge acquired during pretraining and non-parametric knowledge accessed via retrieval remains poorly understood, especially under fixed data budgets. In this work, we systematically study the tr…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: Code and data at https://github.com/DegenAI-Labs/RAG-scaling-laws

  26. arXiv:2603.29966  [pdf, ps, other]

    cs.CV

    Scaling Video Pretraining for Surgical Foundation Models

    Authors: Sicheng Lu, Zikai Xiao, Jianhui Wei, Danyu Sun, Qi Lu, Keli Hu, Yang Feng, Jian Wu, Zongxin Yang, Zuozhu Liu

    Abstract: Surgical video understanding is essential for computer-assisted interventions, yet existing surgical foundation models remain constrained by limited data scale, procedural diversity, and inconsistent evaluation, often lacking a reproducible training pipeline. We propose SurgRec, a scalable and reproducible pretraining recipe for surgical video understanding, instantiated with two variants: SurgRec…

    Submitted 2 April, 2026; v1 submitted 31 March, 2026; originally announced March 2026.

  27. arXiv:2603.29705  [pdf, ps, other]

    cs.IR

    Drift-Aware Continual Tokenization for Generative Recommendation

    Authors: Yuebo Feng, Jiahao Liu, Mingzhe Han, Dongsheng Li, Hansu Gu, Peng Zhang, Tun Lu, Ning Gu

    Abstract: Generative recommendation commonly adopts a two-stage pipeline in which a learnable tokenizer maps items to discrete token sequences (i.e. identifiers) and an autoregressive generative recommender model (GRM) performs prediction based on these identifiers. Recent tokenizers further incorporate collaborative signals so that items with similar user-behavior patterns receive similar codes, substantia…

    Submitted 31 March, 2026; originally announced March 2026.

  28. arXiv:2603.29552  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Bringing Up a Bilingual BabyLM: Investigating Multilingual Language Acquisition Using Small-Scale Models

    Authors: Linda Zeng, Steven Y. Feng, Michael C. Frank

    Abstract: Multilingualism is incredibly common around the world, leading to many important theoretical and practical questions about how children learn multiple languages at once. For example, does multilingual acquisition lead to delays in learning? Are there better and worse ways to structure multilingual input? Many correlational studies address these questions, but it is surprisingly difficult to get de…

    Submitted 31 March, 2026; originally announced March 2026.

    Comments: Code and data at https://github.com/styfeng/bilingual-babyLM

  29. arXiv:2603.29522  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Baby Scale: Investigating Models Trained on Individual Children's Language Input

    Authors: Steven Y. Feng, Alvin W. M. Tan, Michael C. Frank

    Abstract: Modern language models (LMs) must be trained on many orders of magnitude more words of training data than human children receive before they begin to produce useful behavior. Assessing the nature and origins of this "data gap" requires benchmarking LMs on human-scale datasets to understand how linguistic knowledge emerges from children's natural training data. Using transcripts from the BabyView d…

    Submitted 31 March, 2026; originally announced March 2026.

    Comments: Code and data at https://github.com/styfeng/babyscale-LM

  30. arXiv:2603.29122  [pdf, ps, other]

    cs.SE

    Logging Like Humans for LLMs: Rethinking Logging via Execution and Runtime Feedback

    Authors: Xin Wang, Yang Feng, Jiaoxiao Qian, Yang Zhang, Zhenhao Li, Zishuo Ding

    Abstract: Logging statements are essential for software debugging and maintenance. However, existing approaches to automatic logging generation rely on static analysis and produce statements in a single pass without considering runtime behavior. They are also typically evaluated by similarity to developer-written logs, assuming these logs form an adequate gold standard. This assumption is increasingly limit…

    Submitted 30 March, 2026; originally announced March 2026.

  31. arXiv:2603.28686  [pdf, ps, other]

    cs.SE

    C2RustXW: Program-Structure-Aware C-to-Rust Translation via Program Analysis and LLM

    Authors: Yanyan Yan, Yang Feng, Jiangshan Liu, Di Liu, Zixi Liu, Hao Teng, Baowen Xu

    Abstract: The growing adoption of Rust for its memory safety and performance has increased the demand for effective migration of legacy C codebases. However, existing rule-based translators (e.g., C2Rust) often generate verbose, non-idiomatic code that preserves unsafe C semantics, limiting readability, maintainability, and practical adoption. Moreover, manual post-processing of such outputs is labor-inte…

    Submitted 30 March, 2026; originally announced March 2026.

  32. arXiv:2603.27991  [pdf, ps, other]

    cs.HC cs.AI

    ViviDoc: Generating Interactive Documents through Human-Agent Collaboration

    Authors: Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Tingfeng Lan, Jiale Lao, Yue Cheng, Wei Chen

    Abstract: Interactive documents help readers engage with complex ideas through dynamic visualization, interactive animations, and exploratory interfaces. However, creating such documents remains costly, as it requires both domain expertise and web development skills. Recent Large Language Model (LLM)-based agents can automate content creation, but directly applying them to interactive document generation of…

    Submitted 29 March, 2026; originally announced March 2026.

  33. arXiv:2603.27538  [pdf, ps, other]

    cs.CV cs.CL

    LongCat-Next: Lexicalizing Modalities as Discrete Tokens

    Authors: Meituan LongCat Team, Bin Xiao, Chao Wang, Chengjiang Li, Chi Zhang, Chong Peng, Hang Yu, Hao Yang, Haonan Yan, Haoze Sun, Haozhe Zhao, Hong Liu, Hui Su, Jiaqi Zhang, Jiawei Wang, Jing Li, Kefeng Zhang, Manyuan Zhang, Minhao Jing, Peng Pei, Quan Chen, Taofeng Xue, Tongxin Pan, Xiaotong Li, Xiaoyang Li, et al. (64 additional authors not shown)

    Abstract: The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented architectures and suboptimal integration. To transcend this limitation, we introduce Discrete Native Aut…

    Submitted 29 March, 2026; originally announced March 2026.

    Comments: LongCat-Next Technical Report

  34. arXiv:2603.27527  [pdf, ps, other]

    cs.LG

    Visualization of Machine Learning Models through Their Spatial and Temporal Listeners

    Authors: Siyu Wu, Lei Shi, Lei Xia, Cenyang Wu, Zipeng Liu, Yingchaojie Feng, Liang Zhou, Wei Chen

    Abstract: Model visualization (ModelVis) has emerged as a major research direction, yet existing taxonomies are largely organized by data or tasks, making it difficult to treat models as first-class analysis objects. We present a model-centric two-stage framework that employs abstract listeners to capture spatial and temporal model behaviors, and then connects the translated model behavior data to the class…

    Submitted 29 March, 2026; originally announced March 2026.

  35. arXiv:2603.27238  [pdf, ps, other]

    cs.CV

    An Instance-Centric Panoptic Occupancy Prediction Benchmark for Autonomous Driving

    Authors: Yi Feng, Junwu E, Zizhan Guo, Yu Ma, Hanli Wang, Rui Fan

    Abstract: Panoptic occupancy prediction aims to jointly infer voxel-wise semantics and instance identities within a unified 3D scene representation. Nevertheless, progress in this field remains constrained by the absence of high-quality 3D mesh resources, instance-level annotations, and physically consistent occupancy datasets. Existing benchmarks typically provide incomplete and low-resolution geometry wit…

    Submitted 28 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026. Code and dataset are available at https://mias.group/CarlaOcc

  36. arXiv:2603.27179  [pdf, ps, other]

    cs.CV

    Reasoning-Driven Anomaly Detection and Localization with Image-Level Supervision

    Authors: Yizhou Jin, Yuezhu Feng, Jinjin Zhang, Peng Wang, Qingjie Liu, Yunhong Wang

    Abstract: Multimodal large language models (MLLMs) have recently demonstrated remarkable reasoning and perceptual abilities for anomaly detection. However, most approaches remain confined to image-level anomaly detection and textual reasoning, while pixel-level localization still relies on external vision modules and dense annotations. In this work, we activate the intrinsic reasoning potential of MLLMs to…

    Submitted 28 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026

  37. arXiv:2603.27156  [pdf, ps, other]

    cs.LG cs.AI

    GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph

    Authors: Yuebo Luo, Shiyang Li, Yifei Feng, Vishal Kancharla, Shaoyi Huang, Caiwen Ding

    Abstract: Graph Neural Networks (GNNs) show strong promise for circuit analysis, but scaling to modern large-scale circuit graphs is limited by GPU memory and training cost, especially for deep models. We revisit deep GNNs for circuit graphs and show that, when trainable, they significantly outperform shallow architectures, motivating an efficient, domain-specific training framework. We propose Grouped-Spar…

    Submitted 28 March, 2026; originally announced March 2026.

    Comments: 8 pages including references; accepted to DAC 2026

  38. arXiv:2603.26737  [pdf, ps, other]

    cs.CV cs.AI

    Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning

    Authors: Guangfu Guo, Xiaoqian Lu, Yue Feng, Mingming Sun

    Abstract: Current multimodal LLMs encode images as static visual prefixes and rely on text-based reasoning, lacking goal-driven and adaptive visual access. Inspired by human visual perception, where attention is selectively and sequentially shifted from the most informative regions to secondary cues, we propose Structural Sequential Visual CoT (SSV-CoT). First, a question-relevant saliency map identifies and or…

    Submitted 21 March, 2026; originally announced March 2026.

  39. arXiv:2603.25681  [pdf, ps, other]

    cs.CL

    Self-Improvement of Large Language Models: A Technical Overview and Future Outlook

    Authors: Haoyan Yang, Mario Xerri, Solha Park, Huajian Zhang, Yiyang Feng, Sai Akhil Kogilathota, Jiawei Zhou

    Abstract: As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for further improvement. At the same time, the growing ability of models to make autonomous decisions and…

    Submitted 26 March, 2026; originally announced March 2026.

  40. arXiv:2603.25322  [pdf, ps, other]

    cs.MA cs.AI

    AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer's Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study

    Authors: Wenlong Hou, Sheng Bi, Guangqian Yang, Lihao Liu, Ye Du, Hanxiao Xue, Juncheng Wang, Yuxiang Feng, Yue Xun, Nanxi Yu, Ning Mao, Mo Yang, Yi Wah Eva Cheung, Ling Long, Kay Chen Tan, Lequan Yu, Xiaomeng Ma, Shaozhen Yan, Shujun Wang

    Abstract: Alzheimer's disease (AD) is a growing global health challenge as populations age, and timely, accurate diagnosis is essential to reduce individual and societal burden. However, real-world AD assessment is hampered by incomplete, heterogeneous multimodal data and variability across sites and patient demographics. Although large language models (LLMs) have shown promise in biomedicine, their use in…

    Submitted 26 March, 2026; originally announced March 2026.

  41. arXiv:2603.24078  [pdf, ps, other]

    cs.CV

    PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation

    Authors: Yuheng Feng, Wen Zhang, Haodong Duan, Xingxing Zou

    Abstract: We present PosterIQ, a design-driven benchmark for poster understanding and generation, annotated across composition structure, typographic hierarchy, and semantic intent. It includes 7,765 image-annotation instances and 822 generation prompts spanning real, professional, and synthetic cases. To bridge visual design cognition and generative modeling, we define tasks for layout parsing, text-image…

    Submitted 25 March, 2026; originally announced March 2026.

    Comments: CVPR 2026, Project Page: https://github.com/ArtmeScienceLab/PosterIQ-Benchmark

  42. arXiv:2603.23610

    cs.AI

    Environment Maps: Structured Environmental Representations for Long-Horizon Agents

    Authors: Yenchia Feng, Chirag Sharma, Karime Maamari

    Abstract: Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows remains an open problem. In long-horizon settings, agents frequently suffer from cascading errors and environmental stochasticity; a single misstep in a dynamic interface can lead to task failure, resulting in hallucinations or trial-and-error. This paper introduces…

    Submitted 26 March, 2026; v1 submitted 24 March, 2026; originally announced March 2026.

    Comments: 9 pages, 5 figures; accepted to the 2nd Workshop on World Models at ICLR 2026; fixed a formatting issue

  43. arXiv:2603.22724

    cs.LG math.AP

    Double Coupling Architecture and Training Method for Optimization Problems of Differential Algebraic Equations with Parameters

    Authors: Wenqiang Yang, Wenyuan Wu, Yong Feng, Changbo Chen

    Abstract: Simulation and modeling are essential in product development, integrated into the design and manufacturing process to enhance efficiency and quality. They are typically represented as complex nonlinear differential algebraic equations. The growing diversity of product requirements demands multi-task optimization, a key challenge in simulation modeling research. A dual physics-informed neural netwo…

    Submitted 23 March, 2026; originally announced March 2026.

    Comments: 19 pages, 11 figures

  44. arXiv:2603.20475

    cs.CV

    CREG: Compass Relational Evidence Graph for Characterizing Directional Structure in VLM Spatial-Reasoning Attribution

    Authors: Kaizhen Tan, Yang Feng, Heqing Du

    Abstract: Standard attribution heatmaps show where a vision-language model (VLM) focuses, but they do not reveal whether the recovered evidence is organized by the queried spatial relation or merely reflects image layout. To address this problem, we introduce CREG (Compass Relational Evidence Graph), a training-free diagnostic framework that converts token-level attribution into a reference-centered compass…

    Submitted 13 April, 2026; v1 submitted 20 March, 2026; originally announced March 2026.

  45. arXiv:2603.20100

    cs.CL cs.AI

    An Empirical Study of SFT-DPO Interaction and Parameterization in Small Language Models

    Authors: Yuming Feng, Christy Yang

    Abstract: Direct Preference Optimization (DPO) is widely used after supervised fine-tuning (SFT) to align language models, yet empirical behavior under small backbones and modest data is under-specified. We systematically compare SFT-only, DPO-only, and staged SFT-to-DPO training alongside full fine-tuning (FFT) versus LoRA on a GPT-2-scale decoder, evaluating paraphrase detection and Shakespearean sonnet c…

    Submitted 20 March, 2026; originally announced March 2026.

  46. arXiv:2603.19812

    cs.LG

    Eye Gaze-Informed and Context-Aware Pedestrian Trajectory Prediction in Shared Spaces with Automated Shuttles: A Virtual Reality Study

    Authors: Danya Li, Yan Feng, Rico Krueger

    Abstract: The integration of Automated Shuttles into shared urban spaces presents unique challenges due to the absence of traffic rules and the complex pedestrian interactions. Accurately anticipating pedestrian behavior in such unstructured environments is therefore critical for ensuring both safety and efficiency. This paper presents a Virtual Reality (VR) study that captures how pedestrians interact with…

    Submitted 20 March, 2026; originally announced March 2026.

  47. arXiv:2603.18891

    cs.CV cs.LG

    PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment

    Authors: Tianci Luo, Jinpeng Wang, Shiyu Qin, Niu Lian, Yan Feng, Bin Chen, Chun Yuan, Shu-Tao Xia

    Abstract: Visual In-Context Learning (VICL) aims to complete vision tasks by imitating pixel demonstrations. Recent work pioneered prompt fusion that combines the advantages of various demonstrations, which shows a promising way to extend VICL. Unfortunately, the patch-wise fusion framework and model-agnostic supervision hinder the exploitation of informative cues, thereby limiting performance gains. To ove…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: Accepted to ICLR 2026. 17 pages, 11 figures, and 9 tables

  48. arXiv:2603.18091

    cs.CV cs.RO

    Action Draft and Verify: A Self-Verifying Framework for Vision-Language-Action Model

    Authors: Chen Zhao, Zhuoran Wang, Haoyang Li, Shifeng Bao, Guanlin Li, Youhe Feng, Yang Li, Jie Tang, Jing Zhang

    Abstract: Vision-Language-Action (VLA) models have recently demonstrated strong performance across embodied tasks. Modern VLAs commonly employ diffusion action experts to efficiently generate high-precision continuous action chunks, while auto-regressive generation can be slower and less accurate at low-level control. Yet auto-regressive paradigms still provide complementary priors that can improve robustne…

    Submitted 18 March, 2026; originally announced March 2026.

  49. arXiv:2603.17826

    cs.SE cs.AI

    FailureMem: A Failure-Aware Multimodal Framework for Autonomous Software Repair

    Authors: Ruize Ma, Yilei Jiang, Shilin Zhang, Zheng Ma, Yi Feng, Vincent Ng, Zhi Wang, Xiangyu Yue, Chuanyi Li, Lewei Lu

    Abstract: Multimodal Automated Program Repair (MAPR) extends traditional program repair by requiring models to jointly reason over source code, textual issue descriptions, and visual artifacts such as GUI screenshots. While recent LLM-based repair systems have shown promising results, existing approaches face several limitations: rigid workflow pipelines restrict exploration during debugging, visual reasoni…

    Submitted 18 March, 2026; originally announced March 2026.

  50. arXiv:2603.17512

    cs.CL

    Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

    Authors: Mengyu Bu, Yang Feng

    Abstract: Large language models (LLMs) exhibit strong general intelligence, yet their multilingual performance remains highly imbalanced. Although LLMs encode substantial cross-lingual knowledge in a unified semantic space, they often struggle to reliably interface this knowledge with low-resource or unseen languages. Fortunately, pretrained encoder-decoder translation models already possess balanced multil…

    Submitted 6 April, 2026; v1 submitted 18 March, 2026; originally announced March 2026.

    Comments: ACL 2026 Main Conference. The code is available at https://github.com/ictnlp/XBridge