
Showing 1–50 of 959 results for author: Shen, J

Searching in archive cs.
  1. arXiv:2604.12110  [pdf, ps, other]

    cs.LG

    SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling

    Authors: Zikun Liu, Liang Luo, Qianru Li, Zhengyu Zhang, Wei Ling, Jingyi Shen, Zeliang Chen, Yaning Huang, Jingxian Huang, Abdallah Aboelela, Chonglin Sun, Feifan Gu, Fenggang Wu, Hang Qu, Huayu Li, Jill Pan, Kaidi Pei, Laming Chen, Longhao Jin, Qin Huang, Tongyi Tang, Varna Puvvada, Wenlin Chen, Xiaohan Wei, Xu Cao , et al. (8 additional authors not shown)

    Abstract: Recent advances in recommendation scaling laws have led to foundation models of unprecedented complexity. While these models offer superior performance, their computational demands make real-time serving impractical, often forcing practitioners to rely on knowledge distillation, compromising serving quality for efficiency. To address this challenge, we present SOLARIS (Speculative Offloading of Lat…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: Accepted to SIGIR 2026 Industry Track

  2. arXiv:2604.11983  [pdf, ps, other]

    cs.NI cs.LG

    A Geometric Algebra-informed NeRF Framework for Generalizable Wireless Channel Prediction

    Authors: Jingzhou Shen, Luis Lago Enamorado, Shiwen Mao, Xuyu Wang

    Abstract: In this paper, we propose the geometric algebra-informed neural radiance fields (GAI-NeRF), a novel framework for wireless channel prediction that leverages geometric algebra attention mechanisms to capture ray-object interactions in complex propagation environments. Our approach incorporates global token representations, drawing inspiration from transformer architectures in language and vision do…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: Accepted by IEEE INFOCOM 2026

  3. arXiv:2604.10321  [pdf, ps, other]

    cs.CV

    NTIRE 2026 Challenge on Single Image Reflection Removal in the Wild: Datasets, Results, and Methods

    Authors: Jie Cai, Kangning Yang, Zhiyuan Li, Florin-Alexandru Vasluianu, Radu Timofte, Jinlong Li, Jinglin Shen, Zibo Meng, Junyan Cao, Lu Zhao, Pengwei Liu, Yuyi Zhang, Fengjun Guo, Jiagao Hu, Zepeng Wang, Fei Wang, Daiguo Zhou, Yi'ang Chen, Honghui Zhu, Mengru Yang, Yan Luo, Kui Jiang, Jin Guo, Jonghyuk Park, Jae-Young Sim , et al. (28 additional authors not shown)

    Abstract: In this paper, we review the NTIRE 2026 challenge on single-image reflection removal (SIRR) in the Wild. SIRR is a fundamental task in image restoration. Despite progress in academic research, most methods are tested on synthetic images or limited real-world images, creating a gap in real-world applications. In this challenge, we provide participants with the OpenRR-5k dataset, which requires them…

    Submitted 11 April, 2026; originally announced April 2026.

  4. arXiv:2604.10290  [pdf, ps, other]

    cs.AI

    AI Organizations are More Effective but Less Aligned than Individual Agents

    Authors: Judy Hanwen Shen, Daniel Zhu, Siddarth Srinivasan, Henry Sleight, Lawrence T. Wagner III, Morgan Jane Matthews, Erik Jones, Jascha Sohl-Dickstein

    Abstract: AI is increasingly deployed in multi-agent systems; however, most research considers only the behavior of individual models. We experimentally show that multi-agent "AI organizations" are simultaneously more effective at achieving business goals, but less aligned, than individual AI agents. We examine 12 tasks across two practical settings: an AI consultancy providing solutions to business problem…

    Submitted 11 April, 2026; originally announced April 2026.

    Comments: ICLR Workshop Version

  5. arXiv:2604.09083  [pdf, ps, other]

    cs.OS cs.DC

    EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices

    Authors: Yongsheng Yan, Jiacheng Shen, Xuchuan Luo, Yangfan Zhou

    Abstract: Deploying large language models (LLMs) on mobile devices is an emerging trend to enable data privacy and offline accessibility of LLM applications. Modern mobile neural processing units (NPUs) make such deployment increasingly feasible. However, existing mobile LLM inference frameworks suffer from high start-up latency due to their inevitable cold starts, i.e., launching LLM inferences when the mo…

    Submitted 10 April, 2026; originally announced April 2026.

  6. arXiv:2604.07422  [pdf, ps, other]

    cs.LG

    Multimodal Large Language Models for Multi-Subject In-Context Image Generation

    Authors: Yucheng Zhou, Dubing Chen, Huan Zheng, Jianbing Shen

    Abstract: Recent advances in text-to-image (T2I) generation have enabled visually coherent image synthesis from descriptions, but generating images containing multiple given subjects remains challenging. As the number of reference identities increases, existing methods often suffer from subject missing and semantic drift. To address this problem, we propose MUSIC, the first MLLM specifically designed for \t…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: ACL 2026

  7. arXiv:2604.07402  [pdf, ps, other]

    cs.LG eess.IV

    Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity

    Authors: Yucheng Zhou, Jianbing Shen

    Abstract: Autoregressive models have shown superior performance and efficiency in image generation, but remain constrained by high computational costs and prolonged training times in video generation. In this study, we explore methods to accelerate training for autoregressive video generation models through empirical analyses. Our results reveal that while training on fewer video frames significantly reduce…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: ACL 2026 Findings

  8. arXiv:2604.04477  [pdf]

    cs.CV

    MVis-Fold: A Three-Dimensional Microvascular Structure Inference Model for Super-Resolution Ultrasound

    Authors: Jincao Yao, Ke Zhang, Yahan Zhou, Jiafei Shen, Jie Liu, Mudassar Ali, Bojian Feng, Jiye Chen, Jinlong Fan, Ping Liang, Dong Xu

    Abstract: Super-resolution ultrasound (SRUS) technology has overcome the resolution limitations of conventional ultrasound, enabling micrometer-scale imaging of microvasculature. However, due to the nature of imaging principles, three-dimensional reconstruction of microvasculature from SRUS remains an open challenge. We developed microvascular visualization fold (MVis-Fold), an innovative three-dimensional…

    Submitted 6 April, 2026; originally announced April 2026.

  9. arXiv:2604.03622  [pdf, ps, other]

    cs.SE cs.AI

    Toward Executable Repository-Level Code Generation via Environment Alignment

    Authors: Ruwei Pan, Junlei Shen, Linhao Wu, Yueheng Zhu, Zixiong Yang, Yakun Zhang, Lu Zhang, Hongyu Zhang

    Abstract: Large language models (LLMs) have achieved strong performance on code generation, but existing methods still struggle with repository-level code generation under executable validation. Under this evaluation setting, success is determined not by the plausibility of isolated code fragments, but by whether a generated multi-file repository can be successfully installed, have its dependencies and inte…

    Submitted 4 April, 2026; originally announced April 2026.

  10. arXiv:2604.03478  [pdf, ps, other]

    cs.LG

    Investigating Data Interventions for Subgroup Fairness: An ICU Case Study

    Authors: Erin Tan, Judy Hanwen Shen, Irene Y. Chen

    Abstract: In high-stakes settings where machine learning models are used to automate decision-making about individuals, the presence of algorithmic bias can exacerbate systemic harm to certain subgroups of people. These biases often stem from the underlying training data. In practice, interventions to "fix the data" depend on the actual additional data sources available -- where many are less than ideal. In…

    Submitted 3 April, 2026; originally announced April 2026.

  11. arXiv:2604.03256  [pdf, ps, other]

    cs.CY

    Self-Regulated Personal Contracts as a Harm Reduction Approach to Generative AI in Undergraduate Programming Education

    Authors: Aadarsh Padiyath, Jessica Shen, Barbara Ericson

    Abstract: Students learning programming exercise agency in deciding when and how to use GenAI tools like ChatGPT. However, this agency is often implicit and shaped by deadline pressure and peer behavior rather than explicit and conscious learning goals. We designed a GenAI Contract grounded in harm reduction and self-regulated learning theory to scaffold intentional decision-making: students articulated per…

    Submitted 11 March, 2026; originally announced April 2026.

    Comments: Accepted to 2026 ACM Conference on Innovation and Technology in Computer Science Education

  12. arXiv:2604.03007  [pdf, ps, other]

    cs.DC cs.DB

    CIDER: Boosting Memory-Disaggregated Key-Value Stores with Pessimistic Synchronization

    Authors: Yuxuan Du, Xuchuan Luo, Xin Wang, Yangfan Zhou, Jiacheng Shen

    Abstract: Memory-disaggregated key-value (KV) stores suffer from a severe performance bottleneck due to their I/O redundancy issues. A huge amount of redundant I/Os are generated when synchronizing concurrent data accesses, making the limited network between the compute and memory pools of DM a performance bottleneck. We identify that the root cause of the redundant I/O lies in the mismatch between the optimist…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: Accepted by VLDB'26

  13. arXiv:2604.02753  [pdf, ps, other]

    cs.CV

    DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection

    Authors: Siheng Wang, Yanshu Li, Bohan Hu, Zhengdao Li, Haibo Zhan, Linshan Li, Weiming Liu, Ruizhi Qian, Guangxin Wu, Hao Zhang, Jifeng Shen, Piotr Koniusz, Zhengtao Yao, Junhao Dong, Qiang Sun

    Abstract: Open-vocabulary Object Detection (OVOD) enables models to recognize objects beyond predefined categories, but existing approaches remain limited in practical deployment. On the one hand, multimodal designs often incur substantial computational overhead due to their reliance on text encoders at inference time. On the other hand, tightly coupled training objectives introduce a trade-off between clos…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: Accepted at ICLR 2026

  14. arXiv:2604.02355  [pdf, ps, other]

    cs.LG cs.CV

    From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation

    Authors: Han Song, Yucheng Zhou, Jianbing Shen, Yu Cheng

    Abstract: Combining Chain-of-Thought (CoT) with Reinforcement Learning (RL) improves text-to-image (T2I) generation, yet the underlying interaction between CoT's exploration and RL's optimization remains unclear. We present a systematic entropy-based analysis that yields three key insights: (1) CoT expands the generative exploration space, while RL contracts it toward high-reward regions; (2) final reward i…

    Submitted 12 March, 2026; originally announced April 2026.

  15. arXiv:2604.02324  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation

    Authors: Daiwei Chen, Zhoutong Fu, Chengming Jiang, Haichao Zhang, Ran Zhou, Tan Wang, Chunnan Yao, Guoyao Li, Rui Cai, Yihan Cao, Ruijie Jiang, Fedor Borisyuk, Jianqiang Shen, Jingwei Wu, Ramya Korlakai Vinayak

    Abstract: Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies on supervised fine-tuning to learn their representations. We present a systematic analysis of this strategy: through spec…

    Submitted 2 April, 2026; originally announced April 2026.

  16. arXiv:2604.01972  [pdf, ps, other]

    cs.CV

    SDesc3D: Towards Layout-Aware 3D Indoor Scene Generation from Short Descriptions

    Authors: Jie Feng, Jiawei Shen, Junjia Huang, Junpeng Zhang, Mingtao Feng, Weisheng Dong, Guanbin Li

    Abstract: 3D indoor scene generation conditioned on short textual descriptions provides a promising avenue for interactive 3D environment construction without the need for labor-intensive layout specification. Despite recent progress in text-conditioned 3D scene generation, existing works suffer from poor physical plausibility and insufficient detail richness in such semantic condensation cases, largely due…

    Submitted 7 April, 2026; v1 submitted 2 April, 2026; originally announced April 2026.

  17. arXiv:2604.01452  [pdf, ps, other]

    cs.AI

    A Multi-Agent Human-LLM Collaborative Framework for Closed-Loop Scientific Literature Summarization

    Authors: Maxwell J. Jacobson, Daniel Xie, Jackson Shen, Adil Wazeer, Haiyan Wang, Xinghang Zhang, Yexiang Xue

    Abstract: Scientific discovery is slowed by fragmented literature that requires excessive human effort to gather, analyze, and understand. AI tools, including autonomous summarization and question answering, have been developed to aid in understanding scientific literature. However, these tools lack the structured, multi-step approach necessary for extracting deep insights from scientific literature. Large…

    Submitted 1 April, 2026; originally announced April 2026.

  18. arXiv:2604.00547  [pdf, ps, other]

    cs.AI cs.LG

    Does Unification Come at a Cost? Uni-SafeBench: A Safety Benchmark for Unified Multimodal Large Models

    Authors: Zixiang Peng, Yongxiu Xu, Qinyi Zhang, Jiexun Shen, Yifan Zhang, Hongbo Xu, Yubin Wang, Gaopeng Gou

    Abstract: Unified Multimodal Large Models (UMLMs) integrate understanding and generation capabilities within a single architecture. While this architectural unification, driven by the deep fusion of multimodal features, enhances model performance, it also introduces important yet underexplored safety challenges. Existing safety benchmarks predominantly focus on isolated understanding or generation tasks, fa…

    Submitted 1 April, 2026; originally announced April 2026.

  19. arXiv:2604.00399  [pdf, ps, other]

    cs.LG

    A Cross-graph Tuning-free GNN Prompting Framework

    Authors: Yaqi Chen, Shixun Huang, Ryan Twemlow, Lei Wang, John Le, Sheng Wang, Willy Susilo, Jun Yan, Jun Shen

    Abstract: GNN prompting aims to adapt models across tasks and graphs without requiring extensive retraining. However, most existing graph prompt methods still require task-specific parameter updates and face the issue of generalizing across graphs, limiting their performance and undermining the core promise of prompting. In this work, we introduce a Cross-graph Tuning-free Prompting Framework (CTP), which s…

    Submitted 31 March, 2026; originally announced April 2026.

  20. arXiv:2603.28183  [pdf, ps, other]

    cs.AI

    PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision

    Authors: Zehua Han, Jing Xiao, Yiqi Duan, Mengyu Xiang, Yuheng Ji, Xiaolong Zheng, Chenghanyu Zhang, Zhendong She, Junyu Shen, Dingwei Tan, Shichu Sun, Zhou Cong, Mingxuan Liu, Fengxiang Wang, Jinping Sun, Yangang Sun

    Abstract: Multimodal Large Language Models have demonstrated powerful cross-modal understanding and reasoning capabilities in general domains. However, in the electromagnetic (EM) domain, they still face challenges such as data scarcity and insufficient integration of domain knowledge. This paper proposes PReD, the first foundation model for the EM domain that covers the intelligent closed-loop of "percepti…

    Submitted 31 March, 2026; v1 submitted 30 March, 2026; originally announced March 2026.

  21. arXiv:2603.21190  [pdf]

    cs.AR

    DS2SC-Agent: A Multi-Agent Automated Pipeline for Rapid Chiplet Model Generation

    Authors: Yiwei Wu, Yifan Wu, Yunhao Xiong, Dengwei Zhao, Jiaxuan Shen, Jianfei Jiang, Guanghui He, Shikui Tu, Yanan Sun

    Abstract: Constructing behavioral-level chiplet models (e.g., SystemC) is crucial for early-stage heterogeneous architecture exploration. Traditional manual modeling is notoriously time-consuming and error-prone. Recently, Large Language Models (LLMs) have demonstrated immense potential in automating hardware code generation. However, existing LLM-assisted design frameworks predominantly target highly struc…

    Submitted 22 March, 2026; originally announced March 2026.

    Comments: 9 pages, 5 figures

  22. arXiv:2603.20907  [pdf, ps, other]

    cs.CL

    The Hidden Puppet Master: Predicting Human Belief Change in Manipulative LLM Dialogues

    Authors: Jocelyn Shen, Amina Luvsanchultem, Jessica Kim, Kynnedy Smith, Valdemar Danry, Kantwon Rogers, Hae Won Park, Maarten Sap, Cynthia Breazeal

    Abstract: As users increasingly turn to LLMs for practical and personal advice, they become vulnerable to subtle steering toward hidden incentives misaligned with their own interests. While existing NLP research has benchmarked manipulation detection, these efforts often rely on simulated debates and remain fundamentally decoupled from actual human belief shifts in real-world scenarios. We introduce PUPPET,…

    Submitted 27 March, 2026; v1 submitted 21 March, 2026; originally announced March 2026.

  23. arXiv:2603.20698  [pdf, ps, other]

    cs.CV cs.CL

    Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs

    Authors: Huan Zheng, Yucheng Zhou, Tianyi Yan, Dubing Chen, Hongbo Lu, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable potential in medical image analysis. However, their application in gastrointestinal endoscopy is currently hindered by two critical limitations: the misalignment between general model reasoning and standardized clinical cognitive pathways, and the lack of causal association between visual features and diagnostic outcomes. In thi…

    Submitted 8 April, 2026; v1 submitted 21 March, 2026; originally announced March 2026.

  24. arXiv:2603.20613  [pdf]

    cs.HC

    A 4R-supported circular product-service system for luxury branded events

    Authors: Ke Ma, Francesca Valsecchi, Yuchen Tan, Mingjia Ji, Junru Shen, Xiaoya Ma, Duan Wu, Jiao Mo, Shijian Zhao

    Abstract: Temporary luxury branded events run on short cycles and bespoke builds that accelerate material churn. We present a circular phygital product-service system that operationalises the circular economy (CE) through a 4R frame (Refuse, Reduce, Reuse, and Recycling) across warehouse-to-event journeys. Developed via a multi-method design inquiry with a tier-1 contractor, the system couples physical touc…

    Submitted 20 March, 2026; originally announced March 2026.

    Comments: 19 pages, 11 figures, accepted to be published in the Proceedings of DRS 2026 (Design Research Society Conference)

  25. arXiv:2603.18115  [pdf, ps, other]

    cs.LG cs.AI

    LLM-Augmented Computational Phenotyping of Long Covid

    Authors: Jing Wang, Jie Shen, Amar Sra, Qiaomin Xie, Jeremy C Weiss

    Abstract: Phenotypic characterization is essential for understanding heterogeneity in chronic diseases and for guiding personalized interventions. Long COVID is a complex and persistent condition, yet its clinical subphenotypes remain poorly understood. In this work, we propose an LLM-augmented computational phenotyping framework "Grace Cycle" that iteratively integrates hypothesis generation, evidence extr…

    Submitted 18 March, 2026; originally announced March 2026.

  26. arXiv:2603.17722  [pdf, ps, other]

    cs.LG cs.CY

    Predicting Trajectories of Long COVID in Adult Women: The Critical Role of Causal Disentanglement

    Authors: Jing Wang, Jie Shen, Yiming Luo, Amar Sra, Qiaomin Xie, Jeremy C. Weiss

    Abstract: Early prediction of Post-Acute Sequelae of SARS-CoV-2 severity is a critical challenge for women's health, particularly given the diagnostic overlap between PASC and common hormonal transitions such as menopause. Identifying and accounting for these confounding factors is essential for accurate long-term trajectory prediction. We conducted a retrospective study of 1,155 women (mean age 61) from th…

    Submitted 18 March, 2026; originally announced March 2026.

  27. arXiv:2603.17234  [pdf]

    cs.CY cs.AI

    Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients

    Authors: Jane Wang, Timothy Keyes, April S Liang, Stephen P Ma, Jason Shen, Jerry Liu, Nerissa Ambers, Abby Pandya, Rita Pandya, Jason Hom, Natasha Steele, Jonathan H Chen, Kevin Schulman

    Abstract: Surgical co-management (SCM) is an evidence-based model in which hospitalists jointly manage medically complex perioperative patients alongside surgical teams. Despite its clinical and financial value, SCM is limited by the need to manually identify eligible patients. To determine whether SCM triage can be automated, we conducted a prospective, unblinded study at Stanford Health Care in which an L…

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: 35 pages, 4 figures, 5 tables

  28. arXiv:2603.16897  [pdf, ps, other]

    eess.SP cs.CL cs.HC cs.LG q-bio.NC

    EEG-Based Brain-LLM Interface for Human Preference Aligned Generation

    Authors: Junzi Zhang, Jianing Shen, Weijie Tu, Yi Zhang, Hailin Zhang, Tom Gedeon, Bin Jiang, Yue Yao

    Abstract: Large language models (LLMs) are becoming an increasingly important component of human--computer interaction, enabling users to coordinate a wide range of intelligent agents through natural language. While language-based interfaces are powerful and flexible, they implicitly assume that users can reliably produce explicit linguistic input, an assumption that may not hold for users with speech or mo…

    Submitted 3 March, 2026; originally announced March 2026.

    Comments: 15 pages, 9 figures

  29. arXiv:2603.16104  [pdf, ps, other]

    cs.MA cs.AI cs.DB

    Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective

    Authors: Noppanat Wadlom, Junyi Shen, Yao Lu

    Abstract: Agentic workflows are composed of sequences of interdependent Large Language Model (LLM) calls, and they have become a dominant workload in modern AI systems. These workflows exhibit extensive redundancy from overlapping prompts and intermediate results due to speculative and parallel exploration. Existing LLM serving systems, such as vLLM, focus on optimizing individual inference calls and overlo…

    Submitted 17 March, 2026; originally announced March 2026.

  30. arXiv:2603.15483  [pdf, ps, other]

    cs.AI

    Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis

    Authors: Penny Chong, Harshavardhan Abichandani, Jiyuan Shen, Atin Ghosh, Min Pyae Moe, Yifan Mai, Daniel Dahlmeier

    Abstract: Agent applications are increasingly adopted to automate workflows across diverse tasks. However, due to the heterogeneous domains they operate in, it is challenging to create a scalable evaluation framework. Prior works each employ their own methods to determine task success, such as database lookups, regex match, etc., adding complexity to the development of a unified agent evaluation approach. M…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: Accepted as a conference paper at ICLR 2026. Code and dataset are available in the repository https://github.com/SAP-samples/agent-quality-inspect

  31. arXiv:2603.14948  [pdf, ps, other]

    cs.CV

    Bridging Scene Generation and Planning: Driving with World Model via Unifying Vision and Motion Representation

    Authors: Xingtai Gui, Meijie Zhang, Tianyi Yan, Wencheng Han, Jiahao Gong, Feiyang Tan, Cheng-zhong Xu, Jianbing Shen

    Abstract: End-to-end autonomous driving aims to generate safe and plausible planning policies from raw sensor input. Driving world models have shown great potential in learning rich representations by predicting the future evolution of a driving scene. However, existing driving world models primarily focus on visual scene representation, and motion representation is not explicitly designed to be planner-sha…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: 16 pages, 9 figures. The code is available at https://github.com/TabGuigui/WorldDrive

  32. arXiv:2603.14238  [pdf, ps, other]

    cs.LG cs.MM

    Domain-Skewed Federated Learning with Feature Decoupling and Calibration

    Authors: Huan Wang, Jun Shen, Jun Yan, Guansong Pang

    Abstract: Federated learning (FL) allows distributed clients to collaboratively train a global model in a privacy-preserving manner. However, one major challenge is domain skew, where clients' data originating from diverse domains may hinder the aggregated global model from learning a consistent representation space, resulting in poor generalization ability in multiple domains. In this paper, we argue that t…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: Accepted at CVPR 2026

  33. arXiv:2603.12465  [pdf, ps, other]

    cs.DC cs.LG cs.PF

    TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition

    Authors: Prabhu Vellaisamy, Shreesh Tripathi, Vignesh Natarajan, Surya Santhan Thenarasu, Shawn Blanton, John P. Shen

    Abstract: Large Language Model (LLM) inference is widely used in interactive assistants and agentic systems. In latency-sensitive deployments, inference time can become dominated by host-side overheads. Existing approaches typically expose this cost only as an aggregate residual or a launch/queue metric, which is often insufficient to identify which execution layer should be optimized. This work presents Ta…

    Submitted 12 March, 2026; originally announced March 2026.

    Comments: Accepted at IEEE ISPASS 2026. Copyright assigned to IEEE

  34. arXiv:2603.11239  [pdf, ps, other]

    cs.AI

    Reversible Lifelong Model Editing via Semantic Routing-Based LoRA

    Authors: Haihua Luo, Xuming Ran, Tommi Kärkkäinen, Zhonghua Chen, Jiangrong Shen, Qi Xu, Fengyu Cong

    Abstract: The dynamic evolution of the real world necessitates model editing within Large Language Models. While existing methods explore modular isolation or parameter-efficient strategies, they still suffer from semantic drift or knowledge forgetting due to continual updating. To address these challenges, we propose SoLA, a Semantic routing-based LoRA framework for lifelong model editing. In SoLA, each edit i…

    Submitted 19 March, 2026; v1 submitted 11 March, 2026; originally announced March 2026.

  35. arXiv:2603.11211  [pdf, ps, other]

    cs.CV cs.AI

    A Simple Efficiency Incremental Learning Framework via Vision-Language Model with Nonlinear Multi-Adapters

    Authors: Haihua Luo, Xuming Ran, Jiangrong Shen, Timo Hämäläinen, Zhonghua Chen, Qi Xu, Fengyu Cong

    Abstract: Incremental Learning (IL) aims to learn new tasks while preserving previously acquired knowledge. Integrating the zero-shot learning capabilities of pre-trained vision-language models into IL methods has marked a significant advancement. However, these methods face three primary challenges: (1) the need for improved training efficiency; (2) reliance on a memory bank to store previous data; and (3)…

    Submitted 19 March, 2026; v1 submitted 11 March, 2026; originally announced March 2026.

  36. arXiv:2603.10814  [pdf, ps, other]

    cs.CV

    HanMoVLM: Large Vision-Language Models for Professional Artistic Painting Evaluation

    Authors: Hongji Yang, Yucheng Zhou, Wencheng Han, Songlian Li, Xiaotong Zhao, Jianbing Shen

    Abstract: While Large Vision-Language Models (VLMs) demonstrate impressive general visual capabilities, they remain artistically blind and unable to offer professional evaluation of artworks within specific artistic domains like human experts. To bridge this gap, we transform VLMs into experts capable of professional-grade painting evaluation in the Chinese Artistic Domain, which is more abstract and demand…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: 14 pages

  37. arXiv:2603.08174  [pdf, ps, other]

    cs.CV

    MERLIN: Building Low-SNR Robust Multimodal LLMs for Electromagnetic Signals

    Authors: Junyu Shen, Zhendong She, Chenghanyu Zhang, Yuchuang Sun, Luqing Luo, Dingwei Tan, Zonghao Guo, Bo Guo, Zehua Han, Wupeng Xie, Yaxin Mu, Peng Zhang, Peipei Li, Fengxiang Wang, Yangang Sun, Maosong Sun

    Abstract: The paradigm of Multimodal Large Language Models (MLLMs) offers a promising blueprint for advancing the electromagnetic (EM) domain. However, prevailing approaches often deviate from the native MLLM paradigm, instead using task-specific or pipelined architectures that lead to fundamental limitations in model performance and generalization. Fully realizing the MLLM potential in EM domain requires o…

    Submitted 24 March, 2026; v1 submitted 9 March, 2026; originally announced March 2026.

  38. arXiv:2603.07769  [pdf, ps, other]

    cs.CV

    MedQ-Deg: A Multidimensional Benchmark for Evaluating MLLMs Across Medical Image Quality Degradations

    Authors: Jiyao Liu, Junzhi Ning, Chenglong Ma, Wanying Qu, Jianghan Shen, Siqi Luo, Jinjie Wei, Jin Ye, Pengze Li, Tianbin Li, Jiashi Lin, Hongming Shan, Xinzhe Luo, Xiaohong Liu, Lihao Liu, Junjun He, Ningsheng Xu

    Abstract: Despite impressive performance on standard benchmarks, multimodal large language models (MLLMs) face critical challenges in real-world clinical environments where medical images inevitably suffer various quality degradations. Existing benchmarks exhibit two key limitations: (1) absence of large-scale, multidimensional assessment across medical image quality gradients and (2) no systematic confiden…

    Submitted 8 March, 2026; originally announced March 2026.

    Comments: 29 pages, 11 figures

  39. arXiv:2603.07523  [pdf, ps, other]

    cs.LG

    One-for-All Model Initialization with Frequency-Domain Knowledge

    Authors: Jianlu Shen, Fu Feng, Yucheng Xie, Jiaqi Lv, Xin Geng

    Abstract: Transferring knowledge by fine-tuning large-scale pre-trained networks has become a standard paradigm for downstream tasks, yet the knowledge of a pre-trained model is tightly coupled with monolithic architecture, which restricts flexible reuse across models of varying scales. In response to this challenge, recent approaches typically resort to either parameter selection, which fails to capture th…

    Submitted 8 March, 2026; originally announced March 2026.

  40. arXiv:2603.07506  [pdf, ps, other]

    cs.LG

    A Unified Framework for Knowledge Transfer in Bidirectional Model Scaling

    Authors: Jianlu Shen, Fu Feng, Jiaze Xu, Yucheng Xie, Jiaqi Lv, Xin Geng

    Abstract: Transferring pre-trained knowledge from a source model to a target model of a different architectural size is a key challenge for flexible and efficient model scaling. However, current parameter-space methods treat Small-to-Large (S2L) and Large-to-Small (L2S) scaling as separate, incompatible problems, focusing on parameter synthesis and selection, respectively. This fragmented perspective has re…

    Submitted 8 March, 2026; originally announced March 2026.

  41. arXiv:2603.05898  [pdf, ps, other]

    cs.CV

    InnoAds-Composer: Efficient Condition Composition for E-Commerce Poster Generation

    Authors: Yuxin Qin, Ke Cao, Haowei Liu, Ao Ma, Fengheng Li, Honghe Zhu, Zheng Zhang, Run Ling, Wei Feng, Xuanhua He, Zhanjie Zhang, Zhen Guo, Haoyi Bian, Jingjing Lv, Junjie Shen, Ching Law

    Abstract: E-commerce product poster generation aims to automatically synthesize a single image that effectively conveys product information by presenting a subject, text, and a designed style. Recent diffusion models with fine-grained and efficient controllability have advanced product poster synthesis, yet they typically rely on multi-stage pipelines, and simultaneous control over subject, text, and style…

    Submitted 5 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR2026

  42. arXiv:2603.05863  [pdf, ps, other

    cs.CL cs.LG cs.SE

    ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

    Authors: Juyong Jiang, Jiasi Shen, Sunghun Kim, Kang Min Yoo, Jeonghoon Kim, Sungju Kim

    Abstract: While Large Language Models (LLMs) have revolutionized code generation, standard "System 1" approaches, generating solutions in a single forward pass, often hit a performance ceiling when faced with complex algorithmic tasks. Existing iterative refinement strategies attempt to bridge this gap at inference time, yet they predominantly rely on external oracles, execution feedback, or computationally…

    Submitted 5 March, 2026; originally announced March 2026.

  43. arXiv:2603.04859  [pdf, ps, other

    cs.CR cs.LG

    Osmosis Distillation: Model Hijacking with the Fewest Samples

    Authors: Yuchen Shi, Huajie Chen, Heng Xu, Zhiquan Liu, Jialiang Shen, Chi Liu, Shuai Zhou, Tianqing Zhu, Wanlei Zhou

    Abstract: Transfer learning is devised to leverage knowledge from pre-trained models to solve new tasks with limited data and computational resources. Meanwhile, dataset distillation has emerged to synthesize a compact dataset that preserves critical information from the original large dataset. Therefore, a combination of transfer learning and dataset distillation offers promising performance in evaluations…

    Submitted 5 March, 2026; originally announced March 2026.

  44. arXiv:2603.02789  [pdf, ps, other

    cs.CL cs.AI

    OCR or Not? Rethinking Document Information Extraction in the MLLMs Era with Real-World Large-Scale Datasets

    Authors: Jiyuan Shen, Peiyue Yuan, Atin Ghosh, Yifan Mai, Daniel Dahlmeier

    Abstract: Multimodal Large Language Models (MLLMs) enhance the potential of natural language processing. However, their actual impact on document information extraction remains unclear. In particular, it is unclear whether an MLLM-only pipeline--while simpler--can truly match the performance of traditional OCR+MLLM setups. In this paper, we conduct a large-scale benchmarking study that evaluates various out…

    Submitted 3 March, 2026; originally announced March 2026.

  45. arXiv:2603.02547  [pdf, ps, other

    cs.CL cs.AI cs.LG

    CoDAR: Continuous Diffusion Language Models are More Powerful Than You Think

    Authors: Junzhe Shen, Jieru Zhao, Ziwei He, Zhouhan Lin

    Abstract: We study why continuous diffusion language models (DLMs) have lagged behind discrete diffusion approaches despite their appealing continuous generative dynamics. Under a controlled token--recovery study, we identify token rounding, the final projection from denoised embeddings to tokens, as a primary bottleneck. Building on these insights, we propose CoDAR (Continuous Diffusion with Contextual Aut…

    Submitted 2 March, 2026; originally announced March 2026.

  46. arXiv:2603.01513  [pdf, ps, other

    cs.SI math.CO

    A two-steps tensor eigenvector centrality for nodes and hyperedges in hypergraphs

    Authors: Qing Xu, Chunmeng Liu, Changjiang Bu, Jihong Shen

    Abstract: Hypergraphs have been a powerful tool to represent higher-order interactions, where hyperedges can connect an arbitrary number of nodes. Quantifying the relative importance of nodes and hyperedges in hypergraphs is a fundamental problem in network analysis. In this paper, we propose a new tensor-based centrality measure for general hypergraphs. We use a third-order tensor to represent the relation…

    Submitted 2 March, 2026; originally announced March 2026.

  47. arXiv:2603.01053  [pdf, ps, other

    cs.CR cs.AI cs.LG

    Turning Black Box into White Box: Dataset Distillation Leaks

    Authors: Huajie Chen, Tianqing Zhu, Yuchen Zhong, Yang Zhang, Shang Wang, Feng He, Lefeng Zhang, Jialiang Shen, Minghao Wang, Wanlei Zhou

    Abstract: Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance comparable to those trained on the real data. Although synthetic datasets are assumed to be privacy-preserving, we show that existing distillation methods can cause severe privacy leakage because synthetic datasets implicitly encode the weight trajec…

    Submitted 1 March, 2026; originally announced March 2026.

  48. Texterial: A Text-as-Material Interaction Paradigm for LLM-Mediated Writing

    Authors: Jocelyn Shen, Nicolai Marquardt, Hugo Romat, Ken Hinckley, Nathalie Riche, Fanny Chevalier

    Abstract: What if text could be sculpted and refined like clay -- or cultivated and pruned like a plant? Texterial reimagines text as a material that users can grow, sculpt, and transform. Current generative-AI models enable rich text operations, yet rigid, linear interfaces often mask such capabilities. We explore how the text-as-material metaphor can reveal AI-enabled operations, reshape the writing proce…

    Submitted 27 February, 2026; originally announced March 2026.

    ACM Class: H.5.2; I.2.m; D.2.2; D.2.10

    Journal ref: Proceedings of the 2026 ACM CHI Conference on Human Factors in Computing Systems (ACM CHI 2026)

  49. arXiv:2603.00376  [pdf, ps, other

    cs.AI

    NeuroHex: Highly-Efficient Hex Coordinate System for Creating World Models to Enable Adaptive AI

    Authors: Quinn Jacobson, Joe Luo, Jingfei Xu, Shanmuga Venkatachalam, Kevin Wang, Dingchao Rong, John Paul Shen

    Abstract: NeuroHex is a hexagonal coordinate system designed to support highly efficient world models and reference frames for online adaptive AI systems. Inspired by the hexadirectional firing structure of grid cells in the human brain, NeuroHex adopts a cubic isometric hexagonal coordinate formulation that provides full 60° rotational symmetry and low-cost translation, rotation and distance computation. W…

    Submitted 3 March, 2026; v1 submitted 27 February, 2026; originally announced March 2026.

    Comments: 8 + 1 pages, 9 figures, published at NICE 2026

    Journal ref: NICE 2026

  50. arXiv:2602.21944  [pdf, ps, other

    cs.CV

    Learning to Fuse and Reconstruct Multi-View Graphs for Diabetic Retinopathy Grading

    Authors: Haoran Li, Yuxin Lin, Huan Wang, Xiaoling Luo, Qi Zhu, Jiahua Shi, Huaming Chen, Bo Du, Johan Barthelemy, Zongyan Xue, Jun Shen, Yong Xu

    Abstract: Diabetic retinopathy (DR) is one of the leading causes of vision loss worldwide, making early and accurate DR grading critical for timely intervention. Recent clinical practices leverage multi-view fundus images for DR detection with a wide coverage of the field of view (FOV), motivating deep learning methods to explore the potential of multi-view learning for DR grading. However, existing methods…

    Submitted 25 February, 2026; originally announced February 2026.