-
InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
Authors:
Ashutosh Kumar,
Rajat Saini,
Jingjing Pan,
Mustafa Erdogan,
Mingfang Zhang,
Betty Le Dem,
Norimasa Kobori,
Quan Kong
Abstract:
Current vision-language pre-training (VLP) paradigms excel at global scene understanding but struggle with instance-level reasoning due to global-only supervision. We introduce InstAP, an Instance-Aware Pre-training framework that jointly optimizes global vision-text alignment and fine-grained, instance-level contrastive alignment by grounding textual mentions to specific spatial-temporal regions. To support this, we present InstVL, a large-scale dataset (2 million images, 50,000 videos) with dual-granularity annotations: holistic scene captions and dense, grounded instance descriptions. On the InstVL benchmark, InstAP substantially outperforms existing VLP models on instance-level retrieval, and also surpasses a strong VLP baseline trained on the exact same data corpus, isolating the benefit of our instance-aware objective. Moreover, instance-centric pre-training improves global understanding: InstAP achieves competitive zero-shot performance on multiple video benchmarks, including MSR-VTT and DiDeMo. Qualitative visualizations further show that InstAP localizes textual mentions to the correct instances, while global-only models exhibit more diffuse, scene-level attention.
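The instance-level contrastive alignment described above can be pictured as a symmetric InfoNCE loss over matched (region, mention) embedding pairs. This is a generic sketch of such an objective, not InstAP's exact loss; function names and the temperature are illustrative:

```python
import numpy as np

def _log_softmax(logits):
    # numerically stable row-wise log-softmax
    m = logits.max(axis=1, keepdims=True)
    z = logits - m
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def instance_nce(region_emb, mention_emb, temperature=0.07):
    """Symmetric InfoNCE over matched region/mention pairs.

    region_emb, mention_emb: (N, d) arrays where row i of each is a
    matched (spatial-temporal region, textual mention) pair; the other
    N-1 rows act as in-batch negatives.
    """
    # L2-normalize so dot products are cosine similarities
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    m = mention_emb / np.linalg.norm(mention_emb, axis=1, keepdims=True)
    logits = r @ m.T / temperature          # (N, N) similarity matrix
    idx = np.arange(len(logits))            # diagonal pairs are positives
    # cross-entropy in both directions (region->text and text->region)
    loss_r2t = -_log_softmax(logits)[idx, idx].mean()
    loss_t2r = -_log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_r2t + loss_t2r)
```

Grounding a mention to the correct region drives its embedding toward the matched row and away from the in-batch negatives, which is the fine-grained signal global-only VLP lacks.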
Submitted 9 April, 2026;
originally announced April 2026.
-
Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions
Authors:
Yuming Xu,
Mingtao Zhang,
Zhuohan Ge,
Haoyang Li,
Nicole Hu,
Jason Chen Zhang,
Qing Li,
Lei Chen
Abstract:
Retrieval-augmented generation (RAG) significantly enhances large language models (LLMs) but introduces novel security risks through external knowledge access. While existing studies cover various RAG vulnerabilities, they often conflate inherent LLM risks with those specifically introduced by RAG. In this paper, we propose that secure RAG is fundamentally about the security of the external knowledge-access pipeline. We establish an operational boundary to separate inherent LLM flaws from RAG-introduced or RAG-amplified threats. Guided by this perspective, we abstract the RAG workflow into six stages and organize the literature around three trust boundaries and four primary security surfaces, including pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration. By systematically reviewing the corresponding attacks, defenses, remediation mechanisms, and evaluation benchmarks, we reveal that current defenses remain largely reactive and fragmented. Finally, we discuss these gaps and highlight future directions toward layered, boundary-aware protection across the entire knowledge-access lifecycle.
Submitted 9 April, 2026;
originally announced April 2026.
-
When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning
Authors:
Ruotao Xu,
Yixin Ji,
Yu Luo,
Jinpeng Li,
Dong Li,
Peifeng Li,
Juntao Li,
Min Zhang
Abstract:
Large reasoning models (LRMs) have achieved strong performance gains by scaling test-time computation, but due to inherent limitations of the underlying language models, they still fall short on tasks that require precise computation and extensive knowledge. Tool-Integrated Reasoning (TIR) has emerged as a promising paradigm that incorporates tool calls and execution within the reasoning trajectory. Although recent works have released powerful open-source TIR models, our analysis reveals that these models still suffer from critical deficiencies. We find that when the model's reasoning conflicts with the tool results, the model tends to believe its own reasoning. There are also cases where the tool results are correct but are ignored by the model, resulting in incorrect answers, which we define as "Tool Ignored". This indicates that the model does not know when to trust or ignore the tool. To overcome these limitations, we introduce Adaptive Tool Trust Calibration (ATTC), a novel framework that guides the model to adaptively trust or ignore tool results based on the confidence score of generated code blocks. Experimental results from various open-source TIR models of different sizes and across multiple datasets demonstrate that ATTC effectively reduces the "Tool Ignored" issue, yielding a performance increase of 4.1% to 7.5%.
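The abstract does not specify how the confidence score is computed or applied. As a hedged sketch, one simple instantiation scores a generated code block by its mean token log-probability and trusts the tool result only when that confidence clears a threshold; all names and the threshold value are illustrative, not ATTC's actual rule:

```python
import math

def code_confidence(token_logprobs):
    """Mean token log-probability of a generated code block,
    mapped to (0, 1] via exp -- a common proxy for model confidence."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def resolve_conflict(model_answer, tool_answer, confidence, threshold=0.85):
    """Hypothetical calibration rule: trust the tool when the code that
    produced its input was generated with high confidence, otherwise
    fall back to the model's own reasoning."""
    if model_answer == tool_answer:
        return model_answer                  # no conflict to calibrate
    return tool_answer if confidence >= threshold else model_answer
```

The key design point mirrored here is that trust is conditioned on the code's generation confidence rather than being a fixed policy of always (or never) deferring to the tool.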
Submitted 9 April, 2026;
originally announced April 2026.
-
ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks
Authors:
Jiayang Xu,
Fan Zhuo,
Majun Zhang,
Changhao Pan,
Zehan Wang,
Siyu Chen,
Xiaoda Yang,
Tao Jin,
Zhou Zhao
Abstract:
Current video editing models often rely on expensive paired video data, which limits their practical scalability. In essence, most video editing tasks can be formulated as a decoupled spatiotemporal process, where the temporal dynamics of the pretrained model are preserved while spatial content is selectively and precisely modified. Based on this insight, we propose ImVideoEdit, an efficient framework that learns video editing capabilities entirely from image pairs. By freezing the pre-trained 3D attention modules and treating images as single-frame videos, we decouple the 2D spatial learning process to help preserve the original temporal dynamics. The core of our approach is a Predict-Update Spatial Difference Attention module that progressively extracts and injects spatial differences. Rather than relying on rigid external masks, we incorporate a Text-Guided Dynamic Semantic Gating mechanism for adaptive and implicit text-driven modifications. Despite training on only 13K image pairs for 5 epochs with exceptionally low computational overhead, ImVideoEdit achieves editing fidelity and temporal consistency comparable to larger models trained on extensive video datasets.
Submitted 9 April, 2026;
originally announced April 2026.
-
SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking
Authors:
Weiyang Huang,
Xuefeng Bai,
Kehai Chen,
Xinyang Chen,
Yibin Chen,
Weili Guan,
Min Zhang
Abstract:
Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit a pervasive "overthinking" problem, generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice fine-grained control or risk disrupting the logical integrity of the reasoning process. To address this, we introduce Stepwise Adaptive Thinking (SAT), a framework that performs step-level, difficulty-aware pruning while preserving the core reasoning structure. SAT formulates reasoning as a Finite-State Machine (FSM) with distinct thinking modes (Slow, Normal, Fast, Skip). It navigates these states dynamically using a lightweight Process Reward Model (PRM), compressing easy steps while preserving depth for hard ones. Experiments across 9 LRMs and 7 benchmarks show that SAT achieves up to 40% reduction in reasoning tokens while generally maintaining or improving accuracy.
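The FSM view can be made concrete with a toy transition rule in which a step-level PRM score selects one of the four thinking modes. The thresholds below are invented for illustration; the paper's actual transition logic is not given in the abstract:

```python
from enum import Enum

class Mode(Enum):
    SLOW = "slow"      # full deliberation for hard steps
    NORMAL = "normal"
    FAST = "fast"      # compressed reasoning
    SKIP = "skip"      # trivial step, no explicit reasoning

def next_mode(prm_score):
    """Map a step-level PRM score in [0, 1] (higher = easier / more
    confident step) to a thinking mode.  Thresholds are illustrative."""
    if prm_score < 0.30:
        return Mode.SLOW
    if prm_score < 0.60:
        return Mode.NORMAL
    if prm_score < 0.85:
        return Mode.FAST
    return Mode.SKIP
```

Calling `next_mode` once per reasoning step yields the difficulty-aware pruning behavior: cheap steps get compressed or skipped while hard steps keep their full depth.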
Submitted 9 April, 2026;
originally announced April 2026.
-
HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models
Authors:
Qihui Zhu,
Tao Zhang,
Yuchen Wang,
Zijian Wen,
Mengjie Zhang,
Shuangwu Chen,
Xiaobin Tan,
Jian Yang,
Yang Liu,
Zhenhua Dong,
Xianzhi Yu,
Yinfei Pan
Abstract:
In multimodal large language models (MLLMs), the surge of visual tokens significantly increases the inference time and computational overhead, making them impractical for real-time or resource-constrained applications. Visual token pruning is a promising strategy for reducing the cost of MLLM inference by removing redundant visual tokens. Existing research usually assumes that all attention heads contribute equally to the visual interpretation. However, our study reveals that different heads may capture distinct visual semantics and inherently play distinct roles in visual processing. In light of this observation, we propose HAWK, a head importance-aware visual token pruning method that perceives the varying importance of attention heads in visual tasks to maximize the retention of crucial tokens. By leveraging head importance weights and text-guided attention to assess visual token significance, HAWK effectively retains task-relevant visual tokens while removing redundant ones. The proposed HAWK is entirely training-free and can be seamlessly applied to various MLLMs. Extensive experiments on multiple mainstream vision-language benchmarks demonstrate that HAWK achieves state-of-the-art accuracy. When applied to Qwen2.5-VL, HAWK retains 96.0% of the original accuracy after pruning 80.2% of the visual tokens. Additionally, it reduces end-to-end latency to 74.4% of the original and further decreases GPU memory usage across the tested models. The code is available at https://github.com/peppery77/HAWK.git.
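The scoring idea, head-importance weights combined with text-guided attention, can be sketched as below; HAWK's actual importance estimation is described in the paper, so the weighting and top-k selection here should be read as an illustration:

```python
import numpy as np

def prune_visual_tokens(attn, head_importance, keep_ratio=0.2):
    """Sketch of head-importance-aware visual token pruning.

    attn:            (H, T_text, T_vis) text-to-visual attention per head
    head_importance: (H,) nonnegative weight for each attention head
    Returns sorted indices of the visual tokens to keep.
    """
    w = head_importance / head_importance.sum()
    # text-guided significance: importance-weighted attention mass per token
    scores = np.einsum('h,htv->v', w, attn)
    k = max(1, int(keep_ratio * attn.shape[-1]))
    return np.sort(np.argsort(scores)[::-1][:k])
```

Because the weighting is computed from existing attention maps, the method stays training-free, matching the plug-and-play property claimed in the abstract.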
Submitted 9 April, 2026;
originally announced April 2026.
-
Influence of Plaque Characteristics on Stent Biomechanical Outcomes - A Case Study on Double Kissing Crush Coronary Stenting
Authors:
Andrea Colombo,
Dario Carbonaro,
Mingzi Zhang,
Chi Shen,
Ankush Kapoor,
Nigel Jepson,
Claudio Chiastra,
Susann Beier
Abstract:
Background: Double Kissing (DK) Crush is a two-stent technique for complex coronary bifurcation lesions, yet the biomechanical influence of plaque on its performance remains poorly understood. This study developed a computational biomechanical model of the DK-Crush procedure to quantify how plaque presence and composition affect procedural outcomes and the performance of Xience Sierra and Orsiro stents. Methods: A population-representative coronary bifurcation was modelled with no plaque, lipid plaque, and fibrous plaque. The complete DK-Crush sequence was simulated using finite element analysis for both stent platforms. Mechanical outcomes included arterial wall stress, malapposition, side branch ostium clearance, and residual stenosis. Post-deployment hemodynamics was assessed using pulsatile computational fluid dynamics, quantifying high shear rate volume and lumen area exposed to low time-averaged endothelial shear stress (TAESS). Results: Plaque presence and stiffness reduced lumen restoration, increased arterial wall stress, led to larger high shear rate regions and, for fibrous plaque, increased exposure to low TAESS. Malapposition and ostial clearance depended mainly on stent design. Plaque also altered the relative performance of the two platforms, revealing differences not observed in plaque-free models. Conclusions: Plaque characteristics substantially affect DK-Crush biomechanics and modify stent behaviour. Incorporating plaque is therefore essential for realistic computational evaluation of bifurcation stenting.
Submitted 9 April, 2026;
originally announced April 2026.
-
RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs
Authors:
Liang Yao,
Shengxiang Xu,
Fan Liu,
Chuanyi Zhang,
Bishun Yao,
Rui Min,
Yongjun Li,
Chaoqian Ouyang,
Shimin Di,
Min-Ling Zhang
Abstract:
Earth Observation (EO) systems are essentially designed to support domain experts who often express their requirements through vague natural language rather than precise, machine-friendly instructions. Depending on the specific application scenario, these vague queries can demand vastly different levels of visual precision. Consequently, a practical EO AI system must bridge the gap between ambiguous human queries and the appropriate multi-granularity visual analysis tasks, ranging from holistic image interpretation to fine-grained pixel-wise predictions. While Multi-modal Large Language Models (MLLMs) demonstrate strong semantic understanding, their text-based output format is inherently ill-suited for dense, precision-critical spatial predictions. Existing agentic frameworks address this limitation by delegating tasks to external tools, but indiscriminate tool invocation is computationally inefficient and underutilizes the MLLM's native capabilities. To this end, we propose RemoteAgent, an agentic framework that strategically respects the intrinsic capability boundaries of MLLMs. To empower this framework to understand real user intents, we construct VagueEO, a human-centric instruction dataset pairing EO tasks with simulated vague natural-language queries. By leveraging VagueEO for reinforcement fine-tuning, we align an MLLM into a robust cognitive core that directly resolves image- and sparse region-level tasks. Consequently, RemoteAgent processes suitable tasks internally while intelligently orchestrating specialized tools via the Model Context Protocol exclusively for dense predictions. Extensive experiments demonstrate that RemoteAgent achieves robust intent recognition capabilities while delivering highly competitive performance across diverse EO tasks.
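The internal-versus-tool routing can be pictured as a dispatch keyed by the granularity of the resolved task. The task names below are hypothetical, and RemoteAgent's actual routing is learned via reinforcement fine-tuning rather than hard-coded; this sketch only shows the capability-boundary idea:

```python
# Hypothetical task taxonomy: image- and sparse region-level tasks stay
# inside the MLLM core; dense pixel-wise predictions go to MCP tools.
INTERNAL = {"image_captioning", "vqa", "region_grounding"}
TOOL_VIA_MCP = {"semantic_segmentation", "change_detection_mask"}

def dispatch(task_type):
    """Route a resolved EO task either to the MLLM cognitive core or to
    an external tool invoked over the Model Context Protocol."""
    if task_type in TOOL_VIA_MCP:
        return "mcp_tool"
    # default: let the aligned MLLM resolve the task natively
    return "mllm"
```

Restricting tool calls to dense-prediction tasks is what avoids the indiscriminate tool invocation the abstract identifies as inefficient.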
Submitted 8 April, 2026;
originally announced April 2026.
-
Detecting HIV-Related Stigma in Clinical Narratives Using Large Language Models
Authors:
Ziyi Chen,
Yasir Khan,
Mengyuan Zhang,
Cheng Peng,
Mengxian Lyu,
Yiyang Liu,
Krishna Vaddiparti,
Robert L Cook,
Mattia Prosperi,
Yonghui Wu
Abstract:
Human immunodeficiency virus (HIV)-related stigma is a critical psychosocial determinant of health for people living with HIV (PLWH), influencing mental health, engagement in care, and treatment outcomes. Although stigma-related experiences are documented in clinical narratives, there is a lack of off-the-shelf tools to extract and categorize them. This study aims to develop a large language model (LLM)-based tool for identifying HIV stigma from clinical notes. We identified clinical notes from PLWH receiving care at the University of Florida (UF) Health between 2012 and 2022. Candidate sentences were identified using expert-curated stigma-related keywords and iteratively expanded via clinical word embeddings. A total of 1,332 sentences were manually annotated across four stigma subscales: Concern with Public Attitudes, Disclosure Concerns, Negative Self-Image, and Personalized Stigma. We compared GatorTron-large and BERT as encoder-based baselines, and GPT-OSS-20B, LLaMA-8B, and MedGemma-27B as generative LLMs, under zero-shot and few-shot prompting. GatorTron-large achieved the best overall performance (Micro F1 = 0.62). Few-shot prompting substantially improved generative model performance, with 5-shot GPT-OSS-20B and LLaMA-8B achieving Micro-F1 scores of 0.57 and 0.59, respectively. Performance varied by stigma subscale, with Negative Self-Image showing the highest predictability and Personalized Stigma remaining the most challenging. Zero-shot generative inference exhibited non-trivial failure rates (up to 32%). This study develops the first practical NLP tool for identifying HIV stigma in clinical notes.
Submitted 8 April, 2026;
originally announced April 2026.
-
Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
Authors:
Quantong Qiu,
Zhiyi Hong,
Yi Yang,
Haitian Wang,
Kebin Liu,
Qingqing Dang,
Juntao Li,
Min Zhang
Abstract:
The quadratic computational complexity of standard attention mechanisms presents a severe scalability bottleneck for LLMs in long-context scenarios. While hybrid attention mechanisms combining Full Attention (FA) and Sparse Attention (SA) offer a potential solution, existing methods typically rely on static allocation ratios that fail to accommodate the variable retrieval demands of different tasks. Furthermore, head-level dynamic sparsity often introduces severe computational load imbalance and synchronization long-tails, which hinder hardware acceleration during autoregressive decoding. To bridge this gap, we introduce Flux Attention, a context-aware framework that dynamically optimizes attention computation at the layer level. By integrating a lightweight Layer Router into frozen pretrained LLMs, the proposed method adaptively routes each layer to FA or SA based on the input context. This layer-wise routing preserves high-fidelity information retrieval while ensuring contiguous memory access, translating theoretical computational reductions into practical wall-clock speedups. As a parameter-efficient approach, our framework requires only 12 hours of training on 8$\times$A800 GPUs. Extensive experiments across multiple long-context and mathematical reasoning benchmarks demonstrate that Flux Attention achieves a superior trade-off between performance and inference speed compared with baseline models, with speed improvements of up to $2.8\times$ and $2.0\times$ in the prefill and decode stages.
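The layer-level routing can be sketched as a linear probe per layer over pooled context features; since the abstract does not describe the Layer Router's architecture, the shapes, names, and zero-threshold decision below are assumptions for illustration:

```python
import numpy as np

def route_layers(context_features, router_weights, router_bias):
    """Lightweight per-layer router: a linear probe over a pooled context
    representation decides Full Attention (FA) vs Sparse Attention (SA)
    for each frozen transformer layer.

    context_features: (d,)   pooled representation of the input context
    router_weights:   (L, d) one row per layer
    router_bias:      (L,)
    Returns one 'FA' / 'SA' decision per layer.
    """
    logits = router_weights @ context_features + router_bias
    return ['FA' if score > 0 else 'SA' for score in logits]
```

Deciding at the layer level (rather than per head) keeps every chosen kernel's workload contiguous, which is what lets the sparsity savings show up as wall-clock speedups.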
Submitted 8 April, 2026;
originally announced April 2026.
-
When Is Thinking Enough? Early Exit via Sufficiency Assessment for Efficient Reasoning
Authors:
Yang Xiang,
Yixin Ji,
Ruotao Xu,
Dan Qiao,
Zheming Yang,
Juntao Li,
Min Zhang
Abstract:
Large reasoning models (LRMs) have achieved remarkable performance in complex reasoning tasks, driven by their powerful inference-time scaling capability. However, LRMs often suffer from overthinking, which results in substantial computational redundancy and significantly reduces efficiency. Early-exit methods aim to mitigate this issue by terminating reasoning once sufficient evidence has been generated, yet existing approaches mostly rely on handcrafted or empirical indicators that are unreliable and impractical. In this work, we introduce Dynamic Thought Sufficiency in Reasoning (DTSR), a novel framework for efficient reasoning that enables the model to dynamically assess the sufficiency of its chain-of-thought (CoT) and determine the optimal point for early exit. Inspired by human metacognition, DTSR operates in two stages: (1) Reflection Signal Monitoring, which identifies reflection signals as potential cues for early exit, and (2) Thought Sufficiency Check, which evaluates whether the current CoT is sufficient to derive the final answer. Experimental results on the Qwen3 models show that DTSR reduces reasoning length by 28.9%-34.9% with minimal performance loss, effectively mitigating overthinking. We further discuss overconfidence in LRMs and self-evaluation paradigms, providing valuable insights for early-exit reasoning.
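The two-stage test maps naturally onto a small helper: a cheap textual trigger for stage (1) and a pluggable sufficiency callback for stage (2). The cue list and interface are assumptions for illustration, not DTSR's implementation:

```python
# Illustrative reflection cues; the paper's reflection-signal detector
# is not specified in the abstract.
REFLECTION_CUES = ("wait", "let me double-check", "alternatively", "hmm")

def should_exit(cot_so_far, sufficiency_check):
    """Two-stage early-exit test mirroring DTSR's description:
    1) Reflection Signal Monitoring: only consider exiting when the
       latest step contains a reflection cue (cheap trigger);
    2) Thought Sufficiency Check: ask whether the chain-of-thought
       already suffices (sufficiency_check: callable str -> bool,
       e.g. a self-evaluation prompt to the model)."""
    last_step = cot_so_far.strip().split("\n")[-1].lower()
    if not any(cue in last_step for cue in REFLECTION_CUES):
        return False
    return sufficiency_check(cot_so_far)
```

Gating the (relatively expensive) sufficiency check behind cheap reflection cues is what keeps the early-exit machinery from adding more overhead than it saves.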
Submitted 8 April, 2026;
originally announced April 2026.
-
TurboAgent: An LLM-Driven Autonomous Multi-Agent Framework for Turbomachinery Aerodynamic Design
Authors:
Juan Du,
Yueteng Wu,
Pan Zhao,
Yuze Liu,
Min Zhang,
Xiaobin Xu,
Xinglong Zhang
Abstract:
The aerodynamic design of turbomachinery is a complex and tightly coupled multi-stage process involving geometry generation, performance prediction, optimization, and high-fidelity physical validation. Existing intelligent design approaches typically focus on individual stages or rely on loosely coupled pipelines, making fully autonomous end-to-end design challenging. To address this issue, this study proposes TurboAgent, a large language model (LLM)-driven autonomous multi-agent framework for turbomachinery aerodynamic design and optimization. The LLM serves as the core for task planning and coordination, while specialized agents handle generative design, rapid performance prediction, multi-objective optimization, and physics-based validation. The framework transforms traditional trial-and-error design into a data-driven collaborative workflow, with high-fidelity simulations retained for final verification. A transonic single-rotor compressor is used for validation. The results show strong agreement between target performance, generated designs, and CFD simulations. The coefficients of determination for mass flow rate, total pressure ratio, and isentropic efficiency all exceed 0.91, with normalized RMSE values below 8%. The optimization agent further improves isentropic efficiency by 1.61% and total pressure ratio by 3.02%. The complete workflow can be executed within approximately 30 minutes under parallel computing. These results demonstrate that TurboAgent enables an autonomous closed-loop design process from natural language requirements to final design generation, providing an efficient and scalable paradigm for turbomachinery aerodynamic design.
Submitted 8 April, 2026; v1 submitted 8 April, 2026;
originally announced April 2026.
-
Precise measurement of the CKM angle $γ$ with a novel approach
Authors:
The BESIII and LHCb Collaborations:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ is performed by applying a novel, unbinned, model-independent approach to datasets of electron-positron collisions collected by the BESIII experiment and proton-proton collisions by the LHCb experiment, corresponding to integrated luminosities of 8 fb$^{-1}$ and 9 fb$^{-1}$, respectively. The $C\!P$-violating phase $γ$ is determined from ${B^{\pm}\rightarrow D(\rightarrow K_{\rm S}^{0} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays in LHCb data, where $h^{(\prime)}$ is either a pion or kaon, while the corresponding strong-phase parameters are measured using doubly tagged ${D\rightarrow K_{\rm S/L}^0 h^{\prime+} h^{\prime-}}$ decays in the quantum-correlated $D\overline{D}$ system present in BESIII data. A joint fit to both datasets, which allows for a simultaneous determination of the associated $C\!P$-violating observables and strong-phase parameters, yields ${γ= (71.3\pm 5.0)^{\circ}}$. The result is the most precise to date and consistent with previous measurements and world averages.
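For context, the sensitivity to $γ$ in such decays follows the standard GGSZ method: interference between the favoured and suppressed $B^{\mp}$ amplitudes over the $D\rightarrow K_{\rm S}^{0}h^{\prime+}h^{\prime-}$ Dalitz plot. Schematically (standard formalism, not spelled out in the abstract; $r_B$ and $δ_B$ denote the magnitude ratio and strong-phase difference of the two $B$-decay amplitudes),
$$\mathcal{A}(B^{\mp}) \;\propto\; A_{D}(m^{2}_{\mp}, m^{2}_{\pm}) \;+\; r_{B}\, e^{i(δ_B \mp γ)}\, A_{D}(m^{2}_{\pm}, m^{2}_{\mp}),$$
so the event density depends on $γ$ through the relative phase $δ_B \mp γ$, while the $D$-decay strong-phase information measured at BESIII constrains $A_D$ model-independently.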
Submitted 7 April, 2026;
originally announced April 2026.
-
Measurement of the CKM angle $γ$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach
Authors:
The BESIII and LHCb Collaborations:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ and related strong-phase parameters is performed using a novel, model-independent approach in ${B^{\pm}\rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays, where $h^{(\prime)} \equiv π, K$. The analysis uses a joint data sample of electron-positron collisions collected by the BESIII experiment at the Beijing Electron-Positron Collider II during 2010--2011 and 2021--2022, corresponding to an integrated luminosity of 8 fb$^{-1}$, and proton-proton collisions collected by the LHCb experiment at the Large Hadron Collider during 2011--2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. The two datasets are analyzed simultaneously by applying per-event weights based on the amplitude variation over the $D$-decay phase space to enhance the sensitivity to $C\!P$-violating observables. The CKM angle $γ$ is determined to be $γ= (71.3\pm 5.0)^{\circ}$, which constitutes the most precise single measurement to date.
Submitted 7 April, 2026;
originally announced April 2026.
-
Non-GRS type MDS and AMDS codes from extended TGRS codes
Authors:
Meiying Zhang,
Shudi Yang,
Yanbin Zheng
Abstract:
Maximum distance separable (MDS) and almost maximum distance separable (AMDS) codes have been widely used in various fields such as communication systems, data storage, and quantum codes because of their algebraic properties and excellent error-correcting capabilities. In this paper, we construct a class of extended twisted generalized Reed-Solomon (TGRS) codes and determine the necessary and sufficient conditions for these codes to be MDS or AMDS. Additionally, we prove that these codes are not equivalent to generalized Reed-Solomon (GRS) codes. As an application, under certain circumstances, we compute the covering radii and deep holes of these codes.
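For reference, the construction extends the classical generalized Reed-Solomon family. With distinct evaluation points $a_1,\dots,a_n \in \mathbb{F}_q$ and nonzero multipliers $v_1,\dots,v_n$,
$$\mathrm{GRS}_{k}(\mathbf{a}, \mathbf{v}) = \left\{ \big(v_1 f(a_1), \dots, v_n f(a_n)\big) : f \in \mathbb{F}_q[x],\ \deg f < k \right\},$$
and, following the usual twisted-GRS definition, a TGRS code replaces the message space $\{\deg f < k\}$ by polynomials of the form $f = \sum_{i=0}^{k-1} f_i x^i + η\, f_h\, x^{k-1+t}$ for a fixed hook $0 \le h < k$, twist $t \ge 1$, and scalar $η$. The exact twisted variant and extension used in the paper may differ from this generic form.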
Submitted 7 April, 2026;
originally announced April 2026.
-
Synergizing Efficiency and Reliability for Continuous Mobile Manipulation
Authors:
Chengkai Wu,
Ruilin Wang,
Yixin Zeng,
Jiayuan Wang,
Mingjie Zhang,
Guiyong Zheng,
Qun Niu,
Juepeng Zheng,
Jun Ma,
Boyu Zhou
Abstract:
Humans seamlessly fuse anticipatory planning with immediate feedback to perform successive mobile manipulation tasks without stopping, achieving both high efficiency and reliability. Replicating this fluid and reliable behavior in robots remains fundamentally challenging, not only due to conflicts between long-horizon planning and real-time reactivity, but also because excessively pursuing efficiency undermines reliability in uncertain environments: it impairs stable perception and the potential for compensation, while also increasing the risk of unintended contact. In this work, we present a unified framework that synergizes efficiency and reliability for continuous mobile manipulation. It features a reliability-aware trajectory planner that embeds essential elements for reliable execution into spatiotemporal optimization, generating efficient and reliability-promising global trajectories. It is coupled with a phase-dependent switching controller that seamlessly transitions between global trajectory tracking for efficiency and task-error compensation for reliability. We also investigate a hierarchical initialization that facilitates online replanning despite the complexity of long-horizon planning problems. Real-world evaluations demonstrate that our approach enables efficient and reliable completion of successive tasks under uncertainty (e.g., dynamic disturbances, perception and control errors). Moreover, the framework generalizes to tasks with diverse end-effector constraints. Compared with state-of-the-art baselines, our method consistently achieves the highest efficiency while improving the task success rate by 26.67%-81.67%. Comprehensive ablation studies further validate the contribution of each component. The source code will be released.
Submitted 7 April, 2026;
originally announced April 2026.
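The phase-dependent switching idea above can be caricatured in a few lines: track the global trajectory for efficiency between tasks, then switch to pure task-error compensation near a manipulation target. The gains, function name, and one-dimensional state below are illustrative assumptions, not the paper's controller.

```python
def switching_control(x, x_ref, task_err, in_task_phase,
                      k_track=1.0, k_comp=2.0):
    """Toy 1-D phase-dependent switch: track the global trajectory
    between tasks; compensate the task error during a manipulation
    phase. Gains and the scalar state are illustrative only."""
    if in_task_phase:
        return -k_comp * task_err      # task-error compensation (reliability)
    return -k_track * (x - x_ref)      # global trajectory tracking (efficiency)

# Transit phase: drive the state toward the global reference.
u_track = switching_control(1.0, 0.0, 0.0, False)   # -> -1.0
# Manipulation phase: ignore the reference, cancel the task error.
u_comp = switching_control(1.0, 0.0, 0.5, True)     # -> -1.0
```

In the paper the transition is made seamless rather than discrete as here; the sketch only shows why the two control objectives need different feedback signals.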
-
EduIllustrate: Towards Scalable Automated Generation Of Multimodal Educational Content
Authors:
Shuzhen Bi,
Mingzi Zhang,
Zhuoxuan Li,
Xiaolong Wang,
Keqian Li,
Aimin Zhou
Abstract:
Large language models are increasingly used as educational assistants, yet evaluation of their educational capabilities remains concentrated on question-answering and tutoring tasks. A critical gap exists for multimedia instructional content generation -- the ability to produce coherent, diagram-rich explanations that combine geometrically accurate visuals with step-by-step reasoning. We present EduIllustrate, a benchmark for evaluating LLMs on interleaved text-diagram explanation generation for K-12 STEM problems. The benchmark comprises 230 problems spanning five subjects and three grade levels, a standardized generation protocol with sequential anchoring to enforce cross-diagram visual consistency, and an 8-dimension evaluation rubric grounded in multimedia learning theory covering both text and visual quality. Evaluation of ten LLMs reveals a wide performance spread: Gemini 3.0 Pro Preview leads at 87.8\%, while Kimi-K2.5 achieves the best cost-efficiency (80.8\% at \$0.12/problem). Workflow ablation confirms sequential anchoring improves Visual Consistency by 13\% at 94\% lower cost. Human evaluation with 20 expert raters validates LLM-as-judge reliability for objective dimensions ($ρ\geq 0.83$) while revealing limitations on subjective visual assessment.
Submitted 6 April, 2026;
originally announced April 2026.
-
Enhancing sample efficiency in reinforcement-learning-based flow control: replacing the critic with an adaptive reduced-order model
Authors:
Zesheng Yao,
Zhen-Hua Wan,
Canjun Yang,
Qingchao Xia,
Mengqi Zhang
Abstract:
Model-free deep reinforcement learning (DRL) methods suffer from poor sample efficiency. To overcome this limitation, this work introduces an adaptive reduced-order-model (ROM)-based reinforcement learning framework for active flow control. In contrast to conventional actor--critic architectures, the proposed approach leverages a ROM to estimate the gradient information required for controller optimization. The design of the ROM structure incorporates physical insights. The ROM integrates a linear dynamical system and a neural ordinary differential equation (NODE) for estimating the nonlinearity in the flow. The parameters of the linear component are identified via operator inference, while the NODE is trained in a data-driven manner using gradient-based optimization. During controller--environment interactions, the ROM is continuously updated with newly collected data, enabling adaptive refinement of the model. The controller is then optimized through differentiable simulation of the ROM. The proposed ROM-based DRL framework is validated on two canonical flow control problems: Blasius boundary layer flow and flow past a square cylinder. For the Blasius boundary layer, the proposed method effectively reduces to a single-episode system identification and controller optimization process, yet it yields controllers that outperform traditional linear designs and achieve performance comparable to DRL approaches with minimal data. For the flow past a square cylinder, the proposed method achieves superior drag reduction with significantly fewer exploration data compared with DRL approaches. The work addresses a key component of model-free DRL control algorithms and lays the foundation for designing more sample-efficient DRL-based active flow controllers.
Submitted 4 April, 2026;
originally announced April 2026.
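The ROM structure described above (a linear dynamical system plus a neural-ODE nonlinearity, rolled out for controller optimization) can be sketched minimally. Everything below is a toy stand-in: a fixed matrix `A` replaces operator inference, an untrained two-layer MLP replaces the trained NODE, and explicit Euler replaces a proper ODE solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 reduced modes, 1 control input.
n, m = 8, 1
A = -0.1 * np.eye(n)                 # linear part (operator inference in the paper)
B = rng.normal(size=(n, m)) * 0.1
W1 = rng.normal(size=(16, n)) * 0.1  # tiny MLP standing in for the NODE nonlinearity
W2 = rng.normal(size=(n, 16)) * 0.1

def rom_rhs(z, u):
    """dz/dt = A z + B u + f_theta(z): linear reduced dynamics plus
    a learned nonlinear residual (here an untrained toy MLP)."""
    return A @ z + B @ u + W2 @ np.tanh(W1 @ z)

def rollout(z0, us, dt=0.01):
    """Explicit-Euler rollout of the ROM. In a differentiable
    framework (JAX/torch) this rollout is what makes gradient-based
    controller optimization through the model possible."""
    z, traj = z0, [z0]
    for u in us:
        z = z + dt * rom_rhs(z, u)
        traj.append(z)
    return np.stack(traj)

traj = rollout(np.ones(n), np.zeros((50, m)))  # shape (51, 8)
```

The adaptive aspect of the framework would correspond to refitting `A`, `B`, and the MLP weights on newly collected interaction data between rollouts.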
-
GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference
Authors:
Guoci Chen,
Xiurui Pan,
Qiao Li,
Bo Mao,
Congming Gao,
Chengying Huan,
Mingzhe Zhang,
Jie Zhang
Abstract:
Deploying large language models (LLMs) as cloud services raises privacy concerns as inference may leak sensitive data. Fully Homomorphic Encryption (FHE) allows computation on encrypted data, but current FHE methods struggle with efficient and precise nonlinear function evaluation. Specifically, CKKS-based approaches require high-degree polynomial approximations, which are costly when target precision increases. Alternatively, TFHE's Programmable Bootstrapping (PBS) outperforms CKKS by offering exact lookup-table evaluation. But it lacks high-precision implementations of LLM nonlinear layers and underutilizes GPU resources.
We propose \emph{TIGER}, the first GPU-accelerated framework for high-precision TFHE-based nonlinear LLM layer evaluation. TIGER offers: (1) GPU-optimized WoP-PBS method combined with numerical algorithms to surpass native lookup-table precision limits on nonlinear functions; (2) high-precision and efficient implementations of key nonlinear layers, enabling practical encrypted inference; (3) batch-driven design exploiting inter-input parallelism to boost GPU efficiency. TIGER achieves 7.17$\times$, 16.68$\times$, and 17.05$\times$ speedups over a CPU baseline for GELU, Softmax, and LayerNorm, respectively.
Submitted 6 April, 2026;
originally announced April 2026.
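The contrast the abstract draws, exact lookup-table evaluation versus polynomial approximation, can be illustrated in plaintext. The sketch below mimics PBS-style evaluation of GELU with a quantized table; the range, bit width, and rounding rule are our assumptions, and real TFHE operates on ciphertexts, not floats.

```python
import math

def gelu(x):
    """Exact GELU via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Plaintext analogy to PBS: precompute an exact lookup table over a
# quantized message space (here 2^6 = 64 buckets on [-4, 4]).
BITS, LO, HI = 6, -4.0, 4.0
N = 1 << BITS
TABLE = [gelu(LO + (HI - LO) * i / (N - 1)) for i in range(N)]

def lut_gelu(x):
    """Quantize x, then look gelu up exactly at the nearest grid
    point, mirroring how PBS evaluates a function by table lookup."""
    i = round((x - LO) / (HI - LO) * (N - 1))
    i = max(0, min(N - 1, i))
    return TABLE[i]
```

At grid points the lookup is exact; raising precision means more table bits, which is the cost that WoP-PBS-style decompositions are designed to tame.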
-
FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
Authors:
Hang Xu,
Ling Yue,
Chaoqian Ouyang,
Yuchen Liu,
Libin Zheng,
Shaowu Pan,
Shimin Di,
Min-Ling Zhang
Abstract:
Peer review in machine learning is under growing pressure from rising submission volume and limited reviewer time. Most LLM-based reviewing systems read only the manuscript and generate comments from the paper's own narrative. This makes their outputs sensitive to presentation quality and leaves them weak when the evidence needed for review lies in related work or released code. We present FactReview, an evidence-grounded reviewing system that combines claim extraction, literature positioning, and execution-based claim verification. Given a submission, FactReview identifies major claims and reported results, retrieves nearby work to clarify the paper's technical position, and, when code is available, executes the released repository under bounded budgets to test central empirical claims. It then produces a concise review and an evidence report that assigns each major claim one of five labels: Supported, Supported by the paper, Partially supported, In conflict, or Inconclusive. In a case study on CompGCN, FactReview reproduces results that closely match those reported for link prediction and node classification, yet also shows that the paper's broader performance claim across tasks is not fully sustained: on MUTAG graph classification, the reproduced result is 88.4%, whereas the strongest baseline reported in the paper remains 92.6%. The claim is therefore only partially supported. More broadly, this case suggests that AI is most useful in peer review not as a final decision-maker, but as a tool for gathering evidence and helping reviewers produce more evidence-grounded assessments. The code is public at https://github.com/DEFENSE-SEU/Review-Assistant.
Submitted 7 April, 2026; v1 submitted 5 April, 2026;
originally announced April 2026.
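The claim labeling can be illustrated with a toy decision rule. The thresholds below are our own assumptions (FactReview's actual criteria are not specified in the abstract), and the "Supported by the paper" label, which covers claims checkable only against the manuscript itself, is omitted from this numeric sketch.

```python
def label_claim(reproduced, reported, tol=1.0):
    """Toy labeling rule (thresholds are our assumption, not
    FactReview's): compare a reproduced metric against the value a
    claim would need to reach, in percentage points."""
    if reproduced is None:
        return "Inconclusive"          # e.g., code failed to run
    if reproduced >= reported - tol:
        return "Supported"
    if reproduced >= reported - 5 * tol:
        return "Partially supported"
    return "In conflict"

# MUTAG case from the abstract: reproduced 88.4 vs strongest
# baseline 92.6 -- close, but the superiority claim does not hold.
print(label_claim(88.4, 92.6))  # -> Partially supported
```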
-
Benchmarking and Evaluating VLMs for Software Architecture Diagram Understanding
Authors:
Shuyin Ouyang,
Jie M. Zhang,
Jingzhi Gong,
Gunel Jahangirova,
Mohammad Reza Mousavi,
Jack Johns,
Beum Seuk Lee,
Adam Ziolkowski,
Botond Virginas,
Joost Noppen
Abstract:
Software architecture diagrams are important design artifacts for communicating system structure, behavior, and data organization throughout the software development lifecycle. Although recent progress in large language models has substantially advanced code-centric software engineering tasks such as code generation, testing, and maintenance, the ability of modern vision-language models (VLMs) to understand software architecture diagrams remains underexplored. To address this gap, we present SADU, a benchmark for Software Architecture Diagram Understanding that evaluates VLMs on architecture diagrams as structured software engineering artifacts rather than generic images. SADU contains 154 carefully curated diagrams spanning behavioral, structural, and ER diagrams, paired with structured annotations and 2,431 question-answer tasks covering counting and retrieval reasoning. We evaluate 11 state-of-the-art VLMs from the Gemini, Claude, GPT, and Qwen families.
Our results show that software architecture diagram understanding remains challenging for current models: the best-performing model gemini-3-flash-preview achieves only 70.18\% accuracy, while gpt-4o-mini only achieves 17.77\% accuracy. The results further reveal the weaknesses in diagram reasoning and visual relation grounding, highlighting a gap between current VLMs and the needs of design-stage software engineering. SADU provides a foundation for future research on diagram-aware AI systems and more faithful AI-assisted software engineering workflows.
Submitted 5 April, 2026;
originally announced April 2026.
-
SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources
Authors:
Shuaike Shen,
Wenduo Cheng,
Mingqian Ma,
Alistair Turcan,
Martin Jinye Zhang,
Jian Ma
Abstract:
Modern scientific ecosystems are rich in procedural knowledge across repositories, APIs, scripts, notebooks, documentation, databases, and papers, yet much of this knowledge remains fragmented across heterogeneous artifacts that agents cannot readily operationalize. This gap between abundant scientific know-how and usable agent capabilities is a key bottleneck for building effective scientific agents. We present SkillFoundry, a self-evolving framework that converts such resources into validated agent skills, reusable packages that encode task scope, inputs and outputs, execution steps, environment assumptions, provenance, and tests. SkillFoundry organizes a target domain as a domain knowledge tree, mines resources from high-value branches, extracts operational contracts, compiles them into executable skill packages, and then iteratively expands, repairs, merges, or prunes the resulting library through a closed-loop validation process. SkillFoundry produces a substantially novel and internally valid skill library, with 71.1\% of mined skills differing from existing skill libraries such as SkillHub and SkillSMP. We demonstrate that these mined skills improve coding agent performance on five of the six MoSciBench datasets. We further show that SkillFoundry can design new task-specific skills on demand for concrete scientific objectives, and that the resulting skills substantially improve performance on two challenging genomics tasks: cell type annotation and the scDRS workflow. Together, these results show that automatically mined skills improve agent performance on benchmarks and domain-specific tasks, expand coverage beyond hand-crafted skill libraries, and provide a practical foundation for more capable scientific agents.
Submitted 5 April, 2026;
originally announced April 2026.
-
SPURS: Evidence for Clumpy Neutral Envelopes and Ionized IGM Surrounding Little Red Dots in Abell 2744 from Ultra-Deep Rest-UV Spectroscopy
Authors:
Mengtao Tang,
Daniel P. Stark,
Charlotte A. Mason,
Zuyi Chen,
Harley Katz,
Max Gronke,
Lukas J. Furtak,
Seok-Jun Chang,
Jorryt Matthee,
Lily Whitler,
Adi Zitrin,
Ryan Endsley,
Viola Gelli,
Tamojeet Roychowdhury,
Peter Senchyna,
Michael W. Topping,
Meng Zhang
Abstract:
Rest-frame ultraviolet (UV) spectra of Little Red Dots (LRDs) often show Ly$α$ emission. Along with broad Balmer emission, LRDs are expected to produce broad Ly$α$ emission. However, the large column density of neutral gas invoked to explain the Balmer break should significantly redshift and further broaden the Ly$α$ line, making it challenging to detect without sensitive, moderate-resolution spectra. We present ultra-deep (29 hours) G140M JWST/NIRSpec observations covering the rest-UV of two LRDs in Abell2744 from the SPURS Cycle 4 Large Program. One of our targets is Abell2744-QSO1, a gravitationally-lensed LRD at $z=7.04$ with faint UV emission (M$_{\rm UV}=-16.9$), and the other source (UNCOVER-2476) is newly-confirmed at $z=4.02$ with a very bright UV continuum (M$_{\rm UV}=-19.6$). We find that Abell2744-QSO1 has a broad Ly$α$ profile, along with narrow CIV, FeII$\lambda1786$, and OI$\lambda1302$ emission. The Ly$α$ profile suggests an origin similar to the broad H$α$, but the line is considerably less redshifted than expected from existing dense gas models. We show that the line profile can be explained if the dense neutral gas is clumpy, allowing Ly$α$ to escape by scattering off of the clump surfaces. We find that UNCOVER-2476 has narrow [NeIV] emission, indicating either a hard radiation field or shocks. We confirm two close neighbors with Ly$α$ emission around Abell2744-QSO1, indicating it traces a dense environment that may have ionized its surrounding IGM. We suggest that LRDs may preferentially trace bubbles carved by their dense environments, contributing to the prevalence of Ly$α$ in the population.
Submitted 3 April, 2026;
originally announced April 2026.
-
From Pre-trained Models to Large Language Models: A Comprehensive Survey of AI-Driven Psychological Computing
Authors:
Huiyao Chen,
Ruimeng Liu,
Yan Luo,
Jiawen Zhang,
Meishan Zhang,
Baotian Hu,
Min Zhang
Abstract:
The intersection of artificial intelligence and psychological science has experienced remarkable growth, with annual publications expanding from 859 papers in 2000 to 29,979 by 2025. However, this rapid evolution has created methodological fragmentation where similar computational techniques are independently developed across isolated psychological domains. This survey introduces the first systematic taxonomy that organizes AI-driven psychology tasks by computational processing patterns rather than application domains, categorizing them into four fundamental types: classification, regression, structured relational, and generative interactive tasks. Through analysis of over 300 representative works spanning the pre-trained model era and large language model era, we examine how computational approaches evolved from task-specific feature engineering to transfer learning and few-shot adaptation. We provide systematic coverage of datasets, evaluation metrics, and benchmarks while addressing fundamental challenges including interpretability, label uncertainty, privacy constraints, and cross-cultural validity. This computational perspective reveals transferable methodological patterns previously obscured by domain-centric organization, enabling systematic knowledge transfer and accelerated progress in computational psychology.
Submitted 12 March, 2026;
originally announced April 2026.
-
Temporal structure of the language hierarchy within small cortical patches
Authors:
Julien Gadonneix,
Mingfang Zhang,
Jérémy Rapin,
Linnea Evanson,
Pierre Bourdillon,
Jean-Rémi King
Abstract:
Speech production requires the rapid coordination of a complex hierarchy of linguistic units, transforming a semantic representation into a precise sequence of articulatory movements. To unravel the neural mechanisms underlying this feat, we leverage recordings from eight 3.2 x 3.2 mm 64-microelectrode arrays implanted in the motor cortex and inferior frontal gyrus of two patients tasked to produce twenty thousand sentences. We show that a hierarchy of linguistic features is robustly encoded in most of these small cortical patches. Contrary to our expectations, instead of a clear macroscopic organization between patches, we observe a multiplexing of phonetic, syllabic and lexical representations within each cortical patch. Critically, this coding scheme dynamically changes over time to allow successive phonemes, syllables and words to be simultaneously represented without interference. Overall, these results, reminiscent of position encoding in transformers, show how small cortical patches organize the unfolding of the speech hierarchy during language production.
Submitted 3 April, 2026;
originally announced April 2026.
-
High-energy electronic excitations in La3Ni2O7 by time-resolved optical spectroscopy
Authors:
Junzhi Zhu,
Mengwu Huo,
Yubin Wang,
Yuxin Zhai,
Lili Hu,
Haiyun Huang,
Xiu Zhang,
Baixu Xiang,
Mengdi Zhang,
Yusong Gan,
Zhiyuan An,
Meng Wang,
Qihua Xiong,
Haiyun Liu
Abstract:
Recently, high-temperature superconductivity has been established in bilayer La3Ni2O7, which exhibits a density-wave (DW) transition at ~150 K under ambient pressure. The DW order is believed to be linked to superconductivity, as it is suppressed upon the emergence of superconductivity at high pressures. Here, we explore the ultrafast dynamics of high-energy electronic excitations from 10 K to room temperature under ambient pressure using time-resolved optical spectroscopy. Two high-energy electronic excitations at ~1.8 and ~2.4 eV, arising from distinct interband transitions, are identified. They exhibit different DW gaps of approximately 54 and 67 meV, respectively, along with relaxation dynamics that can be well described by the Rothwarf-Taylor model. In addition, we observe four coherent Raman-active phonon modes that exhibit distinct coupling with different electronic excitations. The phonon softening with increasing temperature can be well described between ~100 K and room temperature by a semi-quantitative model, which includes thermal expansion and anharmonic phonon-phonon coupling. At cryogenic temperatures, deviations of the measured temperature-dependent phonon frequencies from the model fits suggest an additional contribution from electron-phonon coupling. Our study provides direct evidence of the complex gap structure and phonon dynamics in this material, offering critical insights into the DW mechanism and many-body effects.
Submitted 3 April, 2026;
originally announced April 2026.
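For reference, the Rothwarf-Taylor model invoked above couples quasiparticle and gap-energy phonon densities. A commonly quoted dimensionless form is integrated with explicit Euler below; the parameters are arbitrary illustrative values, not fits to La3Ni2O7.

```python
import numpy as np

def rothwarf_taylor(n0, N0, R=1.0, beta=1.0, gamma_esc=0.1,
                    N_T=0.0, dt=1e-3, steps=5000):
    """Euler integration of a commonly quoted dimensionless form of
    the Rothwarf-Taylor equations (all parameters illustrative):
        dn/dt = beta*N - R*n**2
        dN/dt = R*n**2/2 - beta*N/2 - gamma_esc*(N - N_T)
    n: quasiparticle density, N: gap-energy phonon density."""
    n, N = n0, N0
    ns = [n]
    for _ in range(steps):
        dn = beta * N - R * n * n
        dN = 0.5 * R * n * n - 0.5 * beta * N - gamma_esc * (N - N_T)
        n, N = n + dt * dn, N + dt * dN
        ns.append(n)
    return np.array(ns)

# A photoexcited population relaxes by recombination, bottlenecked
# by phonons re-breaking pairs (the slow tail seen in pump-probe data).
trace = rothwarf_taylor(n0=1.0, N0=0.0)
```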
-
ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs
Authors:
Lik Tung Fu,
Jie Zhou,
Shaokai Ren,
Mengli Zhang,
Jia Xiong,
Hugo Jiang,
Nan Guan,
Xi Wang,
Jun Yang
Abstract:
Functional verification consumes over 50% of the IC development lifecycle, where SystemVerilog Assertions (SVAs) are indispensable for formal property verification and enhanced simulation-based debugging. However, manual SVA authoring is labor-intensive and error-prone. While Large Language Models (LLMs) show promise, their direct deployment is hindered by low functional accuracy and a severe scarcity of domain-specific data. To address these challenges, we introduce ChatSVA, an end-to-end SVA generation system built upon a multi-agent framework. At its core, the AgentBridge platform enables this multi-agent approach by systematically generating high-purity datasets, overcoming the data scarcity inherent to few-shot scenarios. Evaluated on 24 RTL designs, ChatSVA achieves 98.66% syntax and 96.12% functional pass rates, generating 139.5 SVAs per design with 82.50% function coverage. This represents a 33.3 percentage point improvement in functional correctness and an over 11x enhancement in function coverage compared to the previous state-of-the-art (SOTA). ChatSVA not only sets a new SOTA in automated SVA generation but also establishes a robust framework for solving long-chain reasoning problems in few-shot, domain-specific scenarios. An online service has been publicly released at https://www.nctieda.com/CHATDV.html.
Submitted 3 April, 2026;
originally announced April 2026.
-
OMNI-PoseX: A Fast Vision Model for 6D Object Pose Estimation in Embodied Tasks
Authors:
Michael Zhang,
Wei Ying,
Fangwen Chen,
Shifeng Bai,
Hanwen Kang
Abstract:
Accurate 6D object pose estimation is a fundamental capability for embodied agents, yet remains highly challenging in open-world environments. Many existing methods rely on closed-set assumptions or geometry-agnostic regression schemes, limiting their generalization, stability, and real-time applicability in robotic systems. We present OMNI-PoseX, a vision foundation model that introduces a novel network architecture unifying open-vocabulary perception with an SO(3)-aware reflected flow matching pose predictor. The architecture decouples object-level understanding from geometry-consistent rotation inference, and employs a lightweight multi-modal fusion strategy that conditions rotation-sensitive geometric features on compact semantic embeddings, enabling efficient and stable 6D pose estimation. To enhance robustness and generalization, the model is trained on large-scale 6D pose datasets, leveraging broad object diversity, viewpoint variation, and scene complexity to build a scalable open-world pose backbone. Comprehensive evaluations across benchmark pose estimation, ablation studies, zero-shot generalization, and system-level robotic grasping integration demonstrate the effectiveness of OMNI-PoseX. OMNI-PoseX achieves SOTA pose accuracy and real-time efficiency, while delivering geometrically consistent predictions that enable reliable grasping of diverse, previously unseen objects.
Submitted 3 April, 2026;
originally announced April 2026.
-
PhDLspec: physical-prior embedded deep learning method for spectroscopic determination of stellar labels in high-dimensional parameter space
Authors:
Tianmin Wu,
Maosheng Xiang,
Jianrong Shi,
Meng Zhang,
Lanya Mou,
Hong-Liang Yan,
A-Li Luo
Abstract:
Unlocking the full physical information encoded in low-resolution spectra poses a significant challenge for astronomical survey analysis. Such a task demands modeling spectra and optimizing astrophysical parameters in high-dimensional space, as a consequence of line blending. Here we present PhDLspec -- a deep learning framework embedded with physical priors for stellar spectra modeling and analysis. By imposing differential spectra derived from ab initio stellar atmospheric model calculation on a transformer framework, PhDLspec can rigorously and precisely model stellar spectra by simultaneously taking into account more than 30 physical parameters, at a computational speed hundreds of times faster than ab initio model calculation. With such a flexible stellar modeling approach, PhDLspec can effectively derive ~30 stellar labels from a low-resolution spectrum using affordable optimization techniques. Application to LAMOST spectra (R~1800) yields stellar elemental abundances in good agreement with high-resolution spectroscopic surveys, following essential calibrations to correct systematic biases in elemental abundance estimates using wide binaries and reference high-resolution datasets. We provide a catalog of 25 elemental abundances for 116,611 subgiant stars with precise age estimates. The successful application of PhDLspec to LAMOST spectra for high-dimensional parameter determination sheds light on similar challenges faced by other surveys and disciplines.
Submitted 3 April, 2026;
originally announced April 2026.
-
Beyond Semantic Manipulation: Token-Space Attacks on Reward Models
Authors:
Yuheng Zhang,
Mingyue Huo,
Minghao Zhu,
Mengxue Zhang,
Nan Jiang
Abstract:
Reward models (RMs) are widely used as optimization targets in reinforcement learning from human feedback (RLHF), yet they remain vulnerable to reward hacking. Existing attacks mainly operate within the semantic space, constructing human-readable adversarial outputs that exploit RM biases. In this work, we introduce a fundamentally different paradigm: Token Mapping Perturbation Attack (TOMPA), a framework that performs adversarial optimization directly in token space. By bypassing the standard decode-re-tokenize interface between the policy and the reward model, TOMPA enables the attack policy to optimize over raw token sequences rather than coherent natural language. Using only black-box scalar feedback, TOMPA automatically discovers non-linguistic token patterns that elicit extremely high rewards across multiple state-of-the-art RMs. Specifically, when targeting Skywork-Reward-V2-Llama-3.1-8B, TOMPA nearly doubles the reward of GPT-5 reference answers and outperforms them on 98.0% of prompts. Despite these high scores, the generated outputs degenerate into nonsensical text, revealing that RMs can be systematically exploited beyond the semantic regime and exposing a critical vulnerability in current RLHF pipelines.
Submitted 2 April, 2026;
originally announced April 2026.
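The decode-re-tokenize interface that TOMPA bypasses can be demonstrated with a toy greedy tokenizer: token sequences a policy could emit directly are not, in general, recoverable after decoding to text and re-tokenizing. The vocabulary below is invented for illustration.

```python
VOCAB = ["ab", "a", "b", "c"]  # toy vocabulary, longest match listed first

def decode(tokens):
    """Map a token sequence back to text (lossy w.r.t. segmentation)."""
    return "".join(tokens)

def tokenize(text):
    """Greedy longest-match tokenizer, a toy stand-in for BPE."""
    out, i = [], 0
    while i < len(text):
        for tok in VOCAB:
            if text.startswith(tok, i):
                out.append(tok)
                i += len(tok)
                break
        else:
            raise ValueError(f"untokenizable: {text[i:]!r}")
    return out

# A raw token sequence the attack policy can score directly...
raw = ["a", "b", "c"]
# ...is destroyed by the standard decode -> re-tokenize interface:
print(raw, "->", tokenize(decode(raw)))  # ['a', 'b', 'c'] -> ['ab', 'c']
```

Optimizing over raw token sequences therefore explores reward-model inputs that the usual text interface can never produce, which is the attack surface TOMPA exploits.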
-
Runtime Execution Traces Guided Automated Program Repair with Multi-Agent Debate
Authors:
Jiaqing Wu,
Tong Wu,
Manqing Zhang,
Yunwei Dong,
Bo Shen
Abstract:
Automated Program Repair (APR) struggles with complex logic errors and silent failures. Current LLM-based APR methods are mostly static, relying on source code and basic test outputs, which fail to accurately capture complex runtime behaviors and dynamic data dependencies. While incorporating runtime evidence like execution traces exposes concrete state transitions, a single LLM interpreting this in isolation often overfits to specific hypotheses, producing patches that satisfy tests by coincidence rather than correct logic. Therefore, runtime evidence should act as objective constraints rather than mere additional input. We propose TraceRepair, a multi-agent framework that leverages runtime facts as shared constraints for patch validation. A probe agent captures execution snapshots of critical variables to form an objective repair basis. Meanwhile, a committee of specialized agents cross-verifies candidate patches to expose inconsistencies and iteratively refine them. Evaluated on the Defects4J benchmark, TraceRepair correctly fixes 392 defects, substantially outperforming existing LLM-based approaches. Extensive experiments demonstrate improved efficiency and strong generalization on a newly constructed dataset of recent bugs, confirming that performance gains arise from dynamic reasoning rather than memorization.
△ Less
Submitted 2 April, 2026;
originally announced April 2026.
-
SafeRoPE: Risk-specific Head-wise Embedding Rotation for Safe Generation in Rectified Flow Transformers
Authors:
Xiang Yang,
Feifei Li,
Mi Zhang,
Geng Hong,
Xiaoyu You,
Min Yang
Abstract:
Recent Text-to-Image (T2I) models based on rectified-flow transformers (e.g., SD3, FLUX) achieve high generative fidelity but remain vulnerable to unsafe semantics, especially when triggered by multi-token interactions. Existing mitigation methods largely rely on fine-tuning or attention modulation for concept unlearning; however, their expensive computational overhead and design tailored to U-Net…
▽ More
Recent Text-to-Image (T2I) models based on rectified-flow transformers (e.g., SD3, FLUX) achieve high generative fidelity but remain vulnerable to unsafe semantics, especially when triggered by multi-token interactions. Existing mitigation methods largely rely on fine-tuning or attention modulation for concept unlearning; however, their expensive computational overhead and design tailored to U-Net-based denoisers hinder direct adaptation to transformer-based diffusion models (e.g., MMDiT). In this paper, we conduct an in-depth analysis of the attention mechanism in MMDiT and find that unsafe semantics concentrate within interpretable, low-dimensional subspaces at the head level, where a finite set of safety-critical heads is responsible for unsafe feature extraction. We further observe that perturbing the Rotary Positional Embedding (RoPE) applied to the query and key vectors can effectively modify some specific concepts in the generated images. Motivated by these insights, we propose SafeRoPE, a lightweight and fine-grained safe generation framework for MMDiT. Specifically, SafeRoPE first constructs head-wise unsafe subspaces by decomposing unsafe embeddings within safety-critical heads, and computes a Latent Risk Score (LRS) for each input vector via projection onto these subspaces. We then introduce head-wise RoPE perturbations that can suppress unsafe semantics without degrading benign content or image quality. SafeRoPE combines both head-wise LRS and RoPE perturbations to perform risk-specific head-wise rotation on query and key vector embeddings, enabling precise suppression of unsafe outputs while maintaining generation fidelity. Extensive experiments demonstrate that SafeRoPE achieves SOTA performance in balancing effective harmful content mitigation and utility preservation for safe generation of MMDiT. Code is available at https://github.com/deng12yx/SafeRoPE.
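The projection step behind a risk score of this kind can be sketched as follows; the orthonormal basis, dimensions, and energy-ratio normalization here are assumptions for illustration, not the paper's exact LRS formula.

```python
import numpy as np

def latent_risk_score(v, unsafe_basis):
    """Project v onto the span of unsafe_basis (orthonormal rows) and
    return the fraction of its energy lying in that subspace -- one
    plausible reading of a projection-based Latent Risk Score."""
    coeffs = unsafe_basis @ v                 # coordinates in the subspace
    proj_energy = float(np.dot(coeffs, coeffs))
    total_energy = float(np.dot(v, v))
    return proj_energy / total_energy

basis = np.eye(4)[:2]                     # toy 2-D "unsafe" subspace in R^4
risky = np.array([3.0, 4.0, 0.0, 0.0])    # lies entirely in the subspace -> score 1
benign = np.array([0.0, 0.0, 1.0, 1.0])   # orthogonal to it -> score 0
```

A high score would then trigger the head-wise RoPE perturbation for that vector, while low-score (benign) vectors pass through unmodified.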
△ Less
Submitted 2 April, 2026;
originally announced April 2026.
-
Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging
Authors:
Mengxian Lyu,
Cheng Peng,
Ziyi Chen,
Mengyuan Zhang,
Jieting Li Lu,
Yonghui Wu
Abstract:
Large language models have been adopted in the medical domain for clinical documentation to reduce clinician burden. However, studies have reported that LLMs often "forget" a significant amount of instruction-following ability when fine-tuned using a task-specific medical dataset, a critical challenge in adopting general-purpose LLMs for clinical applications. This study presents a model merging f…
▽ More
Large language models have been adopted in the medical domain for clinical documentation to reduce clinician burden. However, studies have reported that LLMs often "forget" a significant amount of instruction-following ability when fine-tuned using a task-specific medical dataset, a critical challenge in adopting general-purpose LLMs for clinical applications. This study presents a model merging framework to efficiently adapt general-purpose LLMs to the medical domain by countering this forgetting issue. By merging a clinical foundation model (GatorTronLlama) with a general instruct model (Llama-3.1-8B-Instruct) via interpolation-based merge methods, we seek to derive a domain-adapted model with strong performance on clinical tasks while retaining instruction-following ability. Comprehensive evaluation across medical benchmarks and five clinical generation tasks (e.g., radiology and discharge summarization) shows that merged models can effectively mitigate catastrophic forgetting, preserve clinical domain expertise, and retain instruction-following ability. In addition, our model merging strategies demonstrate training efficiency, achieving performance on par with fully fine-tuned baselines under severely constrained supervision (e.g., 64-shot vs. 256-shot). Consequently, weight-space merging constitutes a highly scalable solution for adapting open-source LLMs to clinical applications, facilitating broader deployment in resource-constrained healthcare environments.
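Interpolation-based weight-space merging reduces, in its simplest form, to a per-tensor convex combination of the two models' parameters. The sketch below uses numpy arrays standing in for model state dicts; the paper's merge methods may be richer than plain linear interpolation.

```python
import numpy as np

def linear_merge(domain_weights, instruct_weights, alpha=0.5):
    """Per-tensor interpolation merge:
    theta = alpha * theta_domain + (1 - alpha) * theta_instruct.
    A minimal sketch; both inputs must share the same parameter names/shapes."""
    return {name: alpha * domain_weights[name] + (1.0 - alpha) * instruct_weights[name]
            for name in domain_weights}

domain = {"layer.w": np.array([2.0, 4.0])}   # e.g., clinical foundation model
instr  = {"layer.w": np.array([0.0, 0.0])}   # e.g., general instruct model
merged = linear_merge(domain, instr, alpha=0.25)
```

Because merging happens purely in weight space, no gradient updates are needed, which is where the training-efficiency advantage over full fine-tuning comes from.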
△ Less
Submitted 1 April, 2026;
originally announced April 2026.
-
LightGuard: Transparent WiFi Security via Physical-Layer LiFi Key Bootstrapping
Authors:
Shiqi Xu,
Yuyang Du,
Mingyue Zhang,
Hongwei Cui,
Soung Chang Liew
Abstract:
WiFi is inherently vulnerable to eavesdropping because RF signals may penetrate many physical boundaries, such as walls and floors. LiFi, by contrast, is an optical method confined to line-of-sight and blocked by opaque surfaces. We present LightGuard, a dual-link architecture built on this insight: cryptographic key establishment can be offloaded from WiFi to a physically confined LiFi channel to…
▽ More
WiFi is inherently vulnerable to eavesdropping because RF signals may penetrate many physical boundaries, such as walls and floors. LiFi, by contrast, is an optical method confined to line-of-sight and blocked by opaque surfaces. We present LightGuard, a dual-link architecture built on this insight: cryptographic key establishment can be offloaded from WiFi to a physically confined LiFi channel to mitigate the risk of key exposure over RF. LightGuard derives session keys over a LiFi link and installs them on the WiFi interface, ensuring cryptographic material never traverses the open RF medium. A prototype with off-the-shelf WiFi NICs and our LiFi transceiver frontend validates the design.
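The bootstrapping idea, exchanging secret material only over the confined optical link and expanding it into a WiFi session key, can be sketched as follows. The key schedule, message flow, and loopback transport here are assumptions for illustration; LightGuard's actual protocol is not specified in the abstract.

```python
import hashlib, hmac, secrets

def bootstrap_session_key(lifi_send, lifi_recv):
    """Sketch of out-of-band key bootstrapping: a fresh random secret is
    exchanged over the confined LiFi link and expanded into a WiFi
    session key with HMAC-SHA256 (illustrative only)."""
    secret = secrets.token_bytes(32)
    lifi_send(secret)                 # secret never traverses the RF medium
    peer_secret = lifi_recv()
    assert peer_secret == secret      # toy loopback check stands in for the peer
    return hmac.new(secret, b"wifi-session-key", hashlib.sha256).digest()

# Loopback "LiFi channel" for demonstration.
buf = []
key = bootstrap_session_key(buf.append, buf.pop)
```

The derived key would then be installed on the WiFi interface, so an RF eavesdropper never observes the keying material.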
△ Less
Submitted 1 April, 2026;
originally announced April 2026.
-
Agentic Tool Use in Large Language Models
Authors:
Jinchao Hu,
Meizhi Zhong,
Kehai Chen,
Xuefeng Bai,
Min Zhang
Abstract:
Large language models are increasingly being deployed as autonomous agents yet their real world effectiveness depends on reliable tools for information retrieval, computation and external action. Existing studies remain fragmented across tasks, tool types, and training settings, lacking a unified view of how tool-use methods differ and evolve. This paper organizes the literature into three paradig…
▽ More
Large language models are increasingly being deployed as autonomous agents, yet their real-world effectiveness depends on reliable tools for information retrieval, computation, and external action. Existing studies remain fragmented across tasks, tool types, and training settings, lacking a unified view of how tool-use methods differ and evolve. This paper organizes the literature into three paradigms: prompting as plug-and-play, supervised tool learning, and reward-driven tool policy learning. It analyzes their methods, strengths, and failure modes, reviews the evaluation landscape, and highlights key challenges, aiming to address this fragmentation and provide a more structured, evolutionary view of agentic tool use.
△ Less
Submitted 1 April, 2026;
originally announced April 2026.
-
Enhancing REST API Fuzzing with Access Policy Violation Checks and Injection Attacks
Authors:
Omur Sahin,
Man Zhang,
Andrea Arcuri
Abstract:
Due to their widespread use in industry, several techniques have been proposed in the literature to fuzz REST APIs. Existing fuzzers for REST APIs have focused on detecting crashes (e.g., the 500 HTTP server error status code). However, security vulnerabilities can have drastic consequences for existing cloud infrastructures.
In this paper, we propose a series of novel automated oracles a…
▽ More
Due to their widespread use in industry, several techniques have been proposed in the literature to fuzz REST APIs. Existing fuzzers for REST APIs have focused on detecting crashes (e.g., the 500 HTTP server error status code). However, security vulnerabilities can have drastic consequences for existing cloud infrastructures.
In this paper, we propose a series of novel automated oracles aimed at detecting violations of access policies in REST APIs, as well as executing traditional attacks such as SQL Injection and XSS. These novel automated oracles can be integrated into existing fuzzers, in which, once the fuzzing session is completed, a ``security testing'' phase is executed to verify these oracles. When a security fault is detected, our technique outputs executable test cases in different formats, such as Java, Kotlin, Python, and JavaScript test suites.
Our novel techniques are integrated as an extension of EvoMaster, a state-of-the-art open-source fuzzer for REST APIs. Experiments are carried out on 9 artificial examples, 8 vulnerable-by-design REST APIs with black-box testing, and 36 REST APIs from the WFD corpus with white-box testing, for a total of 52 distinct APIs. Results show that our novel oracles and their automated integration into a fuzzing process can detect security issues in several of these APIs.
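An access-policy oracle of the kind described can be sketched as a post-fuzzing replay check: requests for a resource created under one user's credentials are re-issued with a different user's credentials, and a successful response is flagged. The function names and mock API below are hypothetical, not EvoMaster's actual interface.

```python
def check_access_policy(resources, fetch):
    """Toy access-policy oracle: for each (owner, resource) pair, replay
    the read with every *other* user's credentials; a 2xx response
    signals a potential broken-access-control fault.
    `fetch(user, res)` returns an HTTP status code (mocked here)."""
    violations = []
    for owner, res in resources:
        for user in {o for o, _ in resources} - {owner}:
            if 200 <= fetch(user, res) < 300:
                violations.append((user, res))
    return violations

# Mock API: resource "r1" is world-readable (the seeded policy bug).
def mock_fetch(user, res):
    if res == "r1":
        return 200                      # anyone can read r1 -> violation
    return 200 if user == "bob" else 403  # r2 readable only by its owner

faults = check_access_policy([("alice", "r1"), ("bob", "r2")], mock_fetch)
```

Running such oracles after the fuzzing session keeps the core search loop unchanged while adding a dedicated security-testing phase.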
△ Less
Submitted 1 April, 2026;
originally announced April 2026.
-
TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving
Authors:
Feng Ren,
Ruoyu Qin,
Teng Ma,
Shangming Cai,
Zheng Liu,
Chao Lei,
Dejiang Zhu,
Ke Yang,
Zheming Li,
Jialei Cui,
Weixiao Huang,
Yikai Zhao,
Yineng Zhang,
Hao Wu,
Xiang Gao,
Yuhao Fu,
Jinlei Jiang,
Yongwei Wu,
Mingxing Zhang
Abstract:
Modern GPU clusters are built upon a complex hierarchy of heterogeneous interconnects, ranging from multi-rail RDMA to proprietary fabrics such as Multi-Node NVLink and Ascend UB. Orchestrating these diverse links effectively remains a critical challenge in disaggregated LLM serving. Operating Mooncake TE on thousands of GPUs exposed a critical limitation shared by existing frameworks: imperative,…
▽ More
Modern GPU clusters are built upon a complex hierarchy of heterogeneous interconnects, ranging from multi-rail RDMA to proprietary fabrics such as Multi-Node NVLink and Ascend UB. Orchestrating these diverse links effectively remains a critical challenge in disaggregated LLM serving. Operating Mooncake TE on thousands of GPUs exposed a critical limitation shared by existing frameworks: imperative, statically bound path selection. This rigidity forces engines to rely on state-blind striping that ignores congestion signals, creating communication silos, wasting multi-rail bandwidth due to head-of-line blocking, and leading to operational fragility where routine faults require manual intervention. We present TENT, a data-movement engine that decouples transfer intent from physical execution. Instead of locking workloads to fixed backends, TENT unifies heterogeneous interconnects into a single dynamic resource pool. Applications simply declare transfer intents, while TENT dynamically decomposes elephant flows into fine-grained slices and "sprays" them across links based on instantaneous link quality. This telemetry-driven orchestration eliminates head-of-line blocking and enables transparent, sub-50 ms self-healing by rerouting slices around failures without application logic. TENT serves as the production data plane for LLM inference and RL pipelines at multiple industrial sites. Our evaluation on H800 HGX clusters shows that TENT outperforms state-of-the-art baselines, including Mooncake TE, NIXL, and UCCL. In LLM inference with SGLang HiCache, TENT achieves up to 1.36x higher throughput and 26% lower P90 TTFT than Mooncake TE. In RL pipelines, TENT accelerates parameter updates in Moonshot Checkpoint Engine by 20-26%.
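The spraying idea, splitting an elephant flow into slices and assigning them to links in proportion to instantaneous link quality, can be sketched as below. The link names and quality scores are illustrative; TENT's real scheduler also reacts to congestion telemetry and faults at runtime.

```python
def spray_slices(num_slices, links):
    """Assign transfer slices to links proportionally to link quality
    (higher = better). A minimal sketch of telemetry-driven spraying;
    `links` maps link name -> current quality score."""
    total = sum(links.values())
    plan = {name: round(num_slices * q / total) for name, q in links.items()}
    # Fix rounding drift so every slice is assigned exactly once.
    drift = num_slices - sum(plan.values())
    plan[max(links, key=links.get)] += drift
    return plan

# Hypothetical pool mixing RDMA rails with an NVLink path.
plan = spray_slices(10, {"rdma0": 3.0, "rdma1": 1.0, "nvlink": 6.0})
```

Re-evaluating the quality map per transfer (or per slice batch) is what lets a scheduler route around a degraded or failed link without application involvement.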
△ Less
Submitted 31 March, 2026;
originally announced April 2026.
-
First energy scan measurement of $e^{+}e^{-}\to K^{+}K^{-}$ around the $ψ(2S)$ resonance
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (683 additional authors not shown)
Abstract:
We report the first measurement of the $e^{+}e^{-}\to K^{+}K^{-}$ cross sections around the $ψ(2S)$ resonance using the energy scan method. The analysis is based on $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of 495~pb$^{-1}$ collected with the BESIII detector at BEPCII. By analyzing the cross section line-shape, we extract the relative phase $Φ$ between the strong and el…
▽ More
We report the first measurement of the $e^{+}e^{-}\to K^{+}K^{-}$ cross sections around the $ψ(2S)$ resonance using the energy scan method. The analysis is based on $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of 495~pb$^{-1}$ collected with the BESIII detector at BEPCII. By analyzing the cross-section line shape, we extract the relative phase $Φ$ between the strong and electromagnetic amplitudes of the $ψ(2S)$ resonance, a fundamental parameter in charmonium physics, based on the assumption that the relative phase between the electromagnetic amplitude of the $ψ(2S)$ resonance and the continuum is zero. Two distinct solutions for the branching fraction $\mathcal{B}$ of $ψ(2S)\to K^{+}K^{-}$ are observed: a constructive interference solution with $\mathcal{B}=(7.49\pm0.41)\times10^{-5}$ and $Φ=(110.1 \pm6.7)^\circ$, and a destructive interference solution with $\mathcal{B}=(10.94\pm0.48)\times10^{-5}$ and $Φ=(-106.8\pm5.7)^\circ$. A significant correlation between $Φ$ and $\mathcal{B}$ is established, demonstrating that interference effects must be taken into account in the $ψ(2S)$ branching fraction measurements. Additionally, the first results for both the $ψ(2S)$ strong form factor, which characterizes the strong coupling between $ψ(2S)$ and $K^{+}K^{-}$, and the energy-dependent electromagnetic form factor of the charged kaon in this energy region are reported here.
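The two-solution ambiguity arises because the measured line shape constrains the amplitudes only through their coherent sum. Schematically, with a continuum amplitude $A_c$ and a resonant amplitude $A_R$ interfering at relative phase $Φ$ (illustrative notation, not the paper's exact parametrization):

```latex
% Schematic interference line shape (illustrative):
\sigma(\sqrt{s}) \;\propto\;
  \bigl|\, A_c(\sqrt{s}) + e^{i\Phi} A_R(\sqrt{s}) \,\bigr|^2
\;=\; |A_c|^2 + |A_R|^2 + 2\,|A_c|\,|A_R|\cos\Phi
```

Since the interference term enters mainly through $\cosΦ$ and $|A_R|$ (which sets $\mathcal{B}$), two different $(Φ, \mathcal{B})$ pairs, one constructive and one destructive, can reproduce a very similar line shape, which is why both solutions are reported.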
△ Less
Submitted 31 March, 2026;
originally announced March 2026.
-
Instabilities in flow through and around a circular array of cylinders
Authors:
Huaibao Zhang,
Yongliang Yang,
Guangxue Wang,
Mengqi Zhang
Abstract:
This paper presents results of two-dimensional direct numerical simulations (DNS) and global linear stability analyses (based on mean flow and base flow) of a viscous incompressible flow past a circular array of cylinders with six-fold rotational symmetry. Six cylinder arrays, with varied patch density $φ= N_c (d/D)^2$ (with $N_c$ cylinders of diameter $d$ within a patch of diameter $D$) is invest…
▽ More
This paper presents results of two-dimensional direct numerical simulations (DNS) and global linear stability analyses (based on mean flow and base flow) of a viscous incompressible flow past a circular array of cylinders with six-fold rotational symmetry. Six cylinder arrays with varied patch density $φ= N_c (d/D)^2$ (with $N_c$ cylinders of diameter $d$ within a patch of diameter $D$) are investigated by adjusting $N_c$ and its arrangement. The simulations cover a wide parameter space, with $φ$ ranging from $0.043$ to $0.315$ and free-stream Reynolds numbers $Re_D = U_{\infty} D / ν\leq 300$, based on $D$, the uniform incoming velocity $U_{\infty}$, and the kinematic viscosity $ν$. We focus on the onset of vortex shedding as $φ$ varies, since the onset of global instabilities in such arrays has not been discussed earlier in the literature. For the patch diameters and solid volume fractions considered here, three distinct regimes are identified: (I) a low-$φ$ regime where cylinders behave nearly independently and the flow is stable, forming a steady wake without a vortex street; (II) an intermediate-$φ$ regime where $Re_c$ varies logarithmically with $φ$, resembling a porous medium; and (III) a high-$φ$ regime where $Re_c$ approaches that of a solid cylinder ($φ= 1$).
△ Less
Submitted 31 March, 2026;
originally announced March 2026.
-
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
Authors:
Shuang Chen,
Quanxin Shou,
Hangting Chen,
Yucheng Zhou,
Kaituo Feng,
Wenbo Hu,
Yi-Fan Zhang,
Yunlong Lin,
Wenxuan Huang,
Mingyang Song,
Dasen Dai,
Bolin Jiang,
Manyuan Zhang,
Shi-Xue Zhang,
Zhengkai Jiang,
Lucas Wang,
Zhao Zhong,
Yu Cheng,
Nanyun Peng
Abstract:
Unified multimodal models provide a natural and promising architecture for understanding diverse and complex real-world knowledge while generating high-quality images. However, they still rely primarily on frozen parametric knowledge, which makes them struggle with real-world image generation involving long-tail and knowledge-intensive concepts. Inspired by the broad success of agents on real-worl…
▽ More
Unified multimodal models provide a natural and promising architecture for understanding diverse and complex real-world knowledge while generating high-quality images. However, they still rely primarily on frozen parametric knowledge, which makes them struggle with real-world image generation involving long-tail and knowledge-intensive concepts. Inspired by the broad success of agents on real-world tasks, we explore agentic modeling to address this limitation. Specifically, we present Unify-Agent, a unified multimodal agent for world-grounded image synthesis, which reframes image generation as an agentic pipeline consisting of prompt understanding, multimodal evidence searching, grounded recaptioning, and final synthesis. To train our model, we construct a tailored multimodal data pipeline and curate 143K high-quality agent trajectories for world-grounded image synthesis, enabling effective supervision over the full agentic generation process. We further introduce FactIP, a benchmark covering 12 categories of culturally significant and long-tail factual concepts that explicitly requires external knowledge grounding. Extensive experiments show that our proposed Unify-Agent substantially improves over its base unified model across diverse benchmarks and real-world generation tasks, while approaching the world knowledge capabilities of the strongest closed-source models. As an early exploration of agent-based modeling for world-grounded image synthesis, our work highlights the value of tightly coupling reasoning, searching, and generation for reliable open-world agentic image synthesis.
△ Less
Submitted 1 April, 2026; v1 submitted 31 March, 2026;
originally announced March 2026.
-
Style-Instructed Mask-Free Virtual Try On
Authors:
Mengqi Zhang,
Qi Li,
Mehmet Saygin Seyfioglu,
Karim Bouyarmane
Abstract:
Virtual Try-On is a promising research area with broad applications in e-commerce and everyday life, enabling users to visualize garments on themselves or others before purchase. Most existing methods depend on predefined or user-specified masks to guide garment placement, but their performance is highly sensitive to mask quality, often causing misalignment or artifacts, and introduces redundant s…
▽ More
Virtual Try-On is a promising research area with broad applications in e-commerce and everyday life, enabling users to visualize garments on themselves or others before purchase. Most existing methods depend on predefined or user-specified masks to guide garment placement, but their performance is highly sensitive to mask quality, often causing misalignment or artifacts and introducing redundant steps for users. To overcome these limitations, we propose a mask-free virtual try-on framework that requires only minimal modifications to the underlying architecture while remaining compatible with common diffusion-based pipelines. To address the increased ambiguity in the absence of masks, we integrate an attention-based guidance mechanism that explicitly directs the model to focus on the target garment region and improves correspondence between the garment and the person. Additionally, we incorporate instruction prompts, allowing users to flexibly control garment categories and wearing styles, addressing the underutilization of prompts in prior work and improving interaction flexibility. Both qualitative and quantitative evaluations across multiple datasets demonstrate that our approach consistently outperforms existing methods, producing more accurate, robust, and user-friendly try-on results.
△ Less
Submitted 4 February, 2026;
originally announced March 2026.
-
Hybrid Quantum-Classical Spatiotemporal Forecasting for 3D Cloud Fields
Authors:
Fu Wang,
Qifeng Lu,
Xinyu Long,
Meng Zhang,
Xiaofei Yang,
Weijia Cao,
Xiaowen Chu
Abstract:
Accurate forecasting of three-dimensional (3D) cloud fields is important for atmospheric analysis and short-range numerical weather prediction, yet it remains challenging because cloud evolution involves cross-layer interactions, nonlocal dependencies, and multiscale spatiotemporal dynamics. Existing spatiotemporal prediction models based on convolutions, recurrence, or attention often rely on loc…
▽ More
Accurate forecasting of three-dimensional (3D) cloud fields is important for atmospheric analysis and short-range numerical weather prediction, yet it remains challenging because cloud evolution involves cross-layer interactions, nonlocal dependencies, and multiscale spatiotemporal dynamics. Existing spatiotemporal prediction models based on convolutions, recurrence, or attention often rely on locality-biased representations and therefore struggle to preserve fine cloud structures in volumetric forecasting tasks. To address this issue, we propose QENO, a hybrid quantum-inspired spatiotemporal forecasting framework for 3D cloud fields. The proposed architecture consists of four components: a classical spatiotemporal encoder for compact latent representation, a topology-aware quantum enhancement block for modeling nonlocal couplings in latent space, a dynamic fusion temporal unit for integrating measurement-derived quantum features with recurrent memory, and a decoder for reconstructing future cloud volumes. Experiments on CMA-MESO 3D cloud fields show that QENO consistently outperforms representative baselines, including ConvLSTM, PredRNN++, Earthformer, TAU, and SimVP variants, in terms of MSE, MAE, RMSE, SSIM, and threshold-based detection metrics. In particular, QENO achieves an MSE of 0.2038, an RMSE of 0.4514, and an SSIM of 0.6291, while also maintaining a compact parameter budget. These results indicate that topology-aware hybrid quantum-classical feature modeling is a promising direction for 3D cloud structure forecasting and atmospheric Earth observation data analysis.
△ Less
Submitted 31 March, 2026;
originally announced March 2026.
-
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Authors:
Kaituo Feng,
Manyuan Zhang,
Shuang Chen,
Yunlong Lin,
Kaixuan Fan,
Yilei Jiang,
Hongyu Li,
Dian Zheng,
Chenyang Wang,
Xiangyu Yue
Abstract:
Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, as the first attempt to train a search-augmented image generat…
▽ More
Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, as the first attempt to train a search-augmented image generation agent, which performs multi-hop reasoning and search to collect the textual knowledge and reference images needed for grounded generation. To achieve this, we construct a tailored data pipeline and curate two high-quality datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, containing diverse search-intensive prompts and corresponding ground-truth synthesis images. We further introduce KnowGen, a comprehensive benchmark that explicitly requires search-grounded external knowledge for image generation and evaluates models from multiple dimensions. Based on these resources, we train Gen-Searcher with SFT followed by agentic reinforcement learning with dual reward feedback, which combines text-based and image-based rewards to provide more stable and informative learning signals for GRPO training. Experiments show that Gen-Searcher brings substantial gains, improving Qwen-Image by around 16 points on KnowGen and 15 points on WISE. We hope this work can serve as an open foundation for search agents in image generation, and we fully open-source our data, models, and code.
△ Less
Submitted 30 March, 2026;
originally announced March 2026.
-
Curriculum-Guided Myocardial Scar Segmentation for Ischemic and Non-ischemic Cardiomyopathy
Authors:
Nivetha Jayakumar,
Jonathan Pan,
Shuo Wang,
Bishow Paudel,
Nisha Hosadurg,
Cristiane C. Singulane,
Sivam Bhatt,
Amit R. Patel,
Miaomiao Zhang
Abstract:
Identification and quantification of myocardial scar is important for diagnosis and prognosis of cardiovascular diseases. However, reliable scar segmentation from Late Gadolinium Enhancement Cardiac Magnetic Resonance (LGE-CMR) images remains a challenge due to variations in contrast enhancement across patients, suboptimal imaging conditions such as post contrast washout, and inconsistencies in gr…
▽ More
Identification and quantification of myocardial scar is important for diagnosis and prognosis of cardiovascular diseases. However, reliable scar segmentation from Late Gadolinium Enhancement Cardiac Magnetic Resonance (LGE-CMR) images remains a challenge due to variations in contrast enhancement across patients, suboptimal imaging conditions such as post-contrast washout, and inconsistencies in ground truth annotations on diffuse scars caused by inter-observer variability. In this work, we propose a curriculum learning-based framework designed to improve segmentation performance under these challenging conditions. The method introduces a progressive training strategy that guides the model from high-confidence, clearly defined scar regions to low-confidence or visually ambiguous samples with limited scar burden. By structuring the learning process in this manner, the network develops robustness to uncertain labels and subtle scar appearances that are often underrepresented in conventional training pipelines. Experimental results show that the proposed approach enhances segmentation accuracy and consistency, particularly for cases with minimal or diffuse scar, outperforming standard training baselines. This strategy provides a principled way to leverage imperfect data for improved myocardial scar quantification in clinical applications. Our code is publicly available on GitHub.
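A confidence-ordered curriculum of this kind can be sketched as a staged data scheduler; the cumulative staging and equal-fraction pacing below are assumptions for illustration, since the paper's pacing criterion is not specified in the abstract.

```python
def curriculum_stages(samples, confidences, num_stages=3):
    """Order training data from high- to low-confidence and release it in
    cumulative stages: stage k trains on the k easiest fractions seen so
    far, so clearly defined samples stay in the mix as ambiguous ones
    are added. A minimal sketch of confidence-based curriculum pacing."""
    ranked = [s for _, s in sorted(zip(confidences, samples), key=lambda p: -p[0])]
    per_stage = max(1, len(ranked) // num_stages)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = len(ranked) if k == num_stages else k * per_stage
        stages.append(ranked[:cutoff])
    return stages

# Toy samples with annotation-confidence scores (hypothetical values).
stages = curriculum_stages(["a", "b", "c", "d"], [0.9, 0.2, 0.7, 0.5])
```

Training then iterates over `stages` in order, so gradients from uncertain, diffuse-scar samples only arrive after the model has fit the unambiguous cases.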
△ Less
Submitted 30 March, 2026;
originally announced March 2026.
-
HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention
Authors:
Yufei Xu,
Fanxu Meng,
Fan Jiang,
Yuxuan Wang,
Ruijie Zhou,
Zhaohui Wang,
Jiexi Wu,
Zhixin Pan,
Xiaojuan Tang,
Wenjie Pei,
Tongxuan Liu,
Di Yin,
Xing Sun,
Muhan Zhang
Abstract:
Token-level sparse attention mechanisms, exemplified by DeepSeek Sparse Attention (DSA), achieve fine-grained key selection by scoring every historical key for each query through a lightweight indexer, then computing attention only on the selected subset. While the downstream sparse attention itself scales favorably, the indexer must still scan the entire prefix for every query, introducing a per…
▽ More
Token-level sparse attention mechanisms, exemplified by DeepSeek Sparse Attention (DSA), achieve fine-grained key selection by scoring every historical key for each query through a lightweight indexer, then computing attention only on the selected subset. While the downstream sparse attention itself scales favorably, the indexer must still scan the entire prefix for every query, introducing a per-layer bottleneck that grows prohibitively with context length. We propose HISA (Hierarchical Indexed Sparse Attention), a plug-and-play replacement for the indexer that rewrites the search path from a flat token scan into a two-stage hierarchical procedure: (1) a block-level coarse filtering stage that scores pooled block representations to discard irrelevant regions, followed by (2) a token-level refinement stage that applies the original indexer exclusively within the retained candidate blocks. HISA preserves the identical token-level top-k sparse pattern consumed by the downstream Sparse MLA operator and requires no additional training. On kernel-level benchmarks, HISA achieves substantial speedups at 64K context. On Needle-in-a-Haystack and LongBench, we directly replace the indexer in DeepSeek-V3.2 and GLM-5 with our HISA indexer, without any finetuning. HISA closely matches the original DSA in quality, while substantially outperforming block-sparse baselines.
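The coarse-to-fine search path can be sketched with numpy; this illustrates the two-stage idea (pooled block scoring, then token scoring inside retained blocks) rather than HISA's exact indexer, and the mean pooling and dot-product scoring are assumptions.

```python
import numpy as np

def hierarchical_topk(query, keys, block_size=4, num_blocks_keep=2, k=4):
    """Two-stage sketch of hierarchical indexing:
    (1) score mean-pooled block representations and keep the best blocks;
    (2) score individual tokens only inside those blocks and return
    global top-k token indices (sorted)."""
    n = len(keys)
    blocks = keys.reshape(n // block_size, block_size, -1)
    block_scores = blocks.mean(axis=1) @ query          # stage 1: coarse filter
    kept = np.argsort(block_scores)[-num_blocks_keep:]
    cand = np.concatenate([np.arange(b * block_size, (b + 1) * block_size)
                           for b in kept])
    token_scores = keys[cand] @ query                   # stage 2: refinement
    return np.sort(cand[np.argsort(token_scores)[-k:]])

rng = np.random.default_rng(0)
keys = rng.standard_normal((16, 8))
query = rng.standard_normal(8)
idx = hierarchical_topk(query, keys)
```

Because stage 2 only scores tokens in the retained blocks, the per-query cost drops from scanning all `n` keys to scanning `n / block_size` block summaries plus a few candidate blocks.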
△ Less
Submitted 6 April, 2026; v1 submitted 30 March, 2026;
originally announced March 2026.
-
Detecting and Mitigating Flakiness in REST API Fuzzing
Authors:
Man Zhang,
Chongyang Shen,
Andrea Arcuri,
Tao Yue
Abstract:
Test flakiness is a common problem in industry, which hinders the reliability of automated build and testing workflows. Most existing research on test flakiness has primarily focused on unit and small-scale integration tests. In contrast, flakiness in system-level testing, such as REST API testing, is comparatively under-explored. A large body of literature has been dedicated to the topic of fuzzing REST
▽ More
Test flakiness is a common problem in industry that hinders the reliability of automated build and testing workflows. Most existing research on test flakiness has primarily focused on unit and small-scale integration tests. In contrast, flakiness in system-level testing, such as REST API testing, is comparatively under-explored. A large body of literature has been dedicated to the topic of fuzzing REST APIs, whereas relatively little attention has been paid to detecting and mitigating the negative effects of flakiness in this context. To fill this major gap, in this paper, we study the flakiness of tests generated by one of the most widely applied REST API fuzzers in the literature, namely EvoMaster, and conduct empirical studies on a corpus of 36 REST APIs to understand their flakiness. Based on the results of the empirical studies, we categorize and analyze flakiness sources by inspecting nearly 3,000 failing tests. Building on this understanding, we propose FlakyCatch to detect and mitigate flakiness in REST API tests and empirically evaluate its performance. Results show that FlakyCatch is effective in detecting and handling flakiness in tests generated by both white-box and black-box fuzzers.
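The baseline detection strategy for flakiness, rerunning a test several times without code changes and flagging inconsistent outcomes, can be sketched as below; the rerun count and mock test runner are assumptions, and FlakyCatch additionally classifies and mitigates the flakiness sources it finds.

```python
def detect_flaky(run_test, test_ids, reruns=5):
    """Rerun-based flakiness detector: execute each generated test
    several times; observing both pass and fail marks it flaky.
    `run_test(tid)` returns True on pass. A minimal sketch only."""
    flaky = []
    for tid in test_ids:
        outcomes = {run_test(tid) for _ in range(reruns)}
        if len(outcomes) > 1:          # both True and False observed
            flaky.append(tid)
    return flaky

# Mock runner: "t2" depends on hidden state (e.g., a timestamp) and flips.
state = {"calls": 0}
def mock_run(tid):
    state["calls"] += 1
    return True if tid != "t2" else state["calls"] % 2 == 0

flaky = detect_flaky(mock_run, ["t1", "t2", "t3"])
```

For system-level REST API tests, the same loop would wrap full HTTP test executions, where sources like timestamps, ordering, and external services make such nondeterminism far more common than in unit tests.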
△ Less
Submitted 30 March, 2026;
originally announced March 2026.
-
A Foldable and Agile Soft Electromagnetic Robot for Multimodal Navigation in Confined and Unstructured Environments
Authors:
Zhihao Lv,
Xiaoyong Zhang,
Mengfan Zhang,
Xiaoyu Song,
Xingyue Liu,
Yide Liu,
Shaoxing Qu,
Guoyong Mao
Abstract:
Multimodal locomotion is crucial for an animal's adaptability in unstructured wild environments. Similarly, in the human gastrointestinal tract, characterized by viscoelastic mucus, complex rugae, and narrow sphincters like the cardia, multimodal locomotion is also essential for a small-scale soft robot to conduct tasks. Here, we introduce a small-scale compact, foldable, and robust soft electroma…
▽ More
Multimodal locomotion is crucial for an animal's adaptability in unstructured wild environments. Similarly, in the human gastrointestinal tract, characterized by viscoelastic mucus, complex rugae, and narrow sphincters like the cardia, multimodal locomotion is also essential for a small-scale soft robot to conduct tasks. Here, we introduce a small-scale, compact, foldable, and robust soft electromagnetic robot (M-SEMR) with more than nine locomotion modes designed for such a scenario. Featuring a six-spoke elastomer body embedded with liquid metal channels and driven by Laplace forces under a static magnetic field, the M-SEMR is capable of rapid transitions (< 0.35 s) among different locomotion modes. It achieves exceptional agility, including high-speed rolling (818 mm/s, 26 BL/s), omnidirectional crawling, jumping, and swimming. Notably, the robot can fold to reduce its volume by 79%, enabling it to traverse confined spaces. We further validate its navigation capabilities on complex terrains, including discrete obstacles, viscoelastic gelatin surfaces, viscous fluids, and simulated biological tissues. This system offers a versatile strategy for developing high-mobility soft robots for future biomedical applications.
△ Less
Submitted 30 March, 2026;
originally announced March 2026.
-
The electricity system value of the local acceptance of onshore wind in Europe
Authors:
James Price,
Guillermo Valenzuela-Venegas,
Oskar Vågerö,
Marianne Zeyringer,
Monika Bucha,
Ruihong Chen,
Adrienne Etard,
Andrea N. Hahmann,
Alena Lohrmann,
Russell McKenna,
Christian Mikovits,
Evangelos Panos,
Meixi Zhang,
Luis Ramirez Camargo
Abstract:
The large-scale deployment of wind power is central to Europe's energy transition but faces challenges due to its social and environmental impacts on communities. Here we assess how the tolerance of local stakeholders to such impacts translates across spatial scales to shape the cost and design of the continent's net-zero electricity system using a soft-linked modelling framework. We find that low…
▽ More
The large-scale deployment of wind power is central to Europe's energy transition but faces challenges due to its social and environmental impacts on communities. Here we assess how the tolerance of local stakeholders to such impacts translates across spatial scales to shape the cost and design of the continent's net-zero electricity system using a soft-linked modelling framework. We find that lower impact tolerance can reduce the role of onshore wind in Europe reaching net-zero by up to 84% relative to a future where wind enjoys higher acceptance, with other low-carbon sources needing to be scaled up to compensate. This translates into total European electricity system costs increasing by 2-14%, while some countries see costs escalate by 20% or more. Our results show that the local acceptance of onshore wind is a key structural driver of the system and highlight the system value of policies to promote it.
△ Less
Submitted 30 March, 2026;
originally announced March 2026.
-
Observation of $Λ^+_c\to nπ^+η$ and search for $Λ^+_c\to na_0(980)^+$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (722 additional authors not shown)
Abstract:
By analysing 6.1 ${\rm fb}^{-1}$ of data collected at center-of-mass energies between $\sqrt{s}=4.600$ and 4.843 $\rm GeV$ with the BESIII detector at the BEPCII collider, we observe the decay $Λ_c^+\to nπ^+η$ for the first time with a statistical significance of $9.5σ$. The ratio of branching fractions $\mathcal{B}(Λ_c^+\to nπ^+η)/\mathcal{B}(Λ_c^+\to Λπ^+η)$ is measured to be…
▽ More
By analysing 6.1 ${\rm fb}^{-1}$ of data collected at center-of-mass energies between $\sqrt{s}=4.600$ and 4.843 $\rm GeV$ with the BESIII detector at the BEPCII collider, we observe the decay $Λ_c^+\to nπ^+η$ for the first time with a statistical significance of $9.5σ$. The ratio of branching fractions $\mathcal{B}(Λ_c^+\to nπ^+η)/\mathcal{B}(Λ_c^+\to Λπ^+η)$ is measured to be $0.155\pm0.031_{\rm stat.}\pm0.012_{\rm syst.}$. Taking the world average of $\mathcal{B}(Λ_c^+\to Λπ^+η)$ as reference, the absolute branching fraction is calculated to be $\mathcal{B}(Λ_c^+\to nπ^+η)=(2.94\pm0.59_{\rm stat.}\pm0.23_{\rm syst.}\pm0.13_{\rm ref.})\times10^{-3}$. The intermediate process $Λ_c^+\to na_0(980)^+$ is also searched for in the $π^+η$ invariant mass spectrum. Since no significant signal is found, the upper limit on $\mathcal{B}(Λ_c^+\to na_0(980)^+)\times\mathcal{B}(a_0(980)^+\toπ^+η)$ is set to $8.4\times10^{-4}$ at 90\% confidence level. A sophisticated deep learning approach using a Transformer-based architecture is employed to distinguish signals from prevalent hadronic backgrounds, complemented by thorough validation and systematic uncertainty quantification.
△ Less
Submitted 30 March, 2026;
originally announced March 2026.
-
Graphitic-C3N4/TiO2(B) S-scheme Heterojunctions for Efficient Photocatalytic H2 Production and Organic Pollution Degradation
Authors:
Xiaoyi Zhou,
Min Zhang,
Qiushi Wang,
Shiwen Du,
Xuedong Jing,
Zhenyi Zhang
Abstract:
Achieving both broad solar-spectrum absorption and strong redox capability is critical for semiconductor photocatalysts in environmental remediation and energy conversion. Herein, an S-scheme heterojunction photocatalyst is constructed by coupling TiO2(B) nanorods with g-C3N4 nanosheets. Its well-matched band structure extends light absorption from the UV to the visible region and enables efficien…
▽ More
Achieving both broad solar-spectrum absorption and strong redox capability is critical for semiconductor photocatalysts in environmental remediation and energy conversion. Herein, an S-scheme heterojunction photocatalyst is constructed by coupling TiO2(B) nanorods with g-C3N4 nanosheets. Its well-matched band structure extends light absorption from the UV to the visible region and enables efficient charge separation. Under simulated sunlight irradiation, the 40 wt% g-C3N4/TiO2(B) heterojunction delivers a H2 evolution rate of 1.98 mmol g-1 h-1 for water reduction with methanol as the sacrificial agent, which is 1.5 and 2.0 times higher than those of pure g-C3N4 and TiO2(B), respectively. When exposed to amoxicillin wastewater instead of methanol solution, the heterojunction degrades 98.2% of amoxicillin and produces 20.70 μmol g-1 of H2 within 90 min. Moreover, the heterojunction shows excellent photodegradation activity toward various organic antibiotics and dyes, owing to the S-scheme charge separation mechanism. This work highlights the promising potential of S-scheme heterojunctions for photocatalytic H2 production coupled with organic wastewater treatment.
△ Less
Submitted 29 March, 2026;
originally announced March 2026.