Showing 1–50 of 354 results for author: Tian, X

Searching in archive cs.
  1. arXiv:2604.13694 [pdf, ps, other]

    cs.AI

    Weight Patching: Toward Source-Level Mechanistic Localization in LLMs

    Authors: Chenghao Sun, Chengsheng Zhang, Guanzheng Qin, Rui Dai, Xinmei Tian

    Abstract: Mechanistic interpretability seeks to localize model behavior to the internal components that causally realize it. Prior work has advanced activation-space localization and causal tracing, but modules that appear important in activation space may merely aggregate or amplify upstream signals rather than encode the target capability in their own parameters. To address this gap, we propose Weight Pat…

    Submitted 15 April, 2026; originally announced April 2026.

    Comments: 36 pages. Submitted to IEEE for possible publication

  2. arXiv:2604.13001 [pdf, ps, other]

    cs.RO

    XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios

    Authors: Junming Wang, Teng Pu, Wingmun Fung, Jindong Wang, Shanchang Wang, Yuan Deng, Shuyuan Wang, Ziwei Liu, Kunhao Pan, Ping Yang, Peng Zhai, Yuxin Liang, Xiaofan Li, Jiabi Sun, Renchao Xu, Xiaotian Tian, Pengfei Yan, Guoqiang Ye, Liang Li, Qian Wang, Ruyi Gan, Hao Wang

    Abstract: The acquisition of high-quality, action-aligned demonstration data remains a fundamental bottleneck in scaling foundation models for dexterous robot manipulation. Although robot-free human demonstrations (e.g., the UMI paradigm) offer a scalable alternative to traditional teleoperation, current systems are constrained by sub-optimal hardware ergonomics, open-loop workflows, and a lack of systemati…

    Submitted 14 April, 2026; originally announced April 2026.

    Comments: Technical Report

  3. arXiv:2604.10797 [pdf, ps, other]

    cs.CV

    WBCBench 2026: A Challenge for Robust White Blood Cell Classification Under Class Imbalance

    Authors: Xin Tian, Xudong Ma, Tianqi Yang, Alin Achim, Bartłomiej W Papież, Phandee Watanaboonyongcharoen, Nantheera Anantrasirichai

    Abstract: We present WBCBench 2026, an ISBI challenge and benchmark for automated WBC classification designed to stress-test algorithms under three key difficulties: (i) severe class imbalance across 13 morphologically fine-grained WBC classes, (ii) strict patient-level separation between training, validation and test sets, and (iii) synthetic scanner- and setting-induced domain shift via controlled noise,…

    Submitted 12 April, 2026; originally announced April 2026.

    Comments: IEEE International Symposium on Biomedical Imaging (ISBI)

  4. arXiv:2604.07298 [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Region-Graph Optimal Transport Routing for Mixture-of-Experts Whole-Slide Image Classification

    Authors: Xin Tian, Jiuliu Lu, Ephraim Tsalik, Bart Wanders, Colleen Knoth, Julian Knight

    Abstract: Multiple Instance Learning (MIL) is the dominant framework for gigapixel whole-slide image (WSI) classification in computational pathology. However, current MIL aggregators route all instances through a shared pathway, constraining their capacity to specialise across the pathological heterogeneity inherent in each slide. Mixture-of-Experts (MoE) methods offer a natural remedy by partitioning insta…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: 10 pages, 2 figures, 2 tables

  5. arXiv:2604.07128 [pdf, ps, other]

    cs.CV

    A Utility-preserving De-identification Pipeline for Cross-hospital Radiology Data Sharing

    Authors: Chenhao Liu, Zelin Wen, Yan Tong, Junjie Zhu, Xinyu Tian, Yuchi Liu, Ashu Gupta, Syed M. S. Islam, Tom Gedeon, Yue Yao

    Abstract: Large-scale radiology data are critical for developing robust medical AI systems. However, sharing such data across hospitals remains heavily constrained by privacy concerns. Existing de-identification research in radiology mainly focuses on removing identifiable information to enable compliant data release. Yet whether de-identified radiology data can still preserve sufficient utility for large-sca…

    Submitted 8 April, 2026; originally announced April 2026.

  6. arXiv:2604.05719 [pdf, ps, other]

    cs.CR cs.AI cs.SE

    Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

    Authors: Jiaren Peng, Zeqin Li, Chang You, Yan Wang, Hanlin Sun, Xuan Tian, Shuqiao Zhang, Junyi Liu, Jianguo Zhao, Renyang Liu, Haoran Ou, Yuqiang Sun, Jiancheng Zhang, Yutong Jiao, Kunshu Song, Chao Zhang, Fan Shi, Hongda Sun, Rui Yan, Cheng Huang

    Abstract: The rapid advancement of Large Language Models (LLMs) has created new opportunities for Automated Penetration Testing (AutoPT), spawning numerous frameworks aimed at achieving end-to-end autonomous attacks. However, despite the proliferation of related studies, existing research generally lacks systematic architectural analysis and large-scale empirical comparisons under a unified benchmark. There…

    Submitted 7 April, 2026; originally announced April 2026.

  7. arXiv:2604.00479 [pdf, ps, other]

    cs.CV

    All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models

    Authors: Xinyu Tian, Shu Zou, Zhaoyuan Yang, Mengqi He, Peter Tu, Jing Zhang

    Abstract: Recent studies have demonstrated that Reinforcement Learning (RL), notably Group Relative Policy Optimization (GRPO), can intrinsically elicit and enhance the reasoning capabilities of Vision-Language Models (VLMs). However, despite the promise, the underlying mechanisms that drive the effectiveness of RL models as well as their limitations remain underexplored. In this paper, we highlight a funda…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: Accepted to CVPR2026

  8. arXiv:2603.28503 [pdf, ps, other]

    cs.CV

    Bridging the Geometry Mismatch: Frequency-Aware Anisotropic Serialization for Thin-Structure SSMs

    Authors: Jin Bai, Huiyao Zhang, Qi Wen, Ningyang Li, Shengyang Li, Atta ur Rahman, Xiaolin Tian

    Abstract: The segmentation of thin linear structures is inherently topology-critical, where minor local errors can sever long-range connectivity. While recent State-Space Models (SSMs) offer efficient long-range modeling, their isotropic serialization (e.g., raster scanning) creates a geometry mismatch for anisotropic targets, causing state propagation across rather than along the structure traje…

    Submitted 30 March, 2026; originally announced March 2026.

  9. arXiv:2603.27076 [pdf, ps, other]

    cs.AI

    When Verification Hurts: Asymmetric Effects of Multi-Agent Feedback in Logic Proof Tutoring

    Authors: Tahreem Yasir, Sutapa Dey Tithi, Benyamin Tabarsi, Dmitri Droujkov, Sam Gilson, Yasitha Rajapaksha, Xiaoyi Tian, Arun Ramesh, DongKuan Xu, Tiffany Barnes

    Abstract: Large language models (LLMs) are increasingly used for automated tutoring, but their reliability in structured symbolic domains remains unclear. We study step-level feedback for propositional logic proofs, which require precise symbolic reasoning aligned with a learner's current proof state. We introduce a knowledge-graph-grounded benchmark of 516 unique proof states with step-level annotations an…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: 21 pages, 1 figure

  10. arXiv:2603.26349 [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Generative Score Inference for Multimodal Data

    Authors: Xinyu Tian, Xiaotong Shen

    Abstract: Accurate uncertainty quantification is crucial for making reliable decisions in various supervised learning scenarios, particularly when dealing with complex, multimodal data such as images and text. Current approaches often face notable limitations, including rigid assumptions and limited generalizability, constraining their effectiveness across diverse supervised learning tasks. To overcome thes…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: 25 pages, 4 figures

  11. arXiv:2603.23580 [pdf, ps, other]

    cs.LG

    MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis

    Authors: Wei Sun, Ting Wang, Xinran Tian, Wanshun Lan, Xuhan Feng, Haoyue Li, Fangxin Wang

    Abstract: Existing LLM-based Kubernetes diagnostic systems cannot learn from operational experience, operating on static knowledge bases without improving from past resolutions. We present MetaKube, an experience-aware LLM framework through three synergistic innovations: (1) an Episodic Pattern Memory Network (EPMN) that abstracts diagnostic patterns from historical resolutions and provides confidence-calib…

    Submitted 24 March, 2026; originally announced March 2026.

  12. arXiv:2603.22264 [pdf, ps, other]

    cs.RO

    UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos

    Authors: Gu Zhang, Qicheng Xu, Haozhe Zhang, Jianhan Ma, Long He, Yiming Bao, Zeyu Ping, Zhecheng Yuan, Chenhao Lu, Chengbo Yuan, Tianhai Liang, Xiaoyu Tian, Maanping Shao, Feihong Zhang, Mingyu Ding, Yang Gao, Hao Zhao, Hang Zhao, Huazhe Xu

    Abstract: Dexterous manipulation remains challenging due to the cost of collecting real-robot teleoperation data, the heterogeneity of hand embodiments, and the high dimensionality of control. We present UniDex, a robot foundation suite that couples a large-scale robot-centric dataset with a unified vision-language-action (VLA) policy and a practical human-data capture setup for universal dexterous hand con…

    Submitted 23 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  13. Exploring Teacher-Chatbot Interaction and Affect in Block-Based Programming

    Authors: Bahare Riahi, Ally Limke, Xiaoyi Tian, Viktoriia Storozhevykh, Sayali Patukale, Tahreem Yasir, Khushbu Singh, Jennifer Chiu, Nicholas Lytle, Tiffany Barnes, Veronica Catete

    Abstract: AI-based chatbots have the potential to accelerate learning and teaching, but may also have counterproductive consequences without thoughtful design and scaffolding. To better understand teachers' perspectives on large language model (LLM)-based chatbots, we conducted a study with 11 teams of middle school teachers using chatbots for a science and computational thinking activity within a block-bas…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: 19 pages, 9 figures, CHI26

  14. arXiv:2603.11101 [pdf, ps, other]

    cs.RO cs.AI cs.DC

    Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure

    Authors: Yongjian Guo, Yunxuan Ma, Haoran Sun, Zhong Guan, Shuai Di, Jing Long, Wanting Xu, Xiaodong Bai, Wen Huang, Yucheng Guo, Chen Zhou, Qiming Yang, Mingxi Luo, Tianyun Zhao, Hedan Yang, Song Wang, Xiaomeng Tian, Xiaolong Xiang, Zhen Sun, Yu Wei, Luqiao Wang, Yuzhen Li, Chenfeng Gu, Junwu Xiong, Yicheng Gong

    Abstract: Embodied intelligence is a key step towards Artificial General Intelligence (AGI), yet its development faces multiple challenges including data, frameworks, infrastructure, and evaluation systems. To address these issues, we have, for the first time in the industry, launched a cloud-based, thousand-GPU distributed training platform for embodied intelligence, built upon the widely adopted LeRobot f…

    Submitted 18 March, 2026; v1 submitted 11 March, 2026; originally announced March 2026.

  15. arXiv:2603.07311 [pdf]

    cs.AI

    Data-Driven Hints in Intelligent Tutoring Systems

    Authors: Sutapa Dey Tithi, Kimia Fazeli, Dmitri Droujkov, Tahreem Yasir, Xiaoyi Tian, Tiffany Barnes

    Abstract: This chapter explores the evolution of data-driven hint generation for intelligent tutoring systems (ITS). The Hint Factory and Interaction Networks have enabled the generation of next-step hints, waypoints, and strategic subgoals from historical student data. Data-driven techniques have also enabled systems to find the right time to provide hints. We explore further potential data-driven adaptati…

    Submitted 7 March, 2026; originally announced March 2026.

    Comments: Book Chapter in the Encyclopedia of AI in Education

  16. arXiv:2603.02754 [pdf, ps, other]

    cs.CV

    Seeing Clearly without Training: Mitigating Hallucinations in Multimodal LLMs for Remote Sensing

    Authors: Yi Liu, Jing Zhang, Di Wang, Xiaoyu Tian, Haonan Guo, Bo Du

    Abstract: Multimodal large language models (MLLMs) suffer from pronounced hallucinations in remote sensing visual question-answering (RS-VQA), primarily caused by visual grounding failures in large-scale scenes or misinterpretation of fine-grained small targets. To systematically analyze these issues, we introduce RSHBench, a protocol-based benchmark for fine-grained diagnosis of factual and logical halluci…

    Submitted 3 March, 2026; originally announced March 2026.

  17. arXiv:2603.01913 [pdf, ps, other]

    cs.CV

    Zero-shot Low-Field MRI Enhancement via Diffusion-Based Adaptive Contrast Transport

    Authors: Muyu Liu, Chenhe Du, Xuanyu Tian, Qing Wu, Xiao Wang, Haonan Zhang, Hongjiang Wei, Yuyao Zhang

    Abstract: Low-field (LF) magnetic resonance imaging (MRI) democratizes access to diagnostic imaging but is fundamentally limited by low signal-to-noise ratio and significant tissue contrast distortion due to field-dependent relaxation dynamics. Reconstructing high-field (HF) quality images from LF data is a blind inverse problem, severely challenged by the scarcity of paired training data and the unknown, n…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: 11 pages, 4 figures, conference paper

  18. arXiv:2603.01890 [pdf, ps, other]

    cs.CV

    Resolving Blind Inverse Problems under Dynamic Range Compression via Structured Forward Operator Modeling

    Authors: Muyu Liu, Xuanyu Tian, Chenhe Du, Qing Wu, Hongjiang Wei, Yuyao Zhang

    Abstract: Recovering radiometric fidelity from unknown dynamic range compression (UDRC), such as low-light enhancement and HDR reconstruction, is a challenging blind inverse problem, due to the unknown forward model and irreversible information loss introduced by compression. To address this challenge, we first identify monotonicity as the fundamental physical invariant shared across UDRC tasks. Leveraging…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: 16 pages, 10 figures, conference paper

  19. arXiv:2603.01502 [pdf, ps, other]

    cs.CL eess.AS

    Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs

    Authors: Ming-Hao Hsu, Xueyao Zhang, Xiaohai Tian, Jun Zhang, Zhizheng Wu

    Abstract: Recent advancements in Large Speech-Language Models have significantly bridged the gap between acoustic signals and linguistic understanding. However, a persistent performance disparity remains in speech-based input tasks compared to direct text inference. In this paper, we investigate the dynamic roots of this modality gap beyond static geometric alignment, analyzing how speech and text represent…

    Submitted 2 March, 2026; originally announced March 2026.

  20. arXiv:2602.23214 [pdf, ps, other]

    cs.CV cs.LG eess.IV

    Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction

    Authors: Chenhe Du, Xuanyu Tian, Qing Wu, Muyu Liu, Jingyi Yu, Hongjiang Wei, Yuyao Zhang

    Abstract: Plug-and-Play diffusion prior (PnPDP) frameworks have emerged as a powerful paradigm for solving imaging inverse problems by treating pretrained generative models as modular priors. However, we identify a critical flaw in prevailing PnP solvers (e.g., based on HQS or Proximal Gradient): they function as memoryless operators, updating estimates solely based on instantaneous gradients. This lack of…

    Submitted 26 February, 2026; originally announced February 2026.

  21. arXiv:2602.19600 [pdf, ps, other]

    stat.ML cs.LG

    Manifold-Aligned Generative Transport

    Authors: Xinyu Tian, Xiaotong Shen

    Abstract: High-dimensional generative modeling is fundamentally a manifold-learning problem: real data concentrate near a low-dimensional structure embedded in the ambient space. Effective generators must therefore balance support fidelity -- placing probability mass near the data manifold -- with sampling efficiency. Diffusion models often capture near-manifold structure but require many iterative denoisin…

    Submitted 23 February, 2026; originally announced February 2026.

    Comments: 64 pages, 5 figures

  22. arXiv:2602.16806 [pdf, ps, other]

    cs.HC

    Exploring the Design and Impact of Interactive Worked Examples for Learners with Varying Prior Knowledge

    Authors: Sutapa Dey Tithi, Xiaoyi Tian, Ally Limke, Min Chi, Tiffany Barnes

    Abstract: Tutoring systems improve learning through tailored interventions, such as worked examples, but often suffer from the aptitude-treatment interaction effect where low prior knowledge learners benefit more. We applied the ICAP learning theory to design two new types of worked examples, Buggy (students fix bugs), and Guided (students complete missing rules), requiring varying levels of cognitive engag…

    Submitted 18 February, 2026; originally announced February 2026.

  23. arXiv:2602.11761 [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

    Authors: MiniCPM Team, Wenhao An, Yingfa Chen, Yewei Fang, Jiayi Li, Xin Li, Yaohui Li, Yishan Li, Yuxuan Li, Biyuan Lin, Chuan Liu, Hezi Liu, Siyuan Liu, Hongya Lyu, Yinxu Pan, Shixin Ren, Xingyu Shen, Zhou Su, Haojun Sun, Yangang Sun, Zhen Leng Thai, Xin Tian, Rui Wang, Xiaorong Wang, Yudong Wang, et al. (22 additional authors not shown)

    Abstract: The evolution of large language models (LLMs) towards applications with ultra-long contexts faces challenges posed by the high computational and memory costs of the Transformer architecture. While existing sparse and linear attention mechanisms attempt to mitigate these issues, they typically involve a trade-off between memory efficiency and model performance. This paper introduces MiniCPM-SALA, a…

    Submitted 28 February, 2026; v1 submitted 12 February, 2026; originally announced February 2026.

    Comments: MiniCPM-SALA Technical Report

  24. arXiv:2602.08226 [pdf, ps, other]

    cs.DB

    ByteHouse: ByteDance's Cloud-Native Data Warehouse for Real-Time Multimodal Data Analytics

    Authors: Yuxing Han, Yu Lin, Yifeng Dong, Xuanhe Zhou, Xindong Peng, Xinhui Tian, Zhiyuan You, Yingzhong Guo, Xi Chen, Weiping Qu, Tao Meng, Dayue Gao, Haoyu Wang, Liuxi Wei, Huanchen Zhang, Fan Wu

    Abstract: With the rapid rise of intelligent data services, modern enterprises increasingly require efficient, multimodal, and cost-effective data analytics infrastructures. However, in ByteDance's production environments, existing systems fall short due to limitations such as I/O-inefficient multimodal storage, inflexible query optimization (e.g., failing to optimize multimodal access patterns), and perfor…

    Submitted 25 March, 2026; v1 submitted 8 February, 2026; originally announced February 2026.

  25. arXiv:2602.07308 [pdf, ps, other]

    cs.AI

    Adaptive Scaffolding for Cognitive Engagement in an Intelligent Tutoring System

    Authors: Sutapa Dey Tithi, Nazia Alam, Tahreem Yasir, Yang Shi, Xiaoyi Tian, Min Chi, Tiffany Barnes

    Abstract: The ICAP framework defines four cognitive engagement levels: Passive, Active, Constructive, and Interactive, where increased cognitive engagement can yield improved learning. However, personalizing learning activities that elicit the optimal level of cognitive engagement remains a key challenge in intelligent tutoring systems (ITS). In this work, we develop and evaluate a system that adaptively sc…

    Submitted 6 February, 2026; originally announced February 2026.

  26. arXiv:2602.04162 [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Improving 2D Diffusion Models for 3D Medical Imaging with Inter-Slice Consistent Stochasticity

    Authors: Chenhe Du, Qing Wu, Xuanyu Tian, Jingyi Yu, Hongjiang Wei, Yuyao Zhang

    Abstract: 3D medical imaging is in high demand and essential for clinical diagnosis and scientific research. Currently, diffusion models (DMs) have become an effective tool for medical imaging reconstruction thanks to their ability to learn rich, high-quality data priors. However, learning the 3D data distribution with DMs in medical imaging is challenging, not only due to the difficulties in data collectio…

    Submitted 9 February, 2026; v1 submitted 3 February, 2026; originally announced February 2026.

    Comments: Accepted by ICLR 2026

  27. arXiv:2601.21558 [pdf, ps, other]

    cs.CL

    ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

    Authors: Xiaoyu Tian, Haotian Wang, Shuaiting Chen, Hao Zhou, Kaichi Yu, Yudian Zhang, Jade Ouyang, Junxi Yin, Jiong Chen, Baoyan Guo, Lei Zhang, Junjie Tao, Yuansheng Song, Ming Cui, Chengwei Liu

    Abstract: Large language models (LLMs) are increasingly used as tool-augmented agents for multi-step decision making, yet training robust tool-using agents remains challenging. Existing methods still require manual intervention, depend on non-verifiable simulated environments, rely exclusively on either supervised fine-tuning (SFT) or reinforcement learning (RL), and struggle with stable long-horizon, multi…

    Submitted 30 January, 2026; v1 submitted 29 January, 2026; originally announced January 2026.

  28. arXiv:2601.17562 [pdf, ps, other]

    math.NA cs.LG

    Sparse RBF Networks for PDEs and nonlocal equations: function space theory, operator calculus, and training algorithms

    Authors: Zihan Shao, Konstantin Pieper, Xiaochuan Tian

    Abstract: This work presents a systematic analysis and extension of the sparse radial basis function network (SparseRBFnet) previously introduced for solving nonlinear partial differential equations (PDEs). Based on its adaptive-width shallow kernel network formulation, we further investigate its function-space characterization, operator evaluation, and computational algorithm. We provide a unified descript…

    Submitted 24 January, 2026; originally announced January 2026.

    Comments: 30 pages, 7 figures

  29. arXiv:2601.17288 [pdf, ps, other]

    cs.CV

    Fluxamba: Topology-Aware Anisotropic State Space Models for Geological Lineament Segmentation in Multi-Source Remote Sensing

    Authors: Jin Bai, Huiyao Zhang, Qi Wen, Shengyang Li, Xiaolin Tian, Atta ur Rahman

    Abstract: The precise segmentation of geological linear features, spanning from planetary lineaments to terrestrial fractures, demands capturing long-range dependencies across complex anisotropic topologies. Although State Space Models (SSMs) offer near-linear computational complexity, their dependence on rigid, axis-aligned scanning trajectories induces a fundamental topological mismatch with curvilinear t…

    Submitted 23 January, 2026; originally announced January 2026.

  30. arXiv:2601.16214 [pdf, ps, other]

    cs.CV

    CamPilot: Improving Camera Control in Video Diffusion Model with Efficient Camera Reward Feedback

    Authors: Wenhang Ge, Guibao Shen, Jiawei Feng, Luozhou Wang, Hao Lu, Xingye Tian, Xin Tao, Ying-Cong Chen

    Abstract: Recent advances in camera-controlled video diffusion models have significantly improved video-camera alignment. However, the camera controllability still remains limited. In this work, we build upon Reward Feedback Learning and aim to further improve camera controllability. However, directly borrowing existing ReFL approaches faces several challenges. First, current reward models lack the capacity…

    Submitted 22 January, 2026; originally announced January 2026.

  31. arXiv:2601.15681 [pdf, ps, other]

    cs.CV

    Consistency-Regularized GAN for Few-Shot SAR Target Recognition

    Authors: Yikui Zhai, Shikuang Liu, Wenlve Zhou, Hongsheng Zhang, Zhiheng Zhou, Xiaolin Tian, C. L. Philip Chen

    Abstract: Few-shot recognition in synthetic aperture radar (SAR) imagery remains a critical bottleneck for real-world applications due to extreme data scarcity. A promising strategy involves synthesizing a large dataset with a generative adversarial network (GAN), pre-training a model via self-supervised learning (SSL), and then fine-tuning on the few labeled samples. However, this approach faces a fundamen…

    Submitted 22 January, 2026; originally announced January 2026.

  32. arXiv:2601.14287 [pdf, ps, other]

    cs.LG

    Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents

    Authors: Xiucheng Xu, Bingbing Xu, Xueyun Tian, Zihe Huang, Rongxin Chen, Yunfan Li, Huawei Shen

    Abstract: External memory systems are pivotal for enabling Large Language Model (LLM) agents to maintain persistent knowledge and perform long-horizon decision-making. Existing paradigms typically follow a two-stage process: computationally expensive memory construction (e.g., structuring data into graphs) followed by naive retrieval-augmented generation. However, our empirical analysis reveals two fundamen…

    Submitted 13 January, 2026; originally announced January 2026.

  33. arXiv:2601.12986 [pdf, ps, other]

    cs.CR

    KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing

    Authors: Zhenhua Xu, Xiaoning Tian, Wenjun Zeng, Wenpeng Xing, Tianliang Lu, Gaolei Li, Chaochao Chen, Meng Han

    Abstract: Protecting the intellectual property of large language models requires robust ownership verification. Conventional backdoor fingerprinting, however, is flawed by a stealth-robustness paradox: to be robust, these methods force models to memorize fixed responses to high-perplexity triggers, but this targeted overfitting creates detectable statistical artifacts. We resolve this paradox with KinGuard,…

    Submitted 20 January, 2026; v1 submitted 19 January, 2026; originally announced January 2026.

    Comments: Accepted by ICASSP2026

  34. arXiv:2601.12748 [pdf, ps, other]

    cs.CL

    Towards Robust Process Reward Modeling via Noise-aware Learning

    Authors: Bin Xie, Bingbing Xu, Xueyun Tian, Yilin Chen, Huawei Shen

    Abstract: Process Reward Models (PRMs) have achieved strong results in complex reasoning, but are bottlenecked by costly process-level supervision. A widely used alternative, Monte Carlo Estimation (MCE), defines process rewards as the probability that a policy model reaches the correct final answer from a given reasoning step. However, step correctness is an intrinsic property of the reasoning trajectory,…

    Submitted 19 January, 2026; originally announced January 2026.

  35. arXiv:2601.12747 [pdf, ps, other]

    cs.CV

    SSPFormer: Self-Supervised Pretrained Transformer for MRI Images

    Authors: Jingkai Li, Xiaoze Tian, Yuhang Shen, Jia Wang, Dianjie Lu, Guijuan Zhang, Zhuoran Zheng

    Abstract: The pre-trained transformer demonstrates remarkable generalization ability in natural image processing. However, directly transferring it to magnetic resonance images faces two key challenges: the inability to adapt to the specificity of medical anatomical structures and the limitations brought about by the privacy and scarcity of medical data. To address these issues, this paper proposes a Self-S…

    Submitted 19 January, 2026; originally announced January 2026.

    Comments: Undergraduate student as first author submitted to IJCAI

  36. arXiv:2601.10323 [pdf, ps, other]

    cs.CV cs.CL

    ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding

    Authors: Xueyun Tian, Wei Li, Bingbing Xu, Heng Dong, Yuanzhuo Wang, Huawei Shen

    Abstract: Recent Omni-multimodal Large Language Models show promise in unified audio, vision, and text modeling. However, streaming audio-video understanding remains challenging, as existing approaches suffer from disjointed capabilities: they typically exhibit incomplete modality support or lack autonomous proactive monitoring. To address this, we present ROMA, a real-time omni-multimodal assistant for uni…

    Submitted 15 January, 2026; originally announced January 2026.

    Comments: Our project page is available at https://eureka-maggie.github.io/ROMA_show

  37. arXiv:2601.07577 [pdf, ps, other]

    cs.AI

    Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents

    Authors: Yunfan Li, Bingbing Xu, Xueyun Tian, Xiucheng Xu, Huawei Shen

    Abstract: Recent advances in large language models (LLMs) have enabled agents to autonomously execute complex, long-horizon tasks, yet planning remains a primary bottleneck for reliable task execution. Existing methods typically fall into two paradigms: step-wise planning, which is reactive but often short-sighted; and one-shot planning, which generates a complete plan upfront yet is brittle to execution er…

    Submitted 12 January, 2026; originally announced January 2026.

  38. arXiv:2601.06776 [pdf, ps, other]

    cs.AI

    From Text to Simulation: A Multi-Agent LLM Workflow for Automated Chemical Process Design

    Authors: Xufei Tian, Wenli Du, Shaoyi Yang, Han Hu, Hui Xin, Shifeng Qu, Ke Ye

    Abstract: Process simulation is a critical cornerstone of chemical engineering design. Current automated chemical design methodologies focus mainly on various representations of process flow diagrams. However, transforming these diagrams into executable simulation flowsheets remains a time-consuming and labor-intensive endeavor, requiring extensive manual parameter configuration within simulation software.…

    Submitted 10 January, 2026; originally announced January 2026.

  39. arXiv:2601.04992 [pdf, ps, other]

    cs.CL

    Learning from Mistakes: Negative Reasoning Samples Enhance Out-of-Domain Generalization

    Authors: Xueyun Tian, Minghua Ma, Bingbing Xu, Nuoyan Lyu, Wei Li, Heng Dong, Zheng Chu, Yuanzhuo Wang, Huawei Shen

    Abstract: Supervised fine-tuning (SFT) on chain-of-thought (CoT) trajectory demonstrations is a common approach for enabling reasoning in large language models. Standard practices typically only retain trajectories with correct final answers (positives) while ignoring the rest (negatives). We argue that this paradigm discards substantial supervision and exacerbates overfitting, limiting out-of-domain (OOD…

    Submitted 8 January, 2026; v1 submitted 8 January, 2026; originally announced January 2026.

    Comments: Code and data are available at https://github.com/Eureka-Maggie/GLOW

  40. arXiv:2512.21815 [pdf, ps, other]

    cs.CV cs.LG

    Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

    Authors: Mengqi He, Xinyu Tian, Xin Shen, Jinhong Ni, Shu Zou, Zhaoyuan Yang, Jing Zhang

    Abstract: Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is strongly correlated with the reliability of VLMs. Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token contributes equally to generation instability. We show instead that a small fraction (about…

    Submitted 25 December, 2025; originally announced December 2025.

    Comments: 19 pages, 11 figures, 8 tables

    ACM Class: I.2.0; I.4.0

  41. arXiv:2512.20900 [pdf]

    cs.CE

    When Experts Speak: Sequential LLM-Bayesian Learning for Startup Success Prediction

    Authors: Yidong Chai, Yanguang Liu, Xuan Tian, Jiaheng Xie, Yonghang Zhou

    Abstract: Evaluating startups is inherently challenging in entrepreneurial finance, where investors confront severe information asymmetry and limited quantitative data. Leveraging a novel expert network call data, we develop an LLM-Bayesian model that analyzes these conversations at the question-answer turn level, extracting semantic and evaluative signals via large language models (LLMs) and aggregating th…

    Submitted 28 January, 2026; v1 submitted 23 December, 2025; originally announced December 2025.

  42. arXiv:2512.19414  [pdf, ps, other]

    cs.CR cs.CL

    From Retrieval to Reasoning: A Framework for Cyber Threat Intelligence NER with Explicit and Adaptive Instructions

    Authors: Jiaren Peng, Hongda Sun, Xuan Tian, Cheng Huang, Zeqing Li, Rui Yan

    Abstract: The automation of Cyber Threat Intelligence (CTI) relies heavily on Named Entity Recognition (NER) to extract critical entities from unstructured text. Currently, Large Language Models (LLMs) primarily address this task through retrieval-based In-Context Learning (ICL). This paper analyzes this mainstream paradigm, revealing a fundamental flaw: its success stems not from global semantic similarity…

    Submitted 22 December, 2025; originally announced December 2025.

  43. arXiv:2512.15825  [pdf, ps, other]

    cs.CY

    SnapClass: An AI-Enhanced Classroom Management System for Block-Based Programming

    Authors: Bahare Riahi, Xiaoyi Tian, Ally Limke, Viktoriia Storozhevykh, Veronica Catete, Tiffany Barnes, Nicholas Lytle, Khushbu Singh

    Abstract: Block-Based Programming (BBP) platforms, such as Snap!, have become increasingly prominent in K-12 computer science education due to their ability to simplify programming concepts and foster computational thinking from an early age. While these platforms engage students through visual and gamified interfaces, teachers often face challenges in using them effectively and finding all the necessary fe…

    Submitted 17 December, 2025; originally announced December 2025.

    Comments: 2 pages, 7 figures

  44. arXiv:2512.09673  [pdf, ps, other]

    cs.LG cs.AI cs.NE stat.ML

    Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power

    Authors: Yuzhu Chen, Tian Qin, Xinmei Tian, Fengxiang He, Dacheng Tao

    Abstract: Equivariant neural networks encode symmetry as an inductive bias and have achieved strong empirical performance in wide domains. However, their expressive power remains not well understood. Focusing on 2-layer ReLU networks, this paper investigates the impact of equivariance constraints on the expressivity of equivariant and layer-wise equivariant networks. By examining the boundary hyperplanes an…

    Submitted 25 December, 2025; v1 submitted 10 December, 2025; originally announced December 2025.

  45. arXiv:2512.05920  [pdf, ps, other]

    cs.CV cs.LG

    NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery Prediction

    Authors: Jiawen Yang, Yihui Cao, Xuanyu Tian, Yuyao Zhang, Hongjiang Wei

    Abstract: Orthognathic surgery is a crucial intervention for correcting dentofacial skeletal deformities to enhance occlusal functionality and facial aesthetics. Accurate postoperative facial appearance prediction remains challenging due to the complex nonlinear interactions between skeletal movements and facial soft tissue. Existing biomechanical, parametric models and deep-learning approaches either lack…

    Submitted 5 December, 2025; originally announced December 2025.

  46. arXiv:2511.16117  [pdf, ps, other]

    cs.CV

    Decoupling Complexity from Scale in Latent Diffusion Model

    Authors: Tianxiong Zhong, Xingye Tian, Xuebo Wang, Boyuan Jiang, Xin Tao, Pengfei Wan

    Abstract: Existing latent diffusion models typically couple scale with content complexity, using more latent tokens to represent higher-resolution images or higher-frame rate videos. However, the latent capacity required to represent visual data primarily depends on content complexity, with scale serving only as an upper bound. Motivated by this observation, we propose DCS-LDM, a novel paradigm for visual g…

    Submitted 20 November, 2025; originally announced November 2025.

    Comments: 15 pages, 16 figures

  47. arXiv:2511.12633  [pdf, ps, other]

    cs.CV

    Denoising Vision Transformer Autoencoder with Spectral Self-Regularization

    Authors: Xunzhi Xiang, Xingye Tian, Guiyu Zhang, Yabo Chen, Shaofeng Zhang, Xuebo Wang, Xin Tao, Qi Fan

    Abstract: Variational autoencoders (VAEs) typically encode images into a compact latent space, reducing computational cost but introducing an optimization dilemma: a higher-dimensional latent space improves reconstruction fidelity but often hampers generative performance. Recent methods attempt to address this dilemma by regularizing high-dimensional latent spaces using external vision foundation models (VF…

    Submitted 16 November, 2025; originally announced November 2025.

  48. arXiv:2511.11436  [pdf, ps, other]

    eess.IV cs.CV

    Unsupervised Motion-Compensated Decomposition for Cardiac MRI Reconstruction via Neural Representation

    Authors: Xuanyu Tian, Lixuan Chen, Qing Wu, Xiao Wang, Jie Feng, Yuyao Zhang, Hongjiang Wei

    Abstract: Cardiac magnetic resonance (CMR) imaging is widely used to characterize cardiac morphology and function. To accelerate CMR imaging, various methods have been proposed to recover high-quality spatiotemporal CMR images from highly undersampled k-t space data. However, current CMR reconstruction techniques either fail to achieve satisfactory image quality or are restricted by the scarcity of ground t…

    Submitted 17 November, 2025; v1 submitted 14 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI-26

  49. arXiv:2511.01293  [pdf, ps, other]

    cs.CV

    Detecting Generated Images by Fitting Natural Image Distributions

    Authors: Yonggang Zhang, Jun Nie, Xinmei Tian, Mingming Gong, Kun Zhang, Bo Han

    Abstract: The increasing realism of generated images has raised significant concerns about their potential misuse, necessitating robust detection methods. Current approaches mainly rely on training binary classifiers, which depend heavily on the quantity and quality of available generated images. In this work, we propose a novel framework that exploits geometric differences between the data manifolds of nat…

    Submitted 3 November, 2025; originally announced November 2025.

    Comments: 25 pages, 9 figures, NeurIPS 2025 spotlight

  50. arXiv:2510.26390  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    SPG-CDENet: Spatial Prior-Guided Cross Dual Encoder Network for Multi-Organ Segmentation

    Authors: Xizhi Tian, Changjun Zhou, Yulin Yang

    Abstract: Multi-organ segmentation is a critical task in computer-aided diagnosis. While recent deep learning methods have achieved remarkable success in image segmentation, huge variations in organ size and shape challenge their effectiveness in multi-organ segmentation. To address these challenges, we propose a Spatial Prior-Guided Cross Dual Encoder Network (SPG-CDENet), a novel two-stage segmentation pa…

    Submitted 30 October, 2025; originally announced October 2025.