
Showing 1–50 of 1,611 results for author: Sun, H

Searching in archive cs.
  1. arXiv:2512.19414  [pdf, ps, other]

    cs.CR cs.CL

    From Retrieval to Reasoning: A Framework for Cyber Threat Intelligence NER with Explicit and Adaptive Instructions

    Authors: Jiaren Peng, Hongda Sun, Xuan Tian, Cheng Huang, Zeqing Li, Rui Yan

    Abstract: The automation of Cyber Threat Intelligence (CTI) relies heavily on Named Entity Recognition (NER) to extract critical entities from unstructured text. Currently, Large Language Models (LLMs) primarily address this task through retrieval-based In-Context Learning (ICL). This paper analyzes this mainstream paradigm, revealing a fundamental flaw: its success stems not from global semantic similarity…

    Submitted 22 December, 2025; originally announced December 2025.

  2. arXiv:2512.18928  [pdf, ps, other]

    cs.LG

    The Ensemble Schrödinger Bridge filter for Nonlinear Data Assimilation

    Authors: Feng Bao, Hui Sun

    Abstract: This work puts forward a novel nonlinear optimal filter, namely the Ensemble Schrödinger Bridge nonlinear filter. The proposed filter marries the standard prediction procedure with diffusion generative modeling for the analysis procedure to realize one filtering step. The designed approach has no structural model error, and it is derivative-free, training-free and highly parallizab…

    Submitted 21 December, 2025; originally announced December 2025.

  3. arXiv:2512.18595  [pdf, ps, other]

    cs.LG

    Benchmarking neural surrogates on realistic spatiotemporal multiphysics flows

    Authors: Runze Mao, Rui Zhang, Xuan Bai, Tianhao Wu, Teng Zhang, Zhenyi Chen, Minqi Lin, Bocheng Zeng, Yangchen Xu, Yingxuan Xiang, Haoze Zhang, Shubham Goswami, Pierre A. Dawe, Yifan Xu, Zhenhua An, Mengtao Yan, Xiaoyi Lu, Yi Wang, Rongbo Bai, Haobu Gao, Xiaohang Fang, Han Li, Hao Sun, Zhi X. Chen

    Abstract: Predicting multiphysics dynamics is computationally expensive and challenging due to the severe coupling of multi-scale, heterogeneous physical processes. While neural surrogates promise a paradigm shift, the field currently suffers from an "illusion of mastery", as repeatedly emphasized in top-tier commentaries: existing evaluations overly rely on simplified, low-dimensional proxies, which fail t…

    Submitted 21 December, 2025; originally announced December 2025.

    Comments: 52 pages, 20 figures. Code and data available at https://github.com/deepflame-ai/REALM. Companion website and leaderboard at https://realm-bench.org

  4. arXiv:2512.18344  [pdf]

    cs.CV cs.AI

    MCVI-SANet: A lightweight semi-supervised model for LAI and SPAD estimation of winter wheat under vegetation index saturation

    Authors: Zhiheng Zhang, Jiajun Yang, Hong Sun, Dong Wang, Honghua Jiang, Yaru Chen, Tangyuan Ning

    Abstract: Vegetation index (VI) saturation during the dense canopy stage and limited ground-truth annotations of winter wheat constrain accurate estimation of LAI and SPAD. Existing VI-based and texture-driven machine learning methods exhibit limited feature expressiveness. In addition, deep learning baselines suffer from domain gaps and high data demands, which restrict their generalization. Therefore, thi…

    Submitted 20 December, 2025; originally announced December 2025.

  5. arXiv:2512.18256  [pdf, ps, other]

    cs.AI cs.LO

    MSC-180: A Benchmark for Automated Formal Theorem Proving from Mathematical Subject Classification

    Authors: Sirui Li, Wangyue Lu, Xiaorui Shi, Ke Weng, Haozhe Sun, Minghe Yu, Tiancheng Zhang, Ge Yu, Hengyu Liu, Lun Du

    Abstract: Automated Theorem Proving (ATP) represents a core research direction in artificial intelligence for achieving formal reasoning and verification, playing a significant role in advancing machine intelligence. However, current large language model (LLM)-based theorem provers suffer from limitations such as restricted domain coverage and weak generalization in mathematical reasoning. To address these…

    Submitted 20 December, 2025; originally announced December 2025.

  6. arXiv:2512.16776  [pdf, ps, other]

    cs.CV

    Kling-Omni Technical Report

    Authors: Kling Team, Jialu Chen, Yuanzheng Ci, Xiangyu Du, Zipeng Feng, Kun Gai, Sainan Guo, Feng Han, Jingbin He, Kang He, Xiao Hu, Xiaohua Hu, Boyuan Jiang, Fangyuan Kong, Hang Li, Jie Li, Qingyu Li, Shen Li, Xiaohan Li, Yan Li, Jiajun Liang, Borui Liao, Yiqiao Liao, Weihong Lin, Quande Liu, et al. (43 additional authors not shown)

    Abstract: We present Kling-Omni, a generalist generative framework designed to synthesize high-fidelity videos directly from multimodal visual language inputs. Adopting an end-to-end perspective, Kling-Omni bridges the functional separation among diverse video generation, editing, and intelligent reasoning tasks, integrating them into a holistic system. Unlike disjointed pipeline approaches, Kling-Omni supp…

    Submitted 18 December, 2025; originally announced December 2025.

    Comments: Kling-Omni Technical Report

  7. arXiv:2512.15567  [pdf, ps, other]

    cs.AI cond-mat.mtrl-sci cs.LG physics.chem-ph

    Evaluating Large Language Models in Scientific Discovery

    Authors: Zhangde Song, Jieyu Lu, Yuanqi Du, Botao Yu, Thomas M. Pruyn, Yue Huang, Kehan Guo, Xiuzhe Luo, Yuanhao Qu, Yi Qu, Yinkai Wang, Haorui Wang, Jeff Guo, Jingru Gan, Parshin Shojaee, Di Luo, Andres M Bran, Gen Li, Qiyuan Zhao, Shao-Xiong Lennon Luo, Yuxuan Zhang, Xiang Zou, Wanru Zhao, Yifan F. Zhang, Wucheng Zhang, et al. (31 additional authors not shown)

    Abstract: Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We introduce a scenario-grounded benchmark that evaluates LLMs across biology, chemistry, materials, and physics, where domain exp…

    Submitted 17 December, 2025; originally announced December 2025.

  8. arXiv:2512.15036  [pdf, ps, other]

    cs.LG cs.AI

    Spectral Representation-based Reinforcement Learning

    Authors: Chenxiao Gao, Haotian Sun, Na Li, Dale Schuurmans, Bo Dai

    Abstract: In real-world applications with large state and action spaces, reinforcement learning (RL) typically employs function approximations to represent core components like the policies, value functions, and dynamics models. Although powerful approximations such as neural networks offer great expressiveness, they often present theoretical ambiguities, suffer from optimization instability and exploration…

    Submitted 16 December, 2025; originally announced December 2025.

  9. arXiv:2512.14028  [pdf, ps, other]

    cs.CV

    Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding

    Authors: Jiaheng Li, Qiyu Dai, Lihan Li, Praneeth Chakravarthula, He Sun, Baoquan Chen, Wenzheng Chen

    Abstract: We consider the problem of active 3D imaging using single-shot structured light systems, which are widely employed in commercial 3D sensing devices such as Apple Face ID and Intel RealSense. Traditional structured light methods typically decode depth correspondences through pixel-domain matching algorithms, resulting in limited robustness under challenging scenarios like occlusions, fine-structure…

    Submitted 15 December, 2025; originally announced December 2025.

  10. arXiv:2512.13735  [pdf, ps, other]

    cs.LG cs.AI

    DARTs: A Dual-Path Robust Framework for Anomaly Detection in High-Dimensional Multivariate Time Series

    Authors: Xuechun Liu, Heli Sun, Xuecheng Wu, Ruichen Cao, Yunyun Shi, Dingkang Yang, Haoran Li

    Abstract: Multivariate time series anomaly detection (MTSAD) aims to accurately identify and localize complex abnormal patterns in large-scale industrial control systems. While existing approaches excel at recognizing distinct patterns in low-dimensional scenarios, they often fail to robustly capture long-range spatiotemporal dependencies when learning representations from the high-dimensiona…

    Submitted 14 December, 2025; originally announced December 2025.

  11. arXiv:2512.13564  [pdf, ps, other]

    cs.CL cs.AI

    Memory in the Age of AI Agents

    Authors: Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, Senjie Jin, Jiejun Tan, Yanbin Yin, Jiongnan Liu, Zeyu Zhang, Zhongxiang Sun, Yutao Zhu, Hao Sun, Boci Peng, Zhenrong Cheng, Xuanbo Fan, Jiaxin Guo, Xinlei Yu, Zhenhong Zhou, Zewen Hu, et al. (22 additional authors not shown)

    Abstract: Memory has emerged as, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has also become increasingly fragmented. Existing works that fall under the umbrella of agent memory often differ substantially in their motivations, implementations, and evaluation protocols, while the prol…

    Submitted 15 December, 2025; originally announced December 2025.

  12. arXiv:2512.13175  [pdf, ps, other]

    cs.CV

    Seeing the Whole Picture: Distribution-Guided Data-Free Distillation for Semantic Segmentation

    Authors: Hongxuan Sun, Tao Wu

    Abstract: Semantic segmentation requires a holistic understanding of the physical world, as it assigns semantic labels to spatially continuous and structurally coherent objects rather than to isolated pixels. However, existing data-free knowledge distillation (DFKD) methods, primarily designed for classification, often disregard this continuity, resulting in significant performance degradation when applied di…

    Submitted 15 December, 2025; originally announced December 2025.

  13. arXiv:2512.13104  [pdf]

    cs.CV

    FID-Net: A Feature-Enhanced Deep Learning Network for Forest Infestation Detection

    Authors: Yan Zhang, Baoxin Li, Han Sun, Yuhang Gao, Mingtai Zhang, Pei Wang

    Abstract: Forest pests threaten ecosystem stability, requiring efficient monitoring. To overcome the limitations of traditional methods in large-scale, fine-grained detection, this study focuses on accurately identifying infected trees and analyzing infestation patterns. We propose FID-Net, a deep learning model that detects pest-affected trees from UAV visible-light imagery and enables infestation analysis…

    Submitted 15 December, 2025; originally announced December 2025.

  14. arXiv:2512.12967  [pdf, ps, other]

    cs.CL

    QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

    Authors: Weizhou Shen, Ziyi Yang, Chenliang Li, Zhiyuan Lu, Miao Peng, Huashan Sun, Yingcheng Shi, Shengyi Liao, Shaopeng Lai, Bo Zhang, Dayiheng Liu, Fei Huang, Jingren Zhou, Ming Yan

    Abstract: We introduce QwenLong-L1.5, a model that achieves superior long-context reasoning capabilities through systematic post-training innovations. The key technical breakthroughs of QwenLong-L1.5 are as follows: (1) Long-Context Data Synthesis Pipeline: We develop a systematic synthesis framework that generates challenging reasoning tasks requiring multi-hop grounding over globally distributed evidence.…

    Submitted 14 December, 2025; originally announced December 2025.

  15. arXiv:2512.11686  [pdf, ps, other]

    physics.comp-ph cs.LG

    Stable spectral neural operator for learning stiff PDE systems from limited data

    Authors: Rui Zhang, Han Wan, Yang Liu, Hao Sun

    Abstract: Accurate modeling of spatiotemporal dynamics is crucial to understanding complex phenomena across science and engineering. However, this task faces a fundamental challenge when the governing equations are unknown and observational data are sparse. System stiffness, the coupling of multiple time-scales, further exacerbates this problem and hinders long-term prediction. Existing methods fall short:…

    Submitted 12 December, 2025; originally announced December 2025.

  16. arXiv:2512.08868  [pdf, ps, other]

    cs.AI

    EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

    Authors: Rui Min, Zile Qiao, Ze Xu, Jiawen Zhai, Wenyu Gao, Xuanzhong Chen, Haozhen Sun, Zhen Zhang, Xinyu Wang, Hong Zhou, Wenbiao Yin, Bo Zhang, Xuan Zhou, Ming Yan, Yong Jiang, Haicheng Liu, Liang Ding, Ling Zou, Yi R. Fung, Yalong Li, Pengjun Xie

    Abstract: Foundation agents have rapidly advanced in their ability to reason and interact with real environments, making the evaluation of their core capabilities increasingly important. While many benchmarks have been developed to assess agent performance, most concentrate on academic settings or artificially designed scenarios while overlooking the challenges that arise in real applications. To address th…

    Submitted 11 December, 2025; v1 submitted 9 December, 2025; originally announced December 2025.

  17. arXiv:2512.07527  [pdf, ps, other]

    cs.CV cs.GR

    From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images

    Authors: Fei Yu, Yu Liu, Luyang Tang, Mingchao Sun, Zengye Ge, Rui Bu, Yuchao Jin, Haisen Zhao, He Sun, Yangyan Li, Mu Xu, Wenzheng Chen, Baoquan Chen

    Abstract: City-scale 3D reconstruction from satellite imagery presents the challenge of extreme viewpoint extrapolation, where our goal is to synthesize ground-level novel views from sparse orbital images with minimal parallax. This requires inferring nearly $90^\circ$ viewpoint gaps from image sources with severely foreshortened facades and flawed textures, causing state-of-the-art reconstruction engines s…

    Submitted 9 December, 2025; v1 submitted 8 December, 2025; originally announced December 2025.

  18. arXiv:2512.07436  [pdf, ps, other]

    cs.AI

    LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

    Authors: Hang He, Chuhuai Yue, Chengqi Dong, Mingxue Tian, Zhenfeng Liu, Jiajun Chai, Xiaohan Wang, Yufei Zhang, Qun Liao, Guojun Yin, Wei Lin, Chengcheng Wan, Haiying Sun, Ting Su

    Abstract: Recent advances in large reasoning models (LRMs) have enabled agentic search systems to perform complex multi-step reasoning across multiple sources. However, most studies focus on general information retrieval and rarely explore vertical domains with unique challenges. In this work, we focus on local life services and introduce LocalSearchBench, which encompasses diverse and complex business scena…

    Submitted 8 December, 2025; originally announced December 2025.

  19. arXiv:2512.06276  [pdf, ps, other]

    cs.CV cs.AI

    RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression Comprehension

    Authors: Tianyi Gao, Hao Li, Han Fang, Xin Wei, Xiaodong Dong, Hongbo Sun, Ye Yuan, Zhongjiang He, Jinglin Xu, Jingmin Xin, Hao Sun

    Abstract: Referring Expression Comprehension (REC) is a vision-language task that localizes a specific image region based on a textual description. Existing REC benchmarks primarily evaluate perceptual capabilities and lack interpretable scoring mechanisms, so they cannot reveal the grounding capability of Multi-modal Large Language Models (MLLMs) across different cognitive abilities. To address this limitation…

    Submitted 13 December, 2025; v1 submitted 5 December, 2025; originally announced December 2025.

  20. arXiv:2512.05377  [pdf, ps, other]

    cs.LG cs.AI physics.ao-ph

    China Regional 3km Downscaling Based on Residual Corrective Diffusion Model

    Authors: Honglu Sun, Hao Jing, Zhixiang Dai, Sa Xiao, Wei Xue, Jian Sun, Qifeng Lu

    Abstract: A fundamental challenge in numerical weather prediction is to efficiently produce high-resolution forecasts. A common solution is applying downscaling methods, which include dynamical downscaling and statistical downscaling, to the outputs of global models. This work focuses on statistical downscaling, which establishes statistical relationships between low-resolution and high-resolution historica…

    Submitted 14 December, 2025; v1 submitted 4 December, 2025; originally announced December 2025.

  21. arXiv:2512.04731  [pdf, ps, other]

    cs.RO

    Bridging Simulation and Reality: Cross-Domain Transfer with Semantic 2D Gaussian Splatting

    Authors: Jian Tang, Pu Pang, Haowen Sun, Chengzhong Ma, Xingyu Chen, Hua Huang, Xuguang Lan

    Abstract: Cross-domain transfer in robotic manipulation remains a longstanding challenge due to the significant domain gap between simulated and real-world environments. Existing methods such as domain randomization, adaptation, and sim-real calibration often require extensive tuning or fail to generalize to unseen scenarios. To address this issue, we observe that if domain-invariant features are utilized d…

    Submitted 4 December, 2025; originally announced December 2025.

  22. arXiv:2512.03862  [pdf, ps, other]

    cs.CV

    Diminishing Returns in Self-Supervised Learning

    Authors: Oli Bridge, Huey Sun, Botond Branyicskai-Nagy, Charles D'Ornano, Shomit Basu

    Abstract: While transformer-based architectures have taken computer vision and NLP by storm, they often require a vast number of parameters and a large amount of training data to attain strong performance. In this work, we experiment with three distinct pre-training, intermediate fine-tuning, and downstream datasets and training objectives to explore their marginal benefits on a small 5M-parameter vision transformer. We find…

    Submitted 3 December, 2025; originally announced December 2025.

  23. arXiv:2512.03803  [pdf, ps, other]

    cs.CL

    Enhancing Instruction-Following Capabilities in Seq2Seq Models: DoLA Adaptations for T5

    Authors: Huey Sun, Anabel Yong, Lorenzo Gilly, Felipe Jin

    Abstract: Encoder-decoder models such as FLAN-T5 are finetuned to follow instructions, but often fail when the instructions conflict with memorized continuations ingrained during training. To understand this behavior, we adapt DoLa to FLAN-T5 and examine how representations evolve in the decoder. Our findings show that T5's intermediate layers undergo rapid shifts driven by cross-attention to the encoder. W…

    Submitted 12 December, 2025; v1 submitted 3 December, 2025; originally announced December 2025.

  24. arXiv:2512.03566  [pdf, ps, other]

    cs.CV cs.MM

    GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models

    Authors: Hao Sun, Lei Fan, Donglin Di, Shaohui Liu

    Abstract: Articulated object generation has seen increasing advancements, yet existing models often lack the ability to be conditioned on text prompts. To address the significant gap between textual descriptions and 3D articulated object representations, we propose GAOT, a three-phase framework that generates articulated objects from text prompts, leveraging diffusion models and hypergraph learning in a thr…

    Submitted 3 December, 2025; originally announced December 2025.

    Comments: Accepted by ACM MM Asia2026

  25. arXiv:2512.03439  [pdf, ps, other]

    cs.IR

    LLM as Explainable Re-Ranker for Recommendation System

    Authors: Yaqi Wang, Haojia Sun, Shuting Zhang

    Abstract: The application of large language models (LLMs) in recommendation systems has recently gained traction. Traditional recommendation systems often lack explainability and suffer from issues such as popularity bias. Previous research has also indicated that LLMs, when used as standalone predictors, fail to achieve accuracy comparable to traditional models. To address these challenges, we propose to u…

    Submitted 2 December, 2025; originally announced December 2025.

  26. arXiv:2512.03043  [pdf, ps, other]

    cs.CV

    OneThinker: All-in-one Reasoning Model for Image and Video

    Authors: Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, Xiangyu Yue

    Abstract: Reinforcement learning (RL) has recently achieved remarkable success in eliciting visual reasoning within Multimodal Large Language Models (MLLMs). However, existing approaches typically train separate models for different tasks and treat image and video reasoning as disjoint domains. This results in limited scalability toward a multimodal reasoning generalist, which restricts practical versatilit…

    Submitted 3 December, 2025; v1 submitted 2 December, 2025; originally announced December 2025.

    Comments: Project page: https://github.com/tulerfeng/OneThinker

  27. arXiv:2512.03004  [pdf, ps, other]

    cs.CV

    DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images

    Authors: Xiaoxue Chen, Ziyi Xiong, Yuantao Chen, Gen Li, Nan Wang, Hongcheng Luo, Long Chen, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Hongyang Li, Ya-Qin Zhang, Hao Zhao

    Abstract: Autonomous driving needs fast, scalable 4D reconstruction and re-simulation for training and evaluation, yet most methods for dynamic driving scenes still rely on per-scene optimization, known camera calibration, or short frame windows, making them slow and impractical. We revisit this problem from a feedforward perspective and introduce Driving Gaussian Grounded Transformer (DGGT), a uni…

    Submitted 2 December, 2025; originally announced December 2025.

  28. arXiv:2512.02395  [pdf, ps, other]

    cs.CV

    Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

    Authors: Yifan Zhang, Liang Hu, Haofeng Sun, Peiyu Wang, Yichen Wei, Shukang Yin, Jiangbo Pei, Wei Shen, Peng Xia, Yi Peng, Tianyidan Xie, Eric Li, Yang Liu, Xuchen Song, Yahui Zhou

    Abstract: Despite recent progress in multimodal agentic systems, existing approaches often treat image manipulation and web search as disjoint capabilities, rely heavily on costly reinforcement learning, and lack planning grounded in real tool-execution traces. To address these limitations, we present Skywork-R1V4, a 30B (A3B) parameter multimodal agentic model that unifies multimodal planning, active image…

    Submitted 8 December, 2025; v1 submitted 1 December, 2025; originally announced December 2025.

    Comments: 21 pages, 7 figures

  29. arXiv:2512.00953  [pdf, ps, other]

    cs.CV

    Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval

    Authors: Haojian Huang, Kaijing Ma, Jin Chen, Haodong Chen, Zhou Wu, Xianghao Zang, Han Fang, Chao Ban, Hao Sun, Mulin Chen, Zhongjiang He

    Abstract: In the domain of moment retrieval, accurately identifying temporal segments within videos based on natural language queries remains challenging. Traditional methods often employ pre-trained models that struggle with fine-grained information and deterministic reasoning, leading to difficulties in aligning with complex or ambiguous moments. To overcome these limitations, we explore Deep Evidential R…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: Accepted by AAAI 2026, 10 pages, 9 figures, 5 tables

  30. arXiv:2511.21431  [pdf, ps, other]

    cs.DC

    MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training

    Authors: Lu Zhao, Rong Shi, Shaoqing Zhang, Yueqiang Chen, Baoguo He, Hongfeng Sun, Ziqing Yin, Shangchao Su, Zhiyan Cui, Liang Dong, Xiyuan Li, Lingbin Wang, Jianwei He, Jiesong Ma, Weikang Huang, Jianglei Tong, Dongdong Gao, Jian Zhang, Hong Tian

    Abstract: The training of large-scale Mixture of Experts (MoE) models faces a critical memory bottleneck due to severe load imbalance caused by dynamic token routing. This imbalance leads to memory overflow on GPUs with limited capacity, constraining model scalability. Existing load balancing methods, which cap expert capacity, compromise model accuracy and fail on memory-constrained hardware. To address th…

    Submitted 26 November, 2025; originally announced November 2025.

  31. arXiv:2511.21021  [pdf, ps, other]

    cs.CV cs.AI

    Structure-Aware Prototype Guided Trusted Multi-View Classification

    Authors: Haojian Huang, Jiahao Shi, Zhe Liu, Harold Haodong Chen, Han Fang, Hao Sun, Zhongjiang He

    Abstract: Trustworthy multi-view classification (TMVC) addresses the challenge of achieving reliable decision-making in complex scenarios where multi-source information is heterogeneous, inconsistent, or even conflicting. Existing TMVC approaches predominantly rely on globally dense neighbor relationships to model intra-view dependencies, leading to high computational costs and an inability to directly ensu…

    Submitted 25 November, 2025; originally announced November 2025.

    Comments: 12 pages, 8 figures, 7 tables, Ongoing Work

  32. arXiv:2511.19316  [pdf, ps, other]

    cs.CV cs.AI

    Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach

    Authors: Xincheng Wang, Hanchi Sun, Wenjun Sun, Kejun Xue, Wangqiu Zhou, Jianbo Zhang, Wei Sun, Dandan Zhu, Xiongkuo Min, Jun Jia, Zhijun Fang

    Abstract: Recent fine-tuning techniques for diffusion models enable them to reproduce specific image sets, such as particular faces or artistic styles, but also introduce copyright and security risks. Dataset watermarking has been proposed to ensure traceability by embedding imperceptible watermarks into training images, which remain detectable in outputs even after fine-tuning. However, current methods lac…

    Submitted 24 November, 2025; originally announced November 2025.

  33. arXiv:2511.19306  [pdf, ps, other]

    cs.CV

    Dual-Granularity Semantic Prompting for Language Guidance Infrared Small Target Detection

    Authors: Zixuan Wang, Haoran Sun, Jiaming Lu, Wenxuan Wang, Zhongling Huang, Dingwen Zhang, Xuelin Qian, Junwei Han

    Abstract: Infrared small target detection remains challenging due to limited feature representation and severe background interference, resulting in sub-optimal performance. While recent CLIP-inspired methods attempt to leverage textual guidance for detection, they are hindered by inaccurate text descriptions and reliance on manual annotations. To overcome these limitations, we propose DGSPNet, an end-to-en…

    Submitted 24 November, 2025; originally announced November 2025.

    Comments: 10 pages, 2 figures

  34. arXiv:2511.17993  [pdf, ps, other]

    cs.CV

    SD-PSFNet: Sequential and Dynamic Point Spread Function Network for Image Deraining

    Authors: Jiayu Wang, Haoyu Bian, Haoran Sun, Shaoning Zeng

    Abstract: Image deraining is crucial for vision applications but is challenged by the complex multi-scale physics of rain and its coupling with scenes. To address this challenge, a novel approach inspired by multi-stage image restoration is proposed, incorporating Point Spread Function (PSF) mechanisms to reveal the image degradation process while combining dynamic physical modeling with sequential feature…

    Submitted 22 November, 2025; originally announced November 2025.

    Comments: 12 pages, 7 figures, Published in AAAI 2026

  35. arXiv:2511.17910  [pdf, ps, other]

    cs.CL

    L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention

    Authors: Yuliang Zhan, Xinyu Tang, Han Wan, Jian Li, Ji-Rong Wen, Hao Sun

    Abstract: Recently, Chain-of-Thought (CoT) reasoning has significantly enhanced the capabilities of large language models (LLMs), but Vision-Language Models (VLMs) still struggle with multi-step reasoning tasks due to limited multimodal reasoning data. To bridge this gap, researchers have explored methods to transfer CoT reasoning from LLMs to VLMs. However, existing approaches either need high training cos…

    Submitted 21 November, 2025; originally announced November 2025.

    Comments: AAAI 2026 oral

  36. arXiv:2511.17681  [pdf, ps, other]

    cs.CV

    Vision-Motion-Reference Alignment for Referring Multi-Object Tracking via Multi-Modal Large Language Models

    Authors: Weiyi Lv, Ning Zhang, Hanyang Sun, Haoran Jiang, Kai Zhao, Jing Xiao, Dan Zeng

    Abstract: Referring Multi-Object Tracking (RMOT) extends conventional multi-object tracking (MOT) by introducing natural language references for multi-modal fusion tracking. RMOT benchmarks only describe the object's appearance, relative positions, and initial motion states. This so-called static regulation fails to capture dynamic changes of the object motion, including velocity changes and motion directio…

    Submitted 21 November, 2025; originally announced November 2025.

  37. arXiv:2511.16518  [pdf, ps, other]

    cs.RO cs.CL cs.CV

    MiMo-Embodied: X-Embodied Foundation Model Technical Report

    Authors: Xiaoshuai Hao, Lei Zhou, Zhijian Huang, Zhiwen Hou, Yingbo Tang, Lingfeng Zhang, Guang Li, Zheng Lu, Shuhuai Ren, Xianhui Meng, Yuchen Zhang, Jing Wu, Jinghui Lu, Chenxu Dang, Jiayi Guan, Jianhua Wu, Zhiyi Hou, Hanbing Li, Shumeng Xia, Mingliang Zhou, Yinan Zheng, Zihao Yue, Shuhao Gu, Hao Tian, Yuannan Shen, et al. (19 additional authors not shown)

    Abstract: We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction and Spatial Understanding, while also excelling in 12 autonomous driving benchmarks across Environmental Percepti…

    Submitted 20 November, 2025; originally announced November 2025.

    Comments: Code: https://github.com/XiaomiMiMo/MiMo-Embodied Model: https://huggingface.co/XiaomiMiMo/MiMo-Embodied-7B

  38. arXiv:2511.16278  [pdf, ps, other]

    cs.CR cs.AI

    "To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios

    Authors: Zhen Sun, Zongmin Zhang, Deqi Liang, Han Sun, Yule Liu, Yun Shen, Xiangshan Gao, Yilong Yang, Shuai Liu, Yutao Yue, Xinlei He

    Abstract: As LLMs become more common, non-expert users can pose risks, prompting extensive research into jailbreak attacks. However, most existing black-box jailbreak attacks rely on hand-crafted heuristics or narrow search spaces, which limit scalability. Compared with prior attacks, we propose Game-Theory Attack (GTA), a scalable black-box jailbreak framework. Concretely, we formalize the attacker's inte…

    Submitted 20 November, 2025; originally announced November 2025.

    Comments: 20 pages

  39. arXiv:2511.16005  [pdf, ps, other]

    cs.SE cs.AI

    InfCode-C++: Intent-Guided Semantic Retrieval and AST-Structured Search for C++ Issue Resolution

    Authors: Qingao Dong, Mengfei Wang, Hengzhi Zhang, Zhichao Li, Yuan Yuan, Mu Li, Xiang Gao, Hailong Sun, Chunming Hu, Weifeng Lv

    Abstract: Large language model (LLM) agents have recently shown strong performance on repository-level issue resolution, but existing systems are almost exclusively designed for Python and rely heavily on lexical retrieval and shallow code navigation. These approaches transfer poorly to C++ projects, where overloaded identifiers, nested namespaces, template instantiations, and deep control-flow structures m…

    Submitted 19 November, 2025; originally announced November 2025.

  40. arXiv:2511.16004  [pdf, ps, other]

    cs.SE cs.AI

    InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution

    Authors: KeFan Li, Mengfei Wang, Hengzhi Zhang, Zhichao Li, Yuan Yuan, Mu Li, Xiang Gao, Hailong Sun, Chunming Hu, Weifeng Lv

    Abstract: Large language models have advanced software engineering automation, yet resolving real-world software issues remains difficult because it requires repository-level reasoning, accurate diagnostics, and strong verification signals. Existing agent-based and pipeline-based methods often rely on insufficient tests, which can lead to patches that satisfy verification but fail to fix the underlying defe…

    Submitted 19 November, 2025; originally announced November 2025.

  41. arXiv:2511.15190  [pdf, ps, other]

    cs.LG cs.AI

    Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning

    Authors: Yuxuan Gu, Weimin Bai, Yifei Wang, Weijian Luo, He Sun

    Abstract: Masked auto-regressive diffusion models (MAR) benefit from the expressive modeling ability of diffusion models and the flexibility of masked auto-regressive ordering. However, vanilla MAR suffers from slow inference due to its hierarchical inference mechanism: an outer AR unmasking loop and an inner diffusion denoising chain. Such a decoupled structure not only harms generation efficiency but als…

    Submitted 19 November, 2025; originally announced November 2025.

  42. arXiv:2511.15066  [pdf, ps, other]

    cs.CV

    BokehFlow: Depth-Free Controllable Bokeh Rendering via Flow Matching

    Authors: Yachuan Huang, Xianrui Luo, Qiwen Wang, Liao Shen, Jiaqi Li, Huiqiang Sun, Zihao Huang, Wei Jiang, Zhiguo Cao

    Abstract: Bokeh rendering simulates the shallow depth-of-field effect in photography, enhancing visual aesthetics and guiding viewer attention to regions of interest. Although recent approaches perform well, rendering controllable bokeh without additional depth inputs remains a significant challenge. Existing classical and neural controllable methods rely on accurate depth maps, while generative approaches…

    Submitted 18 November, 2025; originally announced November 2025.

  43. arXiv:2511.14271  [pdf, ps, other]

    cs.CV

    Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation

    Authors: Weimin Bai, Yubo Li, Weijian Luo, Zeqiang Lai, Yequan Wang, Wenzheng Chen, He Sun

    Abstract: Text-to-3D generation has advanced rapidly, yet state-of-the-art models, encompassing both optimization-based and feed-forward architectures, still face two fundamental limitations. First, they struggle with coarse semantic alignment, often failing to capture fine-grained prompt details. Second, they lack robust 3D spatial understanding, leading to geometric inconsistencies and catastrophic failur…

    Submitted 18 November, 2025; originally announced November 2025.

  44. arXiv:2511.14258  [pdf, ps, other]

    cs.CL

    Entropy-Guided Reasoning Compression

    Authors: Hourun Zhu, Yang Gao, Wenlong Fei, Jiawei Li, Huashan Sun

    Abstract: Large reasoning models have demonstrated remarkable performance on complex reasoning tasks, yet the excessive length of their chain-of-thought outputs remains a major practical bottleneck due to high computation cost and poor deployability. Existing compression methods have achieved partial success but overlook a crucial phenomenon in the training process -- the entropy conflict. During compressio…

    Submitted 24 November, 2025; v1 submitted 18 November, 2025; originally announced November 2025.

    Comments: 10 pages, 4 figures

  45. arXiv:2511.14208  [pdf, ps, other]

    cs.CV

    InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior

    Authors: Weimin Bai, Suzhe Xu, Yiwei Ren, Jinhua Hao, Ming Sun, Wenzheng Chen, He Sun

    Abstract: Video inverse problems are fundamental to streaming, telepresence, and AR/VR, where high perceptual quality must coexist with tight latency constraints. Diffusion-based priors currently deliver state-of-the-art reconstructions, but existing approaches either adapt image diffusion models with ad hoc temporal regularizers - leading to temporal artifacts - or rely on native video diffusion models who…

    Submitted 24 November, 2025; v1 submitted 18 November, 2025; originally announced November 2025.

  46. arXiv:2511.13297  [pdf, ps, other]

    cs.CV

    CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving

    Authors: Enhui Ma, Lijun Zhou, Tao Tang, Jiahuan Zhang, Junpeng Jiang, Zhan Zhang, Dong Han, Kun Zhan, Xueyang Zhang, XianPeng Lang, Haiyang Sun, Xia Zhou, Di Lin, Kaicheng Yu

    Abstract: End-to-end planning methods are the de facto standard of the current autonomous driving system, while the robustness of the data-driven approaches suffers due to the notorious long-tail problem (i.e., rare but safety-critical failure cases). In this work, we explore whether recent diffusion-based video generation methods (a.k.a. world models), paired with structured 3D layouts, can enable a fully…

    Submitted 17 November, 2025; originally announced November 2025.

  47. arXiv:2511.13054  [pdf, ps, other]

    cs.CV

    ViSS-R1: Self-Supervised Reinforcement Video Reasoning

    Authors: Bo Fang, Yuxin Song, Qiangqiang Wu, Haoyuan Sun, Wenhao Wu, Antoni B. Chan

    Abstract: Complex video reasoning remains a significant challenge for Multimodal Large Language Models (MLLMs), as current R1-based methodologies often prioritize text-centric reasoning derived from text-based and image-based developments. In video tasks, such strategies frequently underutilize rich visual information, leading to potential shortcut learning and increased susceptibility to hallucination. To…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: Our paper was initially titled "Video-SSR1: Self-Supervised Reinforcement Video Reasoning." Upon noticing its close resemblance to the title of a recently released paper, we have decided to rename our work as "ViSS-R1."

  48. arXiv:2511.12921  [pdf, ps, other]

    cs.CV

    Generative Photographic Control for Scene-Consistent Video Cinematic Editing

    Authors: Huiqiang Sun, Liao Shen, Zhan Peng, Kun Wang, Size Wu, Yuhang Zang, Tianqi Liu, Zihao Huang, Xingyu Zeng, Zhiguo Cao, Wei Li, Chen Change Loy

    Abstract: Cinematic storytelling is profoundly shaped by the artful manipulation of photographic elements such as depth of field and exposure. These effects are crucial in conveying mood and creating aesthetic appeal. However, controlling these effects in generative video models remains highly challenging, as most existing methods are restricted to camera motion control. In this paper, we propose CineCtrl,…

    Submitted 16 November, 2025; originally announced November 2025.

  49. arXiv:2511.10706  [pdf, ps, other]

    cs.LG

    Differentiable Sparse Identification of Lagrangian Dynamics

    Authors: Zitong Zhang, Hao Sun

    Abstract: Data-driven discovery of governing equations remains a fundamental challenge in nonlinear dynamics. Although sparse regression techniques have advanced system identification, they struggle with rational functions and noise sensitivity in complex mechanical systems. The Lagrangian formalism offers a promising alternative, as it typically avoids rational expressions and provides a more con…

    Submitted 13 November, 2025; originally announced November 2025.

  50. arXiv:2511.10254  [pdf, ps, other]

    cs.CV

    Facial-R1: Aligning Reasoning and Recognition for Facial Emotion Analysis

    Authors: Jiulong Wu, Yucheng Shen, Lingyong Yan, Haixin Sun, Deguo Xia, Jizhou Huang, Min Cao

    Abstract: Facial Emotion Analysis (FEA) extends traditional facial emotion recognition by incorporating explainable, fine-grained reasoning. The task integrates three subtasks: emotion recognition, facial Action Unit (AU) recognition, and AU-based emotion reasoning to model affective states jointly. While recent approaches leverage Vision-Language Models (VLMs) and achieve promising results, they face two c…

    Submitted 13 November, 2025; originally announced November 2025.

    Comments: This paper has been accepted by AAAI 2026. 16 pages, 3 figures, 10 tables