Showing 1–50 of 4,785 results for author: Zhang, C

Searching in archive cs.
  1. arXiv:2512.19534  [pdf]

    cs.CV

    SlicerOrbitSurgerySim: An Open-Source Platform for Virtual Registration and Quantitative Comparison of Preformed Orbital Plates

    Authors: Chi Zhang, Braedon Gunn, Andrew M. Read-Fuller

    Abstract: Poor adaptation of orbital implants remains a major contributor to postoperative complications and revision surgery. Although preformed orbital plates are widely used to reduce cost and operative time compared with customized implants, surgeons currently lack publicly available tools and standardized metrics to quantitatively compare plate fit across vendors, sizes, and patient anatomy. We develop…

    Submitted 22 December, 2025; originally announced December 2025.

    Comments: 12 pages, 8 figures. Submitted to Journal of Oral and Maxillofacial Surgery. Code: https://github.com/chz31/SlicerOrbitSurgerySim/tree/main

  2. arXiv:2512.19458  [pdf, ps, other]

    cs.AI cond-mat.mtrl-sci

    An Agentic Framework for Autonomous Materials Computation

    Authors: Zeyu Xia, Jinzhe Ma, Congjie Zheng, Shufei Zhang, Yuqiang Li, Hang Su, P. Hu, Changshui Zhang, Xingao Gong, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Mao Su

    Abstract: Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific workflows. Here, we present a domain-specialized agent designed for reliable automati…

    Submitted 22 December, 2025; originally announced December 2025.

  3. arXiv:2512.19292  [pdf]

    cs.GT

    LOCO: A Low-Cost SNU-Self-Resilient Latch Using an Output-Split C-Element

    Authors: Ruijun Ma, Xin Chen, Xiaoqing Wen, Hui Xu, Shengnan Ye, Chuanjian Zhang, Senling Wang

    Abstract: As the CMOS technology enters nanometer scales, integrated circuits (ICs) become increasingly sensitive to radiation-induced soft errors, which can corrupt the state of storage elements and cause severe reliability issues. Many hardened designs have been proposed to mitigate soft errors by using filtering elements. However, existing filtering elements only protect their inputs against soft errors…

    Submitted 22 December, 2025; originally announced December 2025.

  4. arXiv:2512.19269  [pdf, ps, other]

    cs.RO cs.LG

    Translating Flow to Policy via Hindsight Online Imitation

    Authors: Yitian Zheng, Zhangchen Ye, Weijun Dong, Shengjie Wang, Yuyang Liu, Chongjie Zhang, Chuan Wen, Yang Gao

    Abstract: Recent advances in hierarchical robot systems leverage a high-level planner to propose task plans and a low-level policy to generate robot actions. This design allows training the planner on action-free or even non-robot data sources (e.g., videos), providing transferable high-level guidance. Nevertheless, grounding these high-level plans into executable actions remains challenging, especially wit…

    Submitted 22 December, 2025; originally announced December 2025.

  5. arXiv:2512.19135  [pdf, ps, other]

    cs.AI

    Understanding Chain-of-Thought in Large Language Models via Topological Data Analysis

    Authors: Chenghao Li, Chaoning Zhang, Yi Lu, Shuxu Chen, Xudong Wang, Jiaquan Zhang, Zhicheng Wang, Zhengxun Jin, Kuien Liu, Sung-Ho Bae, Guoqing Wang, Yang Yang, Hen Tao Shen

    Abstract: With the development of large language models (LLMs), particularly with the introduction of the long reasoning chain technique, the reasoning ability of LLMs in complex problem-solving has been significantly enhanced. While acknowledging the power of long reasoning chains, we cannot help but wonder: Why do different reasoning chains perform differently in reasoning? What components of the reasonin…

    Submitted 22 December, 2025; originally announced December 2025.

  6. arXiv:2512.19124  [pdf, ps, other]

    cs.CR

    ShadowBlock: Efficient Dynamic Anonymous Blocklisting and Its Cross-chain Application

    Authors: Haotian Deng, Mengxuan Liu, Chuan Zhang, Wei Huang, Licheng Wang, Liehuang Zhu

    Abstract: Online harassment, incitement to violence, racist behavior, and other harmful content on social media can damage social harmony and even break the law. Traditional blocklisting technologies can block malicious users, but this comes at the expense of identity privacy. The anonymous blocklisting has emerged as an effective mechanism to restrict the abuse of freedom of speech while protecting user id…

    Submitted 22 December, 2025; originally announced December 2025.

  7. arXiv:2512.19070  [pdf, ps, other]

    cs.CV cs.CL

    Watch Closely: Mitigating Object Hallucinations in Large Vision-Language Models with Disentangled Decoding

    Authors: Ruiqi Ma, Yu Yan, Chunhong Zhang, Minghao Yin, XinChao Liu, Zhihong Jin, Zheng Hu

    Abstract: Large Vision-Language Models (LVLMs) bridge the gap between visual and linguistic modalities, demonstrating strong potential across a variety of domains. However, despite significant progress, LVLMs still suffer from severe hallucination issues in object recognition tasks. These models often fail to accurately identify certain objects, leading to text generation that appears fluent but does not co…

    Submitted 22 December, 2025; originally announced December 2025.

  8. arXiv:2512.18673  [pdf]

    cs.LG

    Improving Pattern Recognition of Scheduling Anomalies through Structure-Aware and Semantically-Enhanced Graphs

    Authors: Ning Lyu, Junjie Jiang, Lu Chang, Chihui Shao, Feng Chen, Chong Zhang

    Abstract: This paper proposes a structure-aware driven scheduling graph modeling method to improve the accuracy and representation capability of anomaly identification in scheduling behaviors of complex systems. The method first designs a structure-guided scheduling graph construction mechanism that integrates task execution stages, resource node states, and scheduling path information to build dynamically…

    Submitted 21 December, 2025; originally announced December 2025.

  9. arXiv:2512.18563  [pdf, ps, other]

    cs.CV

    OpenView: Empowering MLLMs with Out-of-view VQA

    Authors: Qixiang Chen, Cheng Zhang, Chi-Wing Fu, Jingwen Ye, Jianfei Cai

    Abstract: Recent multimodal large language models (MLLMs) show great potential in natural image understanding. Yet, they perform well, mainly on reasoning in-view contents within the image frame. This paper presents the first study on out-of-view (OOV) understanding, i.e., the ability to reason objects, activities, and scenes beyond the visible frame of a perspective view. Our technical contributions are th…

    Submitted 20 December, 2025; originally announced December 2025.

    Comments: Code: https://github.com/q1xiangchen/OpenView

  10. arXiv:2512.17717  [pdf, ps, other]

    cs.CV

    FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation

    Authors: Cheng Peng, Zhuo Su, Liao Wang, Chen Guo, Zhaohu Li, Chengjiang Long, Zheng Lv, Jingxiang Sun, Chenyangguang Zhang, Yebin Liu

    Abstract: We present FlexAvatar, a flexible large reconstruction model for high-fidelity 3D head avatars with detailed dynamic deformation from single or sparse images, without requiring camera poses or expression labels. It leverages a transformer-based reconstruction model with structured head query tokens as canonical anchor to aggregate flexible input-number-agnostic, camera-pose-free and expression-fre…

    Submitted 19 December, 2025; originally announced December 2025.

    Comments: Project page: https://pengc02.github.io/flexavatar

  11. arXiv:2512.17568  [pdf, ps, other]

    cs.RO

    Kinematics-Aware Diffusion Policy with Consistent 3D Observation and Action Space for Whole-Arm Robotic Manipulation

    Authors: Kangchen Lv, Mingrui Yu, Yongyi Jia, Chenyu Zhang, Xiang Li

    Abstract: Whole-body control of robotic manipulators with awareness of full-arm kinematics is crucial for many manipulation scenarios involving body collision avoidance or body-object interactions, which makes it insufficient to consider only the end-effector poses in policy learning. The typical approach for whole-arm manipulation is to learn actions in the robot's joint space. However, the unalignment bet…

    Submitted 19 December, 2025; originally announced December 2025.

    Comments: The first two authors contributed equally. Project Website: https://kinematics-aware-diffusion-policy.github.io

  12. arXiv:2512.17229  [pdf, ps, other]

    cs.CV

    Video Detective: Seek Critical Clues Recurrently to Answer Question from Long Videos

    Authors: Henghui Du, Chang Zhou, Chunjie Zhang, Xi Chen, Di Hu

    Abstract: Long Video Question-Answering (LVQA) presents a significant challenge for Multi-modal Large Language Models (MLLMs) due to immense context and overloaded information, which could also lead to prohibitive memory consumption. While existing methods attempt to address these issues by reducing visual tokens or extending model's context length, they may miss useful information or take considerable comp…

    Submitted 18 December, 2025; originally announced December 2025.

  13. arXiv:2512.16906  [pdf, ps, other]

    cs.CV

    VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization

    Authors: Xiaoyan Cong, Haotian Yang, Angtian Wang, Yizhi Wang, Yiding Yang, Canyu Zhang, Chongyang Ma

    Abstract: Instruction-based video editing aims to modify an input video according to a natural-language instruction while preserving content fidelity and temporal coherence. However, existing diffusion-based approaches are often trained on paired data of simple editing operations, which fundamentally limits their ability to generalize to diverse and complex, real-world instructions. To address this generali…

    Submitted 18 December, 2025; originally announced December 2025.

  14. arXiv:2512.16576  [pdf, ps, other]

    cs.IR

    InfoDCL: Informative Noise Enhanced Diffusion Based Contrastive Learning

    Authors: Xufeng Liang, Zhida Qin, Chong Zhang, Tianyu Huang, Gangyi Ding

    Abstract: Contrastive learning has demonstrated promising potential in recommender systems. Existing methods typically construct sparser views by randomly perturbing the original interaction graph, as they have no idea about the authentic user preferences. Owing to the sparse nature of recommendation data, this paradigm can only capture insufficient semantic information. To address the issue, we propose Inf…

    Submitted 18 December, 2025; originally announced December 2025.

  15. arXiv:2512.16301  [pdf, ps, other]

    cs.AI cs.CL

    Adaptation of Agentic AI

    Authors: Pengcheng Jiang, Jiacheng Lin, Zhiyi Shi, Zifeng Wang, Luxi He, Yichen Wu, Ming Zhong, Peiyang Song, Qizheng Zhang, Heng Wang, Xueqiang Xu, Hanwen Xu, Pengrui Han, Dylan Zhang, Jiashuo Sun, Chaoqi Yang, Kun Qian, Tian Wang, Changran Hu, Manling Li, Quanzheng Li, Hao Peng, Sheng Wang, Jingbo Shang, Chao Zhang, et al. (9 additional authors not shown)

    Abstract: Cutting-edge agentic AI systems are built on foundation models that can be adapted to plan, reason, and interact with external tools to perform increasingly complex and specialized tasks. As these systems grow in capability and scope, adaptation becomes a central mechanism for improving performance, reliability, and generalization. In this paper, we unify the rapidly expanding research landscape i…

    Submitted 22 December, 2025; v1 submitted 18 December, 2025; originally announced December 2025.

  16. arXiv:2512.16175  [pdf]

    astro-ph.EP cs.LG physics.space-ph

    Physics-Informed Neural Networks for Modeling the Martian Induced Magnetosphere

    Authors: Jiawei Gao, Chuanfei Dong, Chi Zhang, Yilan Qin, Simin Shekarpaz, Xinmin Li, Liang Wang, Hongyang Zhou, Abigail Tadlock

    Abstract: Understanding the magnetic field environment around Mars and its response to upstream solar wind conditions provide key insights into the processes driving atmospheric ion escape. To date, global models of Martian induced magnetosphere have been exclusively physics-based, relying on computationally intensive simulations. For the first time, we develop a data-driven model of the Martian induced mag…

    Submitted 17 December, 2025; originally announced December 2025.

  17. arXiv:2512.15840  [pdf, ps, other]

    cs.RO cs.CV

    Large Video Planner Enables Generalizable Robot Control

    Authors: Boyuan Chen, Tianyuan Zhang, Haoran Geng, Kiwhan Song, Caiyi Zhang, Peihao Li, William T. Freeman, Jitendra Malik, Pieter Abbeel, Russ Tedrake, Vincent Sitzmann, Yilun Du

    Abstract: General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. These efforts are motivated by the intuition that MLLMs' large-scale language and image pretraining can be effectively transfe…

    Submitted 17 December, 2025; originally announced December 2025.

    Comments: 29 pages, 16 figures

  18. arXiv:2512.15784  [pdf, ps, other]

    cs.AI cs.LG

    Beyond Training: Enabling Self-Evolution of Agents with MOBIMEM

    Authors: Zibin Liu, Cheng Zhang, Xi Zhao, Yunfei Feng, Bingyu Bai, Dahu Feng, Erhu Feng, Yubin Xia, Haibo Chen

    Abstract: Large Language Model (LLM) agents are increasingly deployed to automate complex workflows in mobile and desktop environments. However, current model-centric agent architectures struggle to self-evolve post-deployment: improving personalization, capability, and efficiency typically requires continuous model retraining/fine-tuning, which incurs prohibitive computational overheads and suffers from an…

    Submitted 15 December, 2025; originally announced December 2025.

  19. arXiv:2512.15771  [pdf, ps, other]

    cs.LG cs.AI math.NA stat.ML

    TENG++: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets under General Boundary Conditions

    Authors: Xinjie He, Chenggong Zhang

    Abstract: Partial Differential Equations (PDEs) are central to modeling complex systems across physical, biological, and engineering domains, yet traditional numerical methods often struggle with high-dimensional or complex problems. Physics-Informed Neural Networks (PINNs) have emerged as an efficient alternative by embedding physics-based constraints into deep learning frameworks, but they face challenges…

    Submitted 12 December, 2025; originally announced December 2025.

    Comments: 7 pages, 2 figures

  20. arXiv:2512.15567  [pdf, ps, other]

    cs.AI cond-mat.mtrl-sci cs.LG physics.chem-ph

    Evaluating Large Language Models in Scientific Discovery

    Authors: Zhangde Song, Jieyu Lu, Yuanqi Du, Botao Yu, Thomas M. Pruyn, Yue Huang, Kehan Guo, Xiuzhe Luo, Yuanhao Qu, Yi Qu, Yinkai Wang, Haorui Wang, Jeff Guo, Jingru Gan, Parshin Shojaee, Di Luo, Andres M Bran, Gen Li, Qiyuan Zhao, Shao-Xiong Lennon Luo, Yuxuan Zhang, Xiang Zou, Wanru Zhao, Yifan F. Zhang, Wucheng Zhang, et al. (31 additional authors not shown)

    Abstract: Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We introduce a scenario-grounded benchmark that evaluates LLMs across biology, chemistry, materials, and physics, where domain exp…

    Submitted 17 December, 2025; originally announced December 2025.

  21. arXiv:2512.15335  [pdf, ps, other]

    cs.LG cs.CR

    Bits for Privacy: Evaluating Post-Training Quantization via Membership Inference

    Authors: Chenxiang Zhang, Tongxi Qu, Zhong Li, Tian Zhang, Jun Pang, Sjouke Mauw

    Abstract: Deep neural networks are widely deployed with quantization techniques to reduce memory and computational costs by lowering the numerical precision of their parameters. While quantization alters model parameters and their outputs, existing privacy analyses primarily focus on full-precision models, leaving a gap in understanding how bit-width reduction can affect privacy leakage. We present the firs…

    Submitted 17 December, 2025; originally announced December 2025.

    Comments: accepted at TrustCom 2025

  22. arXiv:2512.15219  [pdf, ps, other]

    cs.CL cs.AI

    RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA

    Authors: Chao Zhang, Minghan Li, Tianrui Lv, Guodong Zhou

    Abstract: Large language models (LLMs) often generate hallucinations in knowledge-intensive QA due to parametric knowledge limitations. While existing methods like KG-CoT improve reliability by integrating knowledge graph (KG) paths, they suffer from rigid hop-count selection (solely question-driven) and underutilization of reasoning paths (lack of guidance). To address this, we propose RFKG-CoT: First, it…

    Submitted 17 December, 2025; originally announced December 2025.

    Comments: 9 pages, 5 figures, accepted by AAAI 2026

  23. arXiv:2512.15154  [pdf, ps, other]

    cs.CE

    Update Strategy for Channel Knowledge Map in Complex Environments

    Authors: Ting Wang, Chiya Zhang, Chang Liu, Zhuoyuan Hao, Rubing Han, Weizheng Zhang, Chunlong He

    Abstract: The Channel Knowledge Map (CKM) maps position information to channel state information, leveraging environmental knowledge to reduce signaling overhead in sixth-generation networks. However, constructing a reliable CKM demands substantial data and computation, and in dynamic environments, a pre-built CKM becomes outdated, degrading performance. Frequent retraining restores accuracy but incurs sign…

    Submitted 17 December, 2025; originally announced December 2025.

    Comments: 14 pages

  24. Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank

    Authors: Chenxiao Zhang, Runshi Zhang, Junchen Wang

    Abstract: Medical ultrasound videos are widely used for medical inspections, disease diagnosis and surgical planning. High-fidelity lesion area and target organ segmentation constitutes a key component of the computer-assisted surgery workflow. The low contrast levels and noisy backgrounds of ultrasound videos cause missegmentation of organ boundary, which may lead to small object losses and increase bounda…

    Submitted 16 December, 2025; originally announced December 2025.

    Comments: Chenxiao Zhang and Runshi Zhang contributed equally to this work. 14 pages, 11 figures

    Journal ref: Medical Image Analysis 2026

  25. arXiv:2512.15020  [pdf, ps, other]

    cs.RO

    ISS Policy: Scalable Diffusion Policy with Implicit Scene Supervision

    Authors: Wenlong Xia, Jinhao Zhang, Ce Zhang, Yaojia Wang, Youmin Gong, Jie Mei

    Abstract: Vision-based imitation learning has enabled impressive robotic manipulation skills, but its reliance on object appearance while ignoring the underlying 3D scene structure leads to low training efficiency and poor generalization. To address these challenges, we introduce Implicit Scene Supervision (ISS) Policy, a 3D visuomotor DiT-based diffusion policy that predicts sequences of continuous…

    Submitted 16 December, 2025; originally announced December 2025.

  26. arXiv:2512.14754  [pdf, ps, other]

    cs.SE cs.AI cs.CL

    Revisiting the Reliability of Language Models in Instruction-Following

    Authors: Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Chao Zhang, Han Qiu

    Abstract: Advanced LLMs have achieved near-ceiling instruction-following accuracy on benchmarks such as IFEval. However, these impressive scores do not necessarily translate to reliable services in real-world use, where users often vary their phrasing, contextual framing, and task formulations. In this paper, we study nuance-oriented reliability: whether models exhibit consistent competence across cousin pr…

    Submitted 14 December, 2025; originally announced December 2025.

    Comments: Preprint

  27. arXiv:2512.14671  [pdf, ps, other]

    cs.CV

    ART: Articulated Reconstruction Transformer

    Authors: Zizhang Li, Cheng Zhang, Zhengqin Li, Henry Howard-Jenkins, Zhaoyang Lv, Chen Geng, Jiajun Wu, Richard Newcombe, Jakob Engel, Zhao Dong

    Abstract: We introduce ART, Articulated Reconstruction Transformer -- a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only sparse, multi-state RGB images. Previous methods for articulated object reconstruction either rely on slow optimization with fragile cross-state correspondences or use feed-forward models limited to specific object categories. In contrast,…

    Submitted 16 December, 2025; originally announced December 2025.

    Comments: Project Page: https://kyleleey.github.io/ART/

  28. arXiv:2512.14142  [pdf, ps, other]

    cs.CL

    Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents

    Authors: Hongqiu Ni, Jiabao Zhang, Guopeng Li, Zilong Wang, Ruiqi Wu, Chi Zhang, Haisheng Tan

    Abstract: Large Language Models (LLMs) are increasingly being deployed as intelligent agents. Their multi-stage workflows, which alternate between local computation and calls to external network services like Web APIs, introduce a mismatch in their execution pattern and the scheduling granularity of existing inference systems such as vLLM. Existing systems typically focus on per-segment optimization which p…

    Submitted 16 December, 2025; originally announced December 2025.

    Comments: 12 pages, 8 figures

  29. arXiv:2512.14039  [pdf, ps, other]

    cs.CV

    ASAP-Textured Gaussians: Enhancing Textured Gaussians with Adaptive Sampling and Anisotropic Parameterization

    Authors: Meng Wei, Cheng Zhang, Jianmin Zheng, Hamid Rezatofighi, Jianfei Cai

    Abstract: Recent advances have equipped 3D Gaussian Splatting with texture parameterizations to capture spatially varying attributes, improving the performance of both appearance modeling and downstream tasks. However, the added texture parameters introduce significant memory efficiency challenges. Rather than proposing new texture formulations, we take a step back to examine the characteristics of existing…

    Submitted 15 December, 2025; originally announced December 2025.

  30. arXiv:2512.13989  [pdf, ps, other]

    cs.LG

    A Single Architecture for Representing Invariance Under Any Space Group

    Authors: Cindy Y. Zhang, Elif Ertekin, Peter Orbanz, Ryan P. Adams

    Abstract: Incorporating known symmetries in data into machine learning models has consistently improved predictive accuracy, robustness, and generalization. However, achieving exact invariance to specific symmetries typically requires designing bespoke architectures for each group of symmetries, limiting scalability and preventing knowledge transfer across related symmetries. In the case of the space groups…

    Submitted 15 December, 2025; originally announced December 2025.

    Comments: 24 pages, 7 figures

  31. arXiv:2512.13638  [pdf]

    cs.DC cs.AR

    Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators

    Authors: Aofeng Shen, Chi Zhang, Yakup Budanaz, Alexandru Calotoiu, Torsten Hoefler, Luca Benini

    Abstract: Tile-based many-Processing Element (PE) accelerators can achieve competitive performance on General Matrix Multiplication (GEMM), but they are extremely hard to program, as their optimal software mapping is deeply coupled with hardware design which is unwieldy to manual deployment. We propose "Design in Tiles (DiT)", an automated framework connecting a deployment toolchain with a configurable exec…

    Submitted 15 December, 2025; originally announced December 2025.

  32. arXiv:2512.13560  [pdf, ps, other]

    cs.CV

    3D Human-Human Interaction Anomaly Detection

    Authors: Shun Maeda, Chunzhi Gu, Koichiro Kamide, Katsuya Hotta, Shangce Gao, Chao Zhang

    Abstract: Human-centric anomaly detection (AD) has been primarily studied to specify anomalous behaviors in a single person. However, as humans by nature tend to act in a collaborative manner, behavioral anomalies can also arise from human-human interactions. Detecting such anomalies using existing single-person AD models is prone to low accuracy, as these approaches are typically not designed to capture th…

    Submitted 15 December, 2025; originally announced December 2025.

  33. arXiv:2512.13507  [pdf, ps, other]

    cs.CV

    Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

    Authors: Heyi Chen, Siyan Chen, Xin Chen, Yanfei Chen, Ying Chen, Zhuo Chen, Feng Cheng, Tianheng Cheng, Xinqi Cheng, Xuyan Chi, Jian Cong, Jing Cui, Qinpeng Cui, Qide Dong, Junliang Fan, Jing Fang, Zetao Fang, Chengjian Feng, Han Feng, Mingyuan Gao, Yu Gao, Dong Guo, Qiushan Guo, Boyang Hao, Qingkai Hao, et al. (171 additional authors not shown)

    Abstract: Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional au…

    Submitted 16 December, 2025; v1 submitted 15 December, 2025; originally announced December 2025.

    Comments: Seedance 1.5 pro Technical Report

  34. arXiv:2512.13191  [pdf, ps, other]

    cs.CV

    CoRA: A Collaborative Robust Architecture with Hybrid Fusion for Efficient Perception

    Authors: Gong Chen, Chaokun Zhang, Pengcheng Lv, Xiaohui Xie

    Abstract: Collaborative perception has garnered significant attention as a crucial technology to overcome the perceptual limitations of single-agent systems. Many state-of-the-art (SOTA) methods have achieved communication efficiency and high performance via intermediate fusion. However, they share a critical vulnerability: their performance degrades under adverse communication conditions due to the misalig…

    Submitted 15 December, 2025; originally announced December 2025.

    Comments: Accepted by AAAI 2026

  35. arXiv:2512.13043  [pdf, ps, other]

    cs.CV cs.AI

    GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

    Authors: Tong Wei, Yijun Yang, Changhao Zhang, Junliang Xing, Yuanchun Shi, Zongqing Lu, Deheng Ye

    Abstract: Multi-turn reinforcement learning (RL) for multi-modal agents built upon vision-language models (VLMs) is hampered by sparse rewards and long-horizon credit assignment. Recent methods densify the reward by querying a teacher that provides step-level feedback, e.g., Guided Thought Reinforcement (GTR) and On-Policy Distillation, but rely on costly, often privileged models as the teacher, limiting pr…

    Submitted 15 December, 2025; originally announced December 2025.

  36. arXiv:2512.12949  [pdf, ps, other]

    cs.DC

    FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection

    Authors: Ziyu Huang, Yangjie Zhou, Zihan Liu, Xinhao Luo, Yijia Diao, Minyi Guo, Jidong Zhai, Yu Feng, Chen Zhang, Anbang Wu, Jingwen Leng

    Abstract: The scaling of computation throughput continues to outpace improvements in memory bandwidth, making many deep learning workloads memory-bound. Kernel fusion is a key technique to alleviate this problem, but the fusion strategies of existing compilers and frameworks are limited to using local scratchpad memory. When the intermediate results exceed the limited capacity (such as FFN), the fusion fail…

    Submitted 14 December, 2025; originally announced December 2025.

  37. arXiv:2512.12730  [pdf, ps, other]

    cs.CL

    NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

    Authors: Jingzhe Ding, Shengda Long, Changxin Pu, Huan Zhou, Hongwan Gao, Xiang Gao, Chao He, Yue Hou, Fei Hu, Zhaojian Li, Weiran Shi, Zaiyuan Wang, Daoguang Zan, Chenchen Zhang, Xiaoxu Zhang, Qizhi Chen, Xianfu Cheng, Bo Deng, Qingshui Gu, Kai Hua, Juntao Lin, Pai Liu, Mingchen Li, Xuanguang Pan, Zifan Peng, et al. (23 additional authors not shown)

    Abstract: Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems. Most prior evaluations focus on localized code generation, scaffolded completion, or short-term repair tasks, leaving open the question of whether agents can sustain coherent re…

    Submitted 14 December, 2025; originally announced December 2025.

  38. arXiv:2512.12008  [pdf, ps, other]

    cs.CL cs.AI cs.PF

    Hold Onto That Thought: Assessing KV Cache Compression On Reasoning

    Authors: Minghui Liu, Aadi Palnitkar, Tahseen Rabbani, Hyunwoo Jae, Kyle Rui Sang, Dixi Yao, Shayan Shabihi, Fuheng Zhao, Tian Li, Ce Zhang, Furong Huang, Kunpeng Zhang

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on long-context tasks, but are often bottlenecked by memory constraints. Namely, the KV cache, which is used to significantly speed up attention computations, grows linearly with context length. A suite of compression algorithms has been introduced to alleviate cache growth by evicting unimportant tokens. However, several popula…

    Submitted 12 December, 2025; originally announced December 2025.

  39. arXiv:2512.11743  [pdf, ps, other]

    cs.NE cs.AI

    CogniSNN: Enabling Neuron-Expandability, Pathway-Reusability, and Dynamic-Configurability with Random Graph Architectures in Spiking Neural Networks

    Authors: Yongsheng Huang, Peibo Duan, Yujie Wu, Kai Sun, Zhipeng Liu, Changsheng Zhang, Bin Zhang, Mingkun Xu

    Abstract: Spiking neural networks (SNNs), regarded as the third generation of artificial neural networks, are expected to bridge the gap between artificial intelligence and computational neuroscience. However, most mainstream SNN research directly adopts the rigid, chain-like hierarchical architecture of traditional artificial neural networks (ANNs), ignoring key structural characteristics of the brain. Bio…

    Submitted 12 December, 2025; originally announced December 2025.

  40. arXiv:2512.10766  [pdf, ps, other]

    cs.CR cs.AI cs.CV

    Metaphor-based Jailbreaking Attacks on Text-to-Image Models

    Authors: Chenyu Zhang, Yiwen Ma, Lanjun Wang, Wenhui Li, Yi Tu, An-An Liu

    Abstract: Text-to-image (T2I) models commonly incorporate defense mechanisms to prevent the generation of sensitive images. Unfortunately, recent jailbreaking attacks have shown that adversarial prompts can effectively bypass these mechanisms and induce T2I models to produce sensitive content, revealing critical safety vulnerabilities. However, existing attack methods implicitly assume that the attacker kno…

    Submitted 6 December, 2025; originally announced December 2025.

    Comments: This paper includes model-generated content that may contain offensive or distressing material

  41. arXiv:2512.10576  [pdf, ps, other]

    cs.DC

    ESS: An Offload-Centric Latent-Cache Management Architecture for DeepSeek-V3.2-Exp

    Authors: Xinhang Chen, Chao Zhang, Jiahuan He, Wei Liu, Jianming Zhang, Wenlong Zhou, Xiao Li, Pai Zeng, Shiyong Li, Yuanpan Qian, Dong Li, Zhaogeng Li

    Abstract: DeepSeek-V3.2-Exp introduces a sparse attention mechanism that significantly reduces inference latency in long-context scenarios. Although the overall throughput has improved greatly, the Decode stage of PD disaggregation remains a major bottleneck. This bottleneck primarily stems from the conflict between linear growth of Latent-Cache with sequence length and the limited GPU memory capacity…

    Submitted 11 December, 2025; originally announced December 2025.

  42. arXiv:2512.10394  [pdf, ps, other]

    cs.RO cs.LG

    RoboNeuron: A Modular Framework Linking Foundation Models and ROS for Embodied AI

    Authors: Weifan Guan, Huasen Xi, Chenxiao Zhang, Aosheng Li, Qinghao Hu, Jian Cheng

    Abstract: Current embodied AI systems face severe engineering impediments, primarily characterized by poor cross-scenario adaptability, rigid inter-module coupling, and fragmented inference acceleration. To overcome these limitations, we propose RoboNeuron, a universal deployment framework for embodied intelligence. RoboNeuron is the first framework to deeply integrate the cognitive capabilities of Large La…

    Submitted 11 December, 2025; originally announced December 2025.

  43. arXiv:2512.09504  [pdf, ps, other]

    cs.SD

    DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance

    Authors: Kang Yin, Chunyu Qiang, Sirui Zhao, Xiaopeng Wang, Yuzhe Liang, Pengfei Cai, Tong Xu, Chen Zhang, Enhong Chen

    Abstract: Controllable text-to-speech (TTS) systems face significant challenges in achieving independent manipulation of speaker timbre and speaking style, often suffering from entanglement between these attributes. We present DMP-TTS, a latent Diffusion Transformer (DiT) framework with explicit disentanglement and multi-modal prompting. A CLAP-based style encoder (Style-CLAP) aligns cues from reference aud…

    Submitted 10 December, 2025; originally announced December 2025.

  44. arXiv:2512.09315  [pdf, ps, other]

    cs.CV

    Benchmarking Real-World Medical Image Classification with Noisy Labels: Challenges, Practice, and Outlook

    Authors: Yuan Ma, Junlin Hou, Chao Zhang, Yukun Zhou, Zongyuan Ge, Haoran Xie, Lie Ju

    Abstract: Learning from noisy labels remains a major challenge in medical image analysis, where annotation demands expert knowledge and substantial inter-observer variability often leads to inconsistent or erroneous labels. Despite extensive research on learning with noisy labels (LNL), the robustness of existing methods in medical imaging has not been systematically assessed. To address this gap, we introd…

    Submitted 9 December, 2025; originally announced December 2025.

  45. arXiv:2512.08924  [pdf, ps, other]

    cs.CV

    Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

    Authors: Chuhan Zhang, Guillaume Le Moing, Skanda Koppula, Ignacio Rocco, Liliane Momeni, Junyu Xie, Shuyang Sun, Rahul Sukthankar, Joëlle K. Barral, Raia Hadsell, Zoubin Ghahramani, Andrew Zisserman, Junlin Zhang, Mehdi S. M. Sajjadi

    Abstract: Understanding and reconstructing the complex geometry and motion of dynamic scenes from video remains a formidable challenge in computer vision. This paper introduces D4RT, a simple yet powerful feedforward model designed to efficiently solve this task. D4RT utilizes a unified transformer architecture to jointly infer depth, spatio-temporal correspondence, and full camera parameters from a single…

    Submitted 10 December, 2025; v1 submitted 9 December, 2025; originally announced December 2025.

    Comments: Project Page: https://d4rt-paper.github.io/

  46. arXiv:2512.08674  [pdf, ps, other]

    cs.AI cs.MA

    Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology

    Authors: Rongzhao Zhang, Junqiao Wang, Shuyun Yang, Mouxiao Bian, Chao Ding, Yuwei Bai, Chihao Zhang, Yuguang Shen, Lei Wang, Lei Zheng, Qiujuan Yan, Yun Zhong, Meiling Liu, Jiwei Yu, Zheng Wang, Jie Xu, Meng Luo

    Abstract: Multimodal clinical reasoning in the field of gastrointestinal (GI) oncology necessitates the integrated interpretation of endoscopic imagery, radiological data, and biochemical markers. Despite the evident potential exhibited by Multimodal Large Language Models (MLLMs), they frequently encounter challenges such as context dilution and hallucination when confronted with intricate, heterogeneous me…

    Submitted 9 December, 2025; originally announced December 2025.

  47. arXiv:2512.07890  [pdf, ps, other]

    cs.MA cs.LG stat.ME stat.ML

    CrowdLLM: Building LLM-Based Digital Populations Augmented with Generative Models

    Authors: Ryan Feng Lin, Keyu Tian, Hanming Zheng, Congjing Zhang, Li Zeng, Shuai Huang

    Abstract: The emergence of large language models (LLMs) has sparked much interest in creating LLM-based digital populations that can be applied to many applications such as social simulation, crowdsourcing, marketing, and recommendation systems. A digital population can reduce the cost of recruiting human participants and alleviate many concerns related to human subject study. However, research has found th…

    Submitted 2 December, 2025; originally announced December 2025.

  48. arXiv:2512.07873  [pdf, ps, other]

    cs.LG cs.AI

    Advancing physiological time series reconstruction and imputation via mixture of receptive fields and experts fusion

    Authors: Ci Zhang, Huayu Li, Changdi Yang, Jiangnan Xia, Yanzhi Wang, Xiaolong Ma, Jin Lu, Ao Li, Geng Yuan

    Abstract: Recent studies show that using diffusion models for time series signal reconstruction holds great promise. However, such approaches remain largely unexplored in the domain of medical time series. The unique characteristics of the physiological time series signals, such as multivariate, high temporal variability, highly noisy, and artifact-prone, make deep learning-based approaches still challengin…

    Submitted 12 December, 2025; v1 submitted 26 November, 2025; originally announced December 2025.

  49. arXiv:2512.07802  [pdf, ps, other]

    cs.CV

    OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

    Authors: Zhaochong An, Menglin Jia, Haonan Qiu, Zijian Zhou, Xiaoke Huang, Zhiheng Liu, Weiming Ren, Kumara Kahatapitiya, Ding Liu, Sen He, Chenyang Zhang, Tao Xiang, Fanny Yang, Serge Belongie, Tian Xie

    Abstract: Storytelling in real-world videos often unfolds through multiple shots -- discontinuous yet semantically connected clips that together convey a coherent narrative. However, existing multi-shot video generation (MSV) methods struggle to effectively model long-range cross-shot context, as they rely on limited temporal windows or single keyframe conditioning, leading to degraded performance under com…

    Submitted 8 December, 2025; originally announced December 2025.

    Comments: Project Page: https://zhaochongan.github.io/projects/OneStory

  50. arXiv:2512.07783  [pdf, ps, other]

    cs.CL

    On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

    Authors: Charlie Zhang, Graham Neubig, Xiang Yue

    Abstract: Recent reinforcement learning (RL) techniques have yielded impressive reasoning improvements in language models, yet it remains unclear whether post-training truly extends a model's reasoning ability beyond what it acquires during pre-training. A central challenge is the lack of control in modern training pipelines: large-scale pre-training corpora are opaque, mid-training is often underexamined,…

    Submitted 8 December, 2025; originally announced December 2025.