
Showing 1–50 of 1,514 results for author: Zhang, G

Searching in archive cs.
  1. arXiv:2512.19546

    cs.CV

    ActAvatar: Temporally-Aware Precise Action Control for Talking Avatars

    Authors: Ziqiao Peng, Yi Chen, Yifeng Ma, Guozhen Zhang, Zhiyao Sun, Zixiang Zhou, Youliang Zhang, Zhengguang Zhou, Zhaoxin Fan, Hongyan Liu, Yuan Zhou, Qinglin Lu, Jun He

    Abstract: Despite significant advances in talking avatar generation, existing methods face critical challenges: insufficient text-following capability for diverse actions, lack of temporal alignment between actions and audio content, and dependency on additional control signals such as pose skeletons. We present ActAvatar, a framework that achieves phase-level precision in action control through textual gui…

    Submitted 22 December, 2025; originally announced December 2025.

    Comments: Project Page: https://ziqiaopeng.github.io/ActAvatar/

  2. arXiv:2512.19438

    cs.CV cs.AI

    MT-Mark: Rethinking Image Watermarking via Mutual-Teacher Collaboration with Adaptive Feature Modulation

    Authors: Fei Ge, Ying Huang, Jie Liu, Guixuan Zhang, Zhi Zeng, Shuwu Zhang, Hu Guan

    Abstract: Existing deep image watermarking methods follow a fixed embedding-distortion-extraction pipeline, where the embedder and extractor are weakly coupled through a final loss and optimized in isolation. This design lacks explicit collaboration, leaving no structured mechanism for the embedder to incorporate decoding-aware cues or for the extractor to guide embedding during training. To address this ar…

    Submitted 22 December, 2025; originally announced December 2025.

  3. arXiv:2512.19424

    cs.CL

    CodeSimpleQA: Scaling Factuality in Code Large Language Models

    Authors: Jian Yang, Wei Zhang, Yizhi Li, Shawn Guo, Haowen Wang, Aishan Liu, Ge Zhang, Zili Wang, Zhoujun Li, Xianglong Liu, Weifeng Lv

    Abstract: Large language models (LLMs) have made significant strides in code generation, achieving impressive capabilities in synthesizing code snippets from natural language instructions. However, a critical challenge remains in ensuring LLMs generate factually accurate responses about programming concepts, technical implementations, etc. Most previous code-related benchmarks focus on code execution correc…

    Submitted 22 December, 2025; originally announced December 2025.

  4. arXiv:2512.19095

    cs.CV

    Mamba-Based Modality Disentanglement Network for Multi-Contrast MRI Reconstruction

    Authors: Weiyi Lyu, Xinming Fang, Jun Wang, Jun Shi, Guixu Zhang, Juncheng Li

    Abstract: Magnetic resonance imaging (MRI) is a cornerstone of modern clinical diagnosis, offering unparalleled soft-tissue contrast without ionizing radiation. However, prolonged scan times remain a major barrier to patient throughput and comfort. Existing accelerated MRI techniques often struggle with two key challenges: (1) failure to effectively utilize inherent K-space prior information, leading to per…

    Submitted 22 December, 2025; originally announced December 2025.

    Comments: 12 pages, 11 figures, 6 tables

  5. arXiv:2512.18766

    cs.CV

    MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation

    Authors: Guohui Zhang, Hu Yu, Xiaoxiao Ma, Yaning Pan, Hang Xu, Feng Zhao

    Abstract: Reinforcement learning (RL) has demonstrated significant potential for post-training language models and autoregressive visual generative models, but adapting RL to masked generative models remains challenging. The core factor is that policy optimization requires accounting for the probability likelihood of each step due to its multi-step and iterative refinement process. This reliance on entire s…

    Submitted 21 December, 2025; originally announced December 2025.

    Comments: Code is available at https://github.com/zghhui/MaskFocus

  6. arXiv:2512.18746

    cs.CL cs.MA

    MemEvolve: Meta-Evolution of Agent Memory Systems

    Authors: Guibin Zhang, Haotian Ren, Chong Zhan, Zhenhong Zhou, Junhao Wang, He Zhu, Wangchunshu Zhou, Shuicheng Yan

    Abstract: Self-evolving memory systems are unprecedentedly reshaping the evolutionary paradigm of large language model (LLM)-based agents. Prior work has predominantly relied on manually engineered memory architectures to store trajectories, distill experience, and synthesize reusable tools, enabling agents to evolve on the fly within environment interactions. However, this paradigm is fundamentally constra…

    Submitted 21 December, 2025; originally announced December 2025.

  7. arXiv:2512.17183

    cs.RO

    Semantic Co-Speech Gesture Synthesis and Real-Time Control for Humanoid Robots

    Authors: Gang Zhang

    Abstract: We present an innovative end-to-end framework for synthesizing semantically meaningful co-speech gestures and deploying them in real-time on a humanoid robot. This system addresses the challenge of creating natural, expressive non-verbal communication for robots by integrating advanced gesture generation techniques with robust physical control. Our core innovation lies in the meticulous integratio…

    Submitted 18 December, 2025; originally announced December 2025.

  8. arXiv:2512.14710

    cs.LG cs.AI

    Autonomous Source Knowledge Selection in Multi-Domain Adaptation

    Authors: Keqiuyin Li, Jie Lu, Hua Zuo, Guangquan Zhang

    Abstract: Unsupervised multi-domain adaptation plays a key role in transfer learning by leveraging rich source information acquired from multiple source domains to solve a target task from an unlabeled target domain. However, multiple source domains often contain much redundant or unrelated information which can harm transfer performance, especially in massive-source domain settings. It is urgent to deve…

    Submitted 8 December, 2025; originally announced December 2025.

  9. arXiv:2512.14126

    cs.CV

    Consistent Instance Field for Dynamic Scene Understanding

    Authors: Junyi Wu, Van Nguyen Nguyen, Benjamin Planche, Jiachen Tao, Changchang Sun, Zhongpai Gao, Zhenghao Zhao, Anwesa Choudhuri, Gengyu Zhang, Meng Zheng, Feiran Wang, Terrence Chen, Yan Yan, Ziyan Wu

    Abstract: We introduce Consistent Instance Field, a continuous and probabilistic spatio-temporal representation for dynamic scene understanding. Unlike prior methods that rely on discrete tracking or view-dependent features, our approach disentangles visibility from persistent object identity by modeling each space-time point with an occupancy probability and a conditional instance distribution. To realize…

    Submitted 16 December, 2025; originally announced December 2025.

  10. arXiv:2512.13564

    cs.CL cs.AI

    Memory in the Age of AI Agents

    Authors: Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, Senjie Jin, Jiejun Tan, Yanbin Yin, Jiongnan Liu, Zeyu Zhang, Zhongxiang Sun, Yutao Zhu, Hao Sun, Boci Peng, Zhenrong Cheng, Xuanbo Fan, Jiaxin Guo, Xinlei Yu, Zhenhong Zhou, Zewen Hu, et al. (22 additional authors not shown)

    Abstract: Memory has emerged, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has also become increasingly fragmented. Existing works that fall under the umbrella of agent memory often differ substantially in their motivations, implementations, and evaluation protocols, while the prol…

    Submitted 15 December, 2025; originally announced December 2025.

  11. arXiv:2512.12730

    cs.CL

    NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

    Authors: Jingzhe Ding, Shengda Long, Changxin Pu, Huan Zhou, Hongwan Gao, Xiang Gao, Chao He, Yue Hou, Fei Hu, Zhaojian Li, Weiran Shi, Zaiyuan Wang, Daoguang Zan, Chenchen Zhang, Xiaoxu Zhang, Qizhi Chen, Xianfu Cheng, Bo Deng, Qingshui Gu, Kai Hua, Juntao Lin, Pai Liu, Mingchen Li, Xuanguang Pan, Zifan Peng, et al. (23 additional authors not shown)

    Abstract: Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems. Most prior evaluations focus on localized code generation, scaffolded completion, or short-term repair tasks, leaving open the question of whether agents can sustain coherent re…

    Submitted 14 December, 2025; originally announced December 2025.

  12. arXiv:2512.12459

    cs.CV cs.GR

    From Particles to Fields: Reframing Photon Mapping with Continuous Gaussian Photon Fields

    Authors: Jiachen Tao, Benjamin Planche, Van Nguyen Nguyen, Junyi Wu, Yuchun Liu, Haoxuan Wang, Zhongpai Gao, Gengyu Zhang, Meng Zheng, Feiran Wang, Anwesa Choudhuri, Zhenghao Zhao, Weitai Kang, Terrence Chen, Yan Yan, Ziyan Wu

    Abstract: Accurately modeling light transport is essential for realistic image synthesis. Photon mapping provides physically grounded estimates of complex global illumination effects such as caustics and specular-diffuse interactions, yet its per-view radiance estimation remains computationally inefficient when rendering multiple views of the same scene. The inefficiency arises from independent photon traci…

    Submitted 13 December, 2025; originally announced December 2025.

  13. arXiv:2512.12196

    cs.MM cs.CV cs.SD eess.AS

    AutoMV: An Automatic Multi-Agent System for Music Video Generation

    Authors: Xiaoxuan Tang, Xinping Lei, Chaoran Zhu, Shiyun Chen, Ruibin Yuan, Yizhi Li, Changjae Oh, Ge Zhang, Wenhao Huang, Emmanouil Benetos, Yang Liu, Jiaheng Liu, Yinghao Ma

    Abstract: Music-to-Video (M2V) generation for full-length songs faces significant challenges. Existing methods produce short, disjointed clips, failing to align visuals with musical structure, beats, or lyrics, and lack temporal consistency. We propose AutoMV, a multi-agent system that generates full music videos (MVs) directly from a song. AutoMV first applies music processing tools to extract musical attr…

    Submitted 13 December, 2025; originally announced December 2025.

  14. arXiv:2512.11998

    cs.CL

    Direct Confidence Alignment: Aligning Verbalized Confidence with Internal Confidence In Large Language Models

    Authors: Glenn Zhang, Treasure Mayowa, Jason Fan, Yicheng Fu, Aaron Sandoval, Sean O'Brien, Kevin Zhu

    Abstract: Producing trustworthy and reliable Large Language Models (LLMs) has become increasingly important as their usage becomes more widespread. Calibration seeks to achieve this by improving the alignment between the model's confidence and the actual likelihood of its responses being correct or desirable. However, it has been observed that the internal confidence of a model, derived from token probabili…

    Submitted 12 December, 2025; originally announced December 2025.

    Comments: Accepted at ACL 2025 SRW, 5 pages body, 14 pages total

  15. arXiv:2512.10386

    cs.CV

    Adaptive Dual-Weighted Gravitational Point Cloud Denoising Method

    Authors: Ge Zhang, Chunyang Wang, Bo Xiao, Xuelian Liu, Bin Liu

    Abstract: High-quality point cloud data is a critical foundation for tasks such as autonomous driving and 3D reconstruction. However, LiDAR-based point cloud acquisition is often affected by various disturbances, resulting in a large number of noise points that degrade the accuracy of subsequent point cloud object detection and recognition. Moreover, existing point cloud denoising methods typically sacrific…

    Submitted 11 December, 2025; originally announced December 2025.

  16. arXiv:2512.10382

    cs.SD

    Investigating training objective for flow matching-based speech enhancement

    Authors: Liusha Yang, Ziru Ge, Gui Zhang, Junan Zhang, Zhizheng Wu

    Abstract: Speech enhancement (SE) aims to recover clean speech from noisy recordings. Although generative approaches such as score matching and Schrödinger bridge have shown strong effectiveness, they are often computationally expensive. Flow matching offers a more efficient alternative by directly learning a velocity field that maps noise to data. In this work, we present a systematic study of flow matching…

    Submitted 11 December, 2025; originally announced December 2025.

  17. arXiv:2512.09524

    q-bio.NC cs.AI cs.LG eess.SP

    NeuroSketch: An Effective Framework for Neural Decoding via Systematic Architectural Optimization

    Authors: Gaorui Zhang, Zhizhang Yuan, Jialan Yang, Junru Chen, Li Meng, Yang Yang

    Abstract: Neural decoding, a critical component of Brain-Computer Interface (BCI), has recently attracted increasing research interest. Previous research has focused on leveraging signal processing and deep learning methods to enhance neural decoding performance. However, the in-depth exploration of model architectures remains underexplored, despite its proven effectiveness in other tasks such as energy for…

    Submitted 10 December, 2025; originally announced December 2025.

  18. arXiv:2512.08802

    cs.CR cs.AI

    Democratizing ML for Enterprise Security: A Self-Sustained Attack Detection Framework

    Authors: Sadegh Momeni, Ge Zhang, Birkett Huber, Hamza Harkous, Sam Lipton, Benoit Seguin, Yanis Pavlidis

    Abstract: Despite advancements in machine learning for security, rule-based detection remains prevalent in Security Operations Centers due to the resource intensiveness and skill gap associated with ML solutions. While traditional rule-based methods offer efficiency, their rigidity leads to high false positives or negatives and requires continuous manual maintenance. This paper proposes a novel, two-stage h…

    Submitted 9 December, 2025; originally announced December 2025.

    Comments: published in CAMLIS 2025, https://www.camlis.org/

  19. arXiv:2512.07647

    cs.LG cs.AI

    A Mathematical Theory of Top-$k$ Sparse Attention via Total Variation Distance

    Authors: Georgios Tzachristas, Lei Deng, Ioannis Tzachristas, Gong Zhang, Renhai Chen

    Abstract: We develop a unified mathematical framework for certified Top-$k$ attention truncation that quantifies approximation error at both the distribution and output levels. For a single attention distribution $P$ and its Top-$k$ truncation $\hat P$, we show that the total-variation distance coincides with the discarded softmax tail mass and satisfies…

    Submitted 8 December, 2025; originally announced December 2025.

  20. arXiv:2512.07515

    cs.CL cs.AI

    SPAD: Seven-Source Token Probability Attribution with Syntactic Aggregation for Detecting Hallucinations in RAG

    Authors: Pengqian Lu, Jie Lu, Anjin Liu, Guangquan Zhang

    Abstract: Detecting hallucinations in Retrieval-Augmented Generation (RAG) remains a challenge. Prior approaches attribute hallucinations to a binary conflict between internal knowledge (stored in FFNs) and retrieved context. However, this perspective is incomplete, failing to account for the impact of other components in the generative process, such as the user query, previously generated tokens, the curre…

    Submitted 8 December, 2025; originally announced December 2025.

  21. arXiv:2512.05546

    cs.CV cs.AI

    Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models

    Authors: Weijue Bu, Guan Yuan, Guixian Zhang

    Abstract: Large Vision-Language Models (VLMs) often exhibit text inertia, where attention drifts from visual evidence toward linguistic priors, resulting in object hallucinations. Existing decoding strategies intervene only at the output logits and thus cannot correct internal reasoning drift, while recent internal-control methods based on heuristic head suppression or global steering vectors lack principle…

    Submitted 5 December, 2025; originally announced December 2025.

    Comments: 6 pages, 6 figures

    ACM Class: I.2.10; I.2.6

  22. arXiv:2512.04522

    cs.CV

    Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-Identification

    Authors: Guoqing Zhang, Zhun Wang, Hairui Wang, Zhonglin Ye, Yuhui Zheng

    Abstract: Visible-Infrared Person Re-Identification (VI-ReID) is a challenging cross-modal matching task due to significant modality discrepancies. While current methods mainly focus on learning modality-invariant features through unified embedding spaces, they often focus solely on the common discriminative semantics across modalities while disregarding the critical role of modality-specific identity-aware…

    Submitted 4 December, 2025; originally announced December 2025.

    Comments: 14 pages, 7 figures

  23. arXiv:2512.03775

    cs.CR

    "MCP Does Not Stand for Misuse Cryptography Protocol": Uncovering Cryptographic Misuse in Model Context Protocol at Scale

    Authors: Biwei Yan, Yue Zhang, Minghui Xu, Hao Wu, Yechao Zhang, Kun Li, Guoming Zhang, Xiuzhen Cheng

    Abstract: The Model Context Protocol (MCP) is rapidly emerging as the middleware for LLM-based applications, offering a standardized interface for tool integration. However, its built-in security mechanisms are minimal: while schemas and declarations prevent malformed requests, MCP provides no guarantees of authenticity or confidentiality, forcing developers to implement cryptography themselves. Such ad hoc…

    Submitted 3 December, 2025; originally announced December 2025.

  24. arXiv:2512.02972

    cs.CV cs.RO

    BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection

    Authors: Guowen Zhang, Chenhang He, Liyi Chen, Lei Zhang

    Abstract: Integrating LiDAR and camera information in the bird's eye view (BEV) representation has demonstrated its effectiveness in 3D object detection. However, because of the fundamental disparity in geometric accuracy between these sensors, indiscriminate fusion in previous methods often leads to degraded performance. In this paper, we propose BEVDilation, a novel LiDAR-centric framework that prioritize…

    Submitted 2 December, 2025; originally announced December 2025.

    Comments: Accepted by AAAI 2026

  25. arXiv:2512.02580

    cs.CL

    From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

    Authors: Changpeng Yang, Jinyang Wu, Yuchen Liu, Shuai Zhang, Yang Li, Qiliang Liang, Hongzhen Wang, Shuai Nie, Jiaming Xu, Runyu Shi, Ying Huang, Guoquan Zhang

    Abstract: Reinforcement learning has emerged as a paradigm for post-training large language models, boosting their reasoning capabilities. Such approaches compute an advantage value for each sample, reflecting better or worse performance than expected, thereby yielding both positive and negative signals for training. However, the indiscriminate mixing of the two signals in existing methods, especially from…

    Submitted 15 December, 2025; v1 submitted 2 December, 2025; originally announced December 2025.

    Comments: Accepted by AAAI 2026

  26. arXiv:2512.01978

    math.CO cs.DM

    Fault-tolerant mutual-visibility: complexity and solutions for grid-like networks

    Authors: Serafino Cicerone, Gabriele Di Stefano, Sandi Klavžar, Gang Zhang

    Abstract: Networks are often modeled using graphs, and within this setting we introduce the notion of $k$-fault-tolerant mutual visibility. Informally, a set of vertices $X \subseteq V(G)$ in a graph $G$ is a $k$-fault-tolerant mutual-visibility set ($k$-ftmv set) if any two vertices in $X$ are connected by a bundle of $k+1$ shortest paths such that: ($i$) each shortest path contains no other vertex of $X$,…

    Submitted 1 December, 2025; originally announced December 2025.

    Comments: 25 pages, 3 figures, 1 table

    MSC Class: 05C12; 05C69; 05C76; 68Q17

  27. arXiv:2512.01444

    cs.CV

    FastAnimate: Towards Learnable Template Construction and Pose Deformation for Fast 3D Human Avatar Animation

    Authors: Jian Shu, Nanjie Yao, Gangjian Zhang, Junlong Ren, Yu Feng, Hao Wang

    Abstract: 3D human avatar animation aims at transforming a human avatar from an arbitrary initial pose to a specified target pose using deformation algorithms. Existing approaches typically divide this task into two stages: canonical template construction and target pose deformation. However, current template construction methods demand extensive skeletal rigging and often produce artifacts for specific pos…

    Submitted 1 December, 2025; originally announced December 2025.

    Comments: 9 pages, 4 figures

  28. arXiv:2512.01410

    cs.CL

    DyFuLM: An Advanced Multimodal Framework for Sentiment Analysis

    Authors: Ruohan Zhou, Jiachen Yuan, Churui Yang, Wenzheng Huang, Guoyan Zhang, Shiyao Wei, Jiazhen Hu, Ning Xin, Md Maruf Hasan

    Abstract: Understanding sentiment in complex textual expressions remains a fundamental challenge in affective computing. To address this, we propose a Dynamic Fusion Learning Model (DyFuLM), a multimodal framework designed to capture both hierarchical semantic representations and fine-grained emotional nuances. DyFuLM introduces two key modules: a Hierarchical Dynamic Fusion module that adaptively integrat…

    Submitted 1 December, 2025; originally announced December 2025.

    Comments: 8 pages, 6 figures, preprint. Under review for a suitable AI conference

  29. EGG-Fusion: Efficient 3D Reconstruction with Geometry-aware Gaussian Surfel on the Fly

    Authors: Xiaokun Pan, Zhenzhe Li, Zhichao Ye, Hongjia Zhai, Guofeng Zhang

    Abstract: Real-time 3D reconstruction is a fundamental task in computer graphics. Recently, differentiable-rendering-based SLAM system has demonstrated significant potential, enabling photorealistic scene rendering through learnable scene representations such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Current differentiable rendering methods face dual challenges in real-time computat…

    Submitted 1 December, 2025; originally announced December 2025.

    Comments: SIGGRAPH ASIA 2025

  30. arXiv:2511.23172

    cs.CV

    Fast Multi-view Consistent 3D Editing with Video Priors

    Authors: Liyi Chen, Ruihuang Li, Guowen Zhang, Pengfei Wang, Lei Zhang

    Abstract: Text-driven 3D editing enables user-friendly 3D object or scene editing with text instructions. Due to the lack of multi-view consistency priors, existing methods typically resort to employing 2D generation or editing models to process each view individually, followed by iterative 2D-3D-2D updating. However, these methods are not only time-consuming but also prone to over-smoothed results because…

    Submitted 1 December, 2025; v1 submitted 28 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026

  31. arXiv:2511.21724

    cs.CL

    AD-CDO: A Lightweight Ontology for Representing Eligibility Criteria in Alzheimer's Disease Clinical Trials

    Authors: Zenan Sun, Rashmie Abeysinghe, Xiaojin Li, Xinyue Hu, Licong Cui, Guo-Qiang Zhang, Jiang Bian, Cui Tao

    Abstract: Objective: This study introduces the Alzheimer's Disease Common Data Element Ontology for Clinical Trials (AD-CDO), a lightweight, semantically enriched ontology designed to represent and standardize key eligibility criteria concepts in Alzheimer's disease (AD) clinical trials. Materials and Methods: We extracted high-frequency concepts from more than 1,500 AD clinical trials on ClinicalTrials…

    Submitted 20 November, 2025; originally announced November 2025.

  32. arXiv:2511.21579

    cs.CV

    Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy

    Authors: Teng Hu, Zhentao Yu, Guozhen Zhang, Zihan Su, Zhengguang Zhou, Youliang Zhang, Yuan Zhou, Qinglin Lu, Ran Yi

    Abstract: The synthesis of synchronized audio-visual content is a key challenge in generative AI, with open-source models facing challenges in robust audio-video alignment. Our analysis reveals that this issue is rooted in three fundamental challenges of the joint diffusion process: (1) Correspondence Drift, where concurrently evolving noisy latents impede stable learning of alignment; (2) inefficient globa…

    Submitted 28 November, 2025; v1 submitted 26 November, 2025; originally announced November 2025.

  33. arXiv:2511.21541

    cs.CV

    Video Generation Models Are Good Latent Reward Models

    Authors: Xiaoyue Mi, Wenqing Yu, Jiesong Lian, Shibo Jie, Ruizhe Zhong, Zijun Liu, Guozhen Zhang, Zixiang Zhou, Zhiyong Xu, Yuan Zhou, Qinglin Lu, Fan Tang

    Abstract: Reward feedback learning (ReFL) has proven effective for aligning image generation with human preferences. However, its extension to video generation faces significant challenges. Existing video reward models rely on vision-language models designed for pixel-space inputs, confining ReFL optimization to near-complete denoising steps after computationally expensive VAE decoding. This pixel-space app…

    Submitted 26 November, 2025; originally announced November 2025.

  34. arXiv:2511.21394

    cs.IR cs.AI

    RIA: A Ranking-Infused Approach for Optimized listwise CTR Prediction

    Authors: Guoxiao Zhang, Tan Qu, Ao Li, DongLin Ni, Qianlong Xie, Xingxing Wang

    Abstract: Reranking improves recommendation quality by modeling item interactions. However, existing methods often decouple ranking and reranking, leading to weak listwise evaluation models that suffer from combinatorial sparsity and limited representational power under strict latency constraints. In this paper, we propose RIA (Ranking-Infused Architecture), a unified, end-to-end framework that seamlessly i…

    Submitted 26 November, 2025; originally announced November 2025.

  35. arXiv:2511.21389

    cs.IR cs.AI

    FITRep: Attention-Guided Item Representation via MLLMs

    Authors: Guoxiao Zhang, Ao Li, Tan Qu, Qianlong Xie, Xingxing Wang

    Abstract: Online platforms usually suffer from user experience degradation due to near-duplicate items with similar visuals and text. While Multimodal Large Language Models (MLLMs) enable multimodal embedding, existing methods treat representations as black boxes, ignoring structural relationships (e.g., primary vs. auxiliary elements), leading to a local structural collapse problem. To address this, inspired…

    Submitted 26 November, 2025; originally announced November 2025.

  36. arXiv:2511.21156

    cs.NI

    Digital Twin-Driven Secure Access Strategy for SAGIN-Enabled IoT Networks

    Authors: Hui Liang, Zhihui Wu, Runqi Yuan, Guobin Zhang, Yanfeng Zhang, Jinkai Zheng, Tom H. Luan

    Abstract: In space-air-ground integrated networks (SAGIN)-enabled IoT networks, secure access has become a significant challenge due to the increasing risks of eavesdropping attacks. To address these threats to data confidentiality, this paper proposes a Digital Twin (DT)-driven secure access strategy. The strategy leverages a virtual replica of the physical SAGIN environment within the DT framework to cont…

    Submitted 26 November, 2025; originally announced November 2025.

  37. arXiv:2511.20922

    cs.CR cs.DC cs.LG

    Readout-Side Bypass for Residual Hybrid Quantum-Classical Models

    Authors: Guilin Zhang, Wulan Guo, Ziqi Tan, Hongyang He, Qiang Guan, Hailong Jiang

    Abstract: Quantum machine learning (QML) promises compact and expressive representations, but suffers from the measurement bottleneck - a narrow quantum-to-classical readout that limits performance and amplifies privacy risk. We propose a lightweight residual hybrid architecture that concatenates quantum features with raw inputs before classification, bypassing the bottleneck without increasing quantum comp…

    Submitted 26 November, 2025; v1 submitted 25 November, 2025; originally announced November 2025.

    Comments: 5 pages, 1 figure, 6 tables

    MSC Class: 68T05; 81P68

    ACM Class: I.2.6; C.2.4; K.4.1

  38. arXiv:2511.20648

    cs.CV

    LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight

    Authors: Yunze Man, Shihao Wang, Guowen Zhang, Johan Bjorck, Zhiqi Li, Liang-Yan Gui, Jim Fan, Jan Kautz, Yu-Xiong Wang, Zhiding Yu

    Abstract: To act in the world, a model must name what it sees and know where it is in 3D. Today's vision-language models (VLMs) excel at open-ended 2D description and grounding, yet multi-object 3D detection remains largely missing from the VLM toolbox. We present LocateAnything3D, a VLM-native recipe that casts 3D detection as a next-token prediction problem. The key is a short, explicit Chain-of-Sight (Co…

    Submitted 25 November, 2025; originally announced November 2025.

    Comments: Tech report. Project page: https://nvlabs.github.io/LocateAnything3D/

  39. arXiv:2511.19812

    cs.IT

    Two-Step Decoding of Binary $2\times2$ Sum-Rank-Metric Codes

    Authors: Hao Wu, Bocong Chen, Guanghui Zhang, Hongwei Liu

    Abstract: We resolve an open problem posed by Chen–Cheng–Qi (IEEE Trans. Inf. Theory, 2025): can decoding of binary sum-rank-metric codes $\mathrm{SR}(C_1,C_2)$ with $2\times2$ matrix blocks be reduced entirely to decoding the constituent Hamming-metric codes $C_1$ and $C_2$ without the additional requirement $d_1\ge\tfrac{2}{3}d_{\mathrm{sr}}$ that underlies their fast decoder? We answer this in the affirmativ…

    Submitted 24 November, 2025; originally announced November 2025.

    Comments: 16 pages

    MSC Class: 94B05; 94B35

  40. arXiv:2511.19033

    cs.CV

    ReEXplore: Improving MLLMs for Embodied Exploration with Contextualized Retrospective Experience Replay

    Authors: Gengyuan Zhang, Mingcong Ding, Jingpei Wu, Ruotong Liao, Volker Tresp

    Abstract: Embodied exploration is a target-driven process that requires embodied agents to possess fine-grained perception and knowledge-enhanced decision making. While recent attempts leverage MLLMs for exploration due to their strong perceptual and reasoning abilities, we find that MLLM-based embodied agents remain suboptimal in exploring new environments: (i) they rely on profound but stale pre-trained k…

    Submitted 24 November, 2025; originally announced November 2025.

    Comments: 8 main pages plus 13 pages Appendix

  41. arXiv:2511.18538

    cs.SE cs.CL

    From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence

    Authors: Jian Yang, Xianglong Liu, Weifeng Lv, Ken Deng, Shawn Guo, Lin Jing, Yizhi Li, Shark Liu, Xianzhen Luo, Yuyu Luo, Changzai Pan, Ensheng Shi, Yingshui Tan, Renshuai Tao, Jiajun Wu, Xianjie Wu, Zhenhe Wu, Daoguang Zan, Chenchen Zhang, Wei Zhang, He Zhu, Terry Yue Zhuo, Kerui Cao, Xianfu Cheng, Jun Dong, et al. (46 additional authors not shown)

    Abstract: Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving commercial adoption through tools like GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic). While the field has evolved dramatically from rule-based systems to Transformer-b…

    Submitted 6 December, 2025; v1 submitted 23 November, 2025; originally announced November 2025.

  42. arXiv:2511.17904  [pdf, ps, other

    cs.CV cs.RO

    CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation

    Authors: Yuhang Ming, Chenxin Fang, Xingyuan Yu, Fan Zhang, Weichen Dai, Wanzeng Kong, Guofeng Zhang

    Abstract: Recent advances in Gaussian Splatting based 3D scene representation have shown two major trends: semantics-oriented approaches that focus on high-level understanding but lack explicit 3D geometry modeling, and structure-oriented approaches that capture spatial structures yet provide limited semantic abstraction. To bridge this gap, we present CUS-GS, a compact unified structured Gaussian Splatting…

    Submitted 21 November, 2025; originally announced November 2025.

    Comments: 15 pages, 8 figures, 4 tables

  43. arXiv:2511.17512  [pdf, ps, other

    cs.HC cs.CY

    First Contact with Dark Patterns and Deceptive Designs in Chinese and Japanese Free-to-Play Mobile Games

    Authors: Gloria Xiaodan Zhang, Yijia Wang, Taro Leo Nakajima, Katie Seaborn

    Abstract: Mobile games have gained immense popularity due to their accessibility, allowing people to play anywhere, anytime. Dark patterns and deceptive designs (DPs) have been found in these and other gaming platforms within certain cultural contexts. Here, we explored DPs in the onboarding experiences of free-to-play mobile games from China and Japan. We identified several unique patterns and mapped their…

    Submitted 6 October, 2025; originally announced November 2025.

    Comments: CHI PLAY '25

    Journal ref: Proceedings of the ACM on Human-Computer Interaction, Volume 9, Issue 6, Article No. GAMES025, Pages 730-755 (2025)

  44. arXiv:2511.17185  [pdf, ps, other

    cs.CV

    PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention

    Authors: Yipeng Chen, Zhichao Ye, Zhenzhou Fang, Xinyu Chen, Xiaoyu Zhang, Jialing Liu, Nan Wang, Haomin Liu, Guofeng Zhang

    Abstract: We propose PostCam, a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes. We find that existing video recapture methods suffer from suboptimal camera motion injection strategies; such suboptimal designs not only limit camera control precision but also result in generated videos that fail to preserve fine visual details from the sour…

    Submitted 21 November, 2025; originally announced November 2025.

  45. arXiv:2511.17123  [pdf, ps, other

    cs.AR cs.LG

    Layer-wise Weight Selection for Power-Efficient Neural Network Acceleration

    Authors: Jiaxun Fang, Grace Li Zhang, Shaoyi Huang

    Abstract: Systolic array accelerators execute CNNs with energy dominated by the switching activity of multiply-accumulate (MAC) units. Although prior work exploits weight-dependent MAC power for compression, existing methods often use global activation models, coarse energy proxies, or layer-agnostic policies, which limits their effectiveness on real hardware. We propose an energy-aware, layer-wise compress…

    Submitted 16 December, 2025; v1 submitted 21 November, 2025; originally announced November 2025.

  46. arXiv:2511.16766  [pdf, ps, other

    cs.CV

    SVG360: Multi-View SVG Generation with Geometric and Color Consistency from a Single SVG

    Authors: Mengnan Jiang, Zhaolin Sun, Christian Franke, Michele Franco Adesso, Antonio Haas, Grace Li Zhang

    Abstract: Scalable Vector Graphics (SVGs) are central to modern design workflows, offering scaling without distortion and precise editability. However, for single-object SVGs, generating multi-view consistent SVGs from a single-view input remains underexplored. We present a three-stage framework that produces multi-view SVGs with geometric and color consistency from a single SVG input. First, the rasterized…

    Submitted 20 November, 2025; originally announced November 2025.

    Comments: 10 pages, 4 figures. Preprint

  47. arXiv:2511.16395  [pdf, ps, other

    cs.AI cs.PL cs.SE eess.SY

    CorrectHDL: Agentic HDL Design with LLMs Leveraging High-Level Synthesis as Reference

    Authors: Kangwei Xu, Grace Li Zhang, Ulf Schlichtmann, Bing Li

    Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in hardware front-end design using hardware description languages (HDLs). However, their inherent tendency toward hallucination often introduces functional errors into the generated HDL designs. To address this issue, we propose the framework CorrectHDL that leverages high-level synthesis (HLS) results as functional references to…

    Submitted 20 November, 2025; originally announced November 2025.

    Comments: 7 pages, 15 figures, 2 tables

  48. arXiv:2511.16013  [pdf, ps, other

    cs.LG cs.AI

    Physics-Guided Inductive Spatiotemporal Kriging for PM2.5 with Satellite Gradient Constraints

    Authors: Shuo Wang, Mengfan Teng, Yun Cheng, Lothar Thiele, Olga Saukh, Shuangshuang He, Yuanting Zhang, Jiang Zhang, Gangfeng Zhang, Xingyuan Yuan, Jingfang Fan

    Abstract: High-resolution mapping of fine particulate matter (PM2.5) is a cornerstone of sustainable urbanism but remains critically hindered by the spatial sparsity of ground monitoring networks. While traditional data-driven methods attempt to bridge this gap using satellite Aerosol Optical Depth (AOD), they often suffer from severe, non-random data missingness (e.g., due to cloud cover or nighttime) and…

    Submitted 19 November, 2025; originally announced November 2025.

  49. arXiv:2511.15915  [pdf, ps, other

    cs.LG cs.CL

    AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

    Authors: Genghan Zhang, Shaowei Zhu, Anjiang Wei, Zhenyu Song, Allen Nie, Zhen Jia, Nandita Vijaykumar, Yida Wang, Kunle Olukotun

    Abstract: We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encou…

    Submitted 19 November, 2025; originally announced November 2025.

  50. arXiv:2511.13626  [pdf, ps, other

    cs.AI

    CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product

    Authors: Kaiwen Xue, Chenglong Li, Zhonghong Ou, Guoxin Zhang, Kaoyan Lu, Shuai Lyu, Yifan Zhu, Ping Zong, Junpeng Ding, Xinyu Liu, Qunlin Chen, Weiwei Qin, Yiran Shen, Jiayi Cen

    Abstract: Human-defined creativity is highly abstract, posing a challenge for multimodal large language models (MLLMs) to comprehend and assess creativity that aligns with human judgments. The absence of an existing benchmark further exacerbates this dilemma. To this end, we propose CreBench, which consists of two key components: 1) an evaluation benchmark covering the multiple dimensions from creative idea…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: 13 pages, 3 figures, The 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026), Paper has been accepted for a poster presentation