Showing 1–50 of 886 results for author: Fan, Y

Searching in archive cs.
  1. arXiv:2512.21041

    cs.HC

    When LLMs fall short in Deductive Coding: Model Comparison and Human AI Collaboration Workflow Design

    Authors: Zijian Li, Luzhen Tang, Mengyu Xia, Xinyu Li, Naping Chen, Dragan Gašević, Yizhou Fan

    Abstract: With generative artificial intelligence driving the growth of dialogic data in education, automated coding is a promising direction for learning analytics to improve efficiency. This surge highlights the need to understand the nuances of student-AI interactions, especially those rare yet crucial. However, automated coding may struggle to capture these rare codes due to imbalanced data, while human…

    Submitted 24 December, 2025; originally announced December 2025.

    Comments: 24 pages (8 pages for Appendix), 4 figures, for Learning Analytics & Knowledge Conference to be held in 2026, Norway (LAK26)

  2. arXiv:2512.20649

    cs.AI cs.CR

    AIAuditTrack: A Framework for AI Security system

    Authors: Zixun Luo, Yuhang Fan, Yufei Li, Youzhi Zhang, Hengyu Lin, Ziqi Wang

    Abstract: The rapid expansion of AI-driven applications powered by large language models has led to a surge in AI interaction data, raising urgent challenges in security, accountability, and risk traceability. This paper presents AiAuditTrack (AAT), a blockchain-based framework for AI usage traffic recording and governance. AAT leverages decentralized identity (DID) and verifiable credentials (VC) to establ…

    Submitted 16 December, 2025; originally announced December 2025.

  3. arXiv:2512.20561

    cs.CV

    FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

    Authors: Kaitong Cai, Jusheng Zhang, Jing Yang, Yijia Fan, Pengtao Xie, Jian Wang, Keze Wang

    Abstract: Large vision-language models (VLMs) typically process hundreds or thousands of visual tokens per image or video frame, incurring quadratic attention cost and substantial redundancy. Existing token reduction methods often ignore the textual query or rely on deep attention maps, whose instability under aggressive pruning leads to degraded semantic alignment. We propose FlashVLM, a text guided visu…

    Submitted 23 December, 2025; originally announced December 2025.

    Comments: Under submission

  4. arXiv:2512.19150

    cs.CV

    AMap: Distilling Future Priors for Ahead-Aware Online HD Map Construction

    Authors: Ruikai Li, Xinrun Li, Mengwei Xie, Hao Shan, Shoumeng Qiu, Xinyuan Chang, Yizhe Fan, Feng Xiong, Han Jiang, Yilong Ren, Haiyang Yu, Mu Xu, Yang Long, Varun Ojha, Zhiyong Cui

    Abstract: Online High-Definition (HD) map construction is pivotal for autonomous driving. While recent approaches leverage historical temporal fusion to improve performance, we identify a critical safety flaw in this paradigm: it is inherently "spatially backward-looking." These methods predominantly enhance map reconstruction in traversed areas, offering minimal improvement for the unseen road ahead. Cruc…

    Submitted 22 December, 2025; originally announced December 2025.

    Comments: 19 pages, 11 figures

  5. arXiv:2512.18623

    cs.CL cs.AI

    LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction

    Authors: Jensen Zhang, Ningyuan Liu, Yijia Fan, Zihao Huang, Qinglin Zeng, Kaitong Cai, Jian Wang, Keze Wang

    Abstract: Large language models (LLMs) often generate hallucinated content that lacks factual or contextual grounding, limiting their reliability in critical applications. Existing approaches such as supervised fine-tuning and reinforcement learning from human feedback are data intensive and computationally expensive, while static parameter editing methods struggle with context dependent errors and catastro…

    Submitted 21 December, 2025; originally announced December 2025.

    Comments: Accepted at AAAI 2026

  6. arXiv:2512.18614

    cs.CV cs.AI

    PTTA: A Pure Text-to-Animation Framework for High-Quality Creation

    Authors: Ruiqi Chen, Kaitong Cai, Yijia Fan, Keze Wang

    Abstract: Traditional animation production involves complex pipelines and significant manual labor cost. While recent video generation models such as Sora, Kling, and CogVideoX achieve impressive results on natural video synthesis, they exhibit notable limitations when applied to animation generation. Recent efforts, such as AniSora, demonstrate promising performance by fine-tuning image-to-video models for…

    Submitted 21 December, 2025; originally announced December 2025.

    Comments: Under submission

  7. arXiv:2512.18264

    cs.CV cs.AI cs.CR

    Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks

    Authors: Yucheng Fan, Jiawei Chen, Yu Tian, Zhaoxia Yin

    Abstract: As vision-language models (VLMs) become widely adopted, VLM-based attribute inference attacks have emerged as a serious privacy concern, enabling adversaries to infer private attributes from images shared on social media. This escalating threat calls for dedicated protection methods to safeguard user privacy. However, existing methods often degrade the visual quality of images or interfere with vi…

    Submitted 20 December, 2025; originally announced December 2025.

  8. arXiv:2512.18234

    cs.HC

    The Social Blindspot in Human-AI Collaboration: How Undetected AI Personas Reshape Team Dynamics

    Authors: Lixiang Yan, Xibin Han, Yu Zhang, Samuel Greiff, Inge Molenaar, Roberto Martinez-Maldonado, Yizhou Fan, Linxuan Zhao, Xinyu Li, Yueqiao Jin, Dragan Gašević

    Abstract: As generative AI systems become increasingly embedded in collaborative work, they are evolving from visible tools into human-like communicative actors that participate socially rather than merely providing information. Yet little is known about how such agents shape team dynamics when their artificial nature is not recognised, a growing concern as human-like AI is deployed at scale in education, o…

    Submitted 20 December, 2025; originally announced December 2025.

  9. arXiv:2512.15258

    cs.RO cs.AI

    VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments

    Authors: Yuze Wu, Mo Zhu, Xingxing Li, Yuheng Du, Yuxin Fan, Wenjun Li, Zhichao Han, Xin Zhou, Fei Gao

    Abstract: This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First…

    Submitted 19 December, 2025; v1 submitted 17 December, 2025; originally announced December 2025.

  10. arXiv:2512.12623

    cs.CV cs.CL

    Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space

    Authors: Chengzhi Liu, Yuzhe Yang, Yue Fan, Qingyue Wei, Sheng Liu, Xin Eric Wang

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced cross-modal understanding and reasoning by incorporating Chain-of-Thought (CoT) reasoning in the semantic space. Building upon this, recent studies extend the CoT mechanism to the visual modality, enabling models to integrate visual information during reasoning through external tools or explicit image gener…

    Submitted 17 December, 2025; v1 submitted 14 December, 2025; originally announced December 2025.

  11. arXiv:2512.12487

    cs.CV

    More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models

    Authors: Hoang Anh Just, Yifei Fan, Handong Zhao, Jiuxiang Gu, Ruiyi Zhang, Simon Jenni, Kushal Kafle, Ruoxi Jia, Jing Shi

    Abstract: Reinforcement learning from verifiable rewards (RLVR) has recently been extended from text-only LLMs to vision-language models (VLMs) to elicit long-chain multimodal reasoning. However, RLVR-trained VLMs still exhibit two persistent failure modes: inaccurate visual extraction (missing or hallucinating details) and logically inconsistent chains-of-thought, largely because verifiable signals supervi…

    Submitted 13 December, 2025; originally announced December 2025.

  12. arXiv:2512.08905

    cs.CV

    Self-Evolving 3D Scene Generation from a Single Image

    Authors: Kaizhi Zheng, Yue Fan, Jing Gu, Zishuo Xu, Xuehai He, Xin Eric Wang

    Abstract: Generating high-quality, textured 3D scenes from a single image remains a fundamental challenge in vision and graphics. Recent image-to-3D generators recover reasonable geometry from single views, but their object-centric training limits generalization to complex, large-scale scenes with faithful structure and texture. We present EvoScene, a self-evolving, training-free framework that progressivel…

    Submitted 9 December, 2025; originally announced December 2025.

  13. arXiv:2512.08240

    cs.CV cs.AI

    HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

    Authors: Jusheng Zhang, Xiaoyang Guo, Kaitong Cai, Qinhan Lv, Yijia Fan, Wenhao Chai, Jian Wang, Keze Wang

    Abstract: Vision-language models (VLMs) have transformed multimodal reasoning, but feeding hundreds of visual patch tokens into LLMs incurs quadratic computational costs, straining memory and context windows. Traditional approaches face a trade-off: continuous compression dilutes high-level semantics such as object identities, while discrete quantization loses fine-grained details such as textures. We intro…

    Submitted 8 December, 2025; originally announced December 2025.

  14. arXiv:2512.08228

    cs.CV cs.AI

    MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models

    Authors: Jusheng Zhang, Kaitong Cai, Xiaoyang Guo, Sidi Liu, Qinhan Lv, Ruiqi Chen, Jing Yang, Yijia Fan, Xiaofei Sun, Jian Wang, Ziliang Chen, Liang Lin, Keze Wang

    Abstract: The ability to perform Chain-of-Thought (CoT) reasoning marks a major milestone for multimodal models (MMs), enabling them to solve complex visual reasoning problems. Yet a critical question remains: is such reasoning genuinely grounded in visual evidence and logically coherent? Existing benchmarks emphasize generation but neglect verification, i.e., the capacity to assess whether a reasoning chai…

    Submitted 8 December, 2025; originally announced December 2025.

  15. arXiv:2512.06442

    cs.PL

    Nice to Meet You: Synthesizing Practical MLIR Abstract Transformers

    Authors: Xuanyu Peng, Dominic Kennedy, Yuyou Fan, Ben Greenman, John Regehr, Loris D'Antoni

    Abstract: Static analyses play a fundamental role during compilation: they discover facts that are true in all executions of the code being compiled, and then these facts are used to justify optimizations and diagnostics. Each static analysis is based on a collection of abstract transformers that provide abstract semantics for the concrete instructions that make up a program. It can be challenging to implem…

    Submitted 6 December, 2025; originally announced December 2025.

  16. arXiv:2512.06275

    cs.CV

    FacePhys: State of the Heart Learning

    Authors: Kegang Wang, Jiankai Tang, Yuntao Wang, Xin Liu, Yuxuan Fan, Jiatong Ji, Yuanchun Shi, Daniel McDuff

    Abstract: Vital sign measurement using cameras presents opportunities for comfortable, ubiquitous health monitoring. Remote photoplethysmography (rPPG), a foundational technology, enables cardiac measurement through minute changes in light reflected from the skin. However, practical deployment is limited by the computational constraints of performing analysis on front-end devices and the accuracy degradatio…

    Submitted 5 December, 2025; originally announced December 2025.

  17. arXiv:2512.06080

    cs.CV

    Shoot-Bounce-3D: Single-Shot Occlusion-Aware 3D from Lidar by Decomposing Two-Bounce Light

    Authors: Tzofi Klinghoffer, Siddharth Somasundaram, Xiaoyu Xiang, Yuchen Fan, Christian Richardt, Akshat Dave, Ramesh Raskar, Rakesh Ranjan

    Abstract: 3D scene reconstruction from a single measurement is challenging, especially in the presence of occluded regions and specular materials, such as mirrors. We address these challenges by leveraging single-photon lidars. These lidars estimate depth from light that is emitted into the scene and reflected directly back to the sensor. However, they can also measure light that bounces multiple times in t…

    Submitted 5 December, 2025; originally announced December 2025.

    Comments: SIGGRAPH Asia 2025. Project page: https://shoot-bounce-3d.github.io

  18. arXiv:2512.06018

    cs.CY cs.AI

    Uncovering Students' Inquiry Patterns in GenAI-Supported Clinical Practice: An Integration of Epistemic Network Analysis and Sequential Pattern Mining

    Authors: Jiameng Wei, Dinh Dang, Kaixun Yang, Emily Stokes, Amna Mazeh, Angelina Lim, David Wei Dai, Joel Moore, Yizhou Fan, Danijela Gasevic, Dragan Gasevic, Guanliang Chen

    Abstract: Assessment of medication history-taking has traditionally relied on human observation, limiting scalability and detailed performance data. While Generative AI (GenAI) platforms enable extensive data collection and learning analytics provide powerful methods for analyzing educational traces, these approaches remain largely underexplored in pharmacy clinical training. This study addresses this gap b…

    Submitted 3 December, 2025; originally announced December 2025.

  19. arXiv:2512.05483

    cs.LG

    Turbulence Regression

    Authors: Yingang Fan, Binjie Ding, Baiyi Chen

    Abstract: Air turbulence refers to the disordered and irregular motion state generated by drastic changes in velocity, pressure, or direction during airflow. Various complex factors lead to intricate low-altitude turbulence outcomes. Under current observational conditions, especially when using only wind profile radar data, traditional methods struggle to accurately predict turbulence states. Therefore, thi…

    Submitted 5 December, 2025; originally announced December 2025.

  20. arXiv:2512.02685

    cs.CV

    Unsupervised Structural Scene Decomposition via Foreground-Aware Slot Attention with Pseudo-Mask Guidance

    Authors: Huankun Sheng, Ming Li, Yixiang Wei, Yeying Fan, Yu-Hui Wen, Tieliang Gong, Yong-Jin Liu

    Abstract: Recent advances in object-centric representation learning have shown that slot attention-based methods can effectively decompose visual scenes into object slot representations without supervision. However, existing approaches typically process foreground and background regions indiscriminately, often resulting in background interference and suboptimal instance discovery performance on real-world d…

    Submitted 10 December, 2025; v1 submitted 2 December, 2025; originally announced December 2025.

  21. arXiv:2512.00831

    cs.LG

    ReJump: A Tree-Jump Representation for Analyzing and Improving LLM Reasoning

    Authors: Yuchen Zeng, Shuibai Zhang, Wonjun Kang, Shutong Wu, Lynnix Zou, Ying Fan, Heeju Kim, Ziqian Lin, Jungtaek Kim, Hyung Il Koo, Dimitris Papailiopoulos, Kangwook Lee

    Abstract: Large Reasoning Models (LRMs) are Large Language Models (LLMs) explicitly trained to generate long-form Chain-of-Thoughts (CoTs), achieving impressive success on challenging tasks like math and programming. However, their underlying reasoning "algorithms" remain poorly understood. To investigate this, we propose ReJump, which represents a reasoning trace as a visitation order over nodes in a tree…

    Submitted 9 December, 2025; v1 submitted 30 November, 2025; originally announced December 2025.

  22. arXiv:2512.00812

    cs.LG cs.AI

    Causal Invariance and Counterfactual Learning Driven Cooperative Game for Multi-Label Classification

    Authors: Yijia Fan, Jusheng Zhang, Kaitong Cai, Jing Yang, Keze Wang

    Abstract: Multi-label classification (MLC) remains vulnerable to label imbalance, spurious correlations, and distribution shifts, challenges that are particularly detrimental to rare label prediction. To address these limitations, we introduce the Causal Cooperative Game (CCG) framework, which conceptualizes MLC as a cooperative multi-player interaction. CCG unifies explicit causal discovery via Neural Stru…

    Submitted 30 November, 2025; originally announced December 2025.

  23. arXiv:2511.22186

    cs.SE

    Exploring the SECURITY.md in the Dependency Chain: Preliminary Analysis of the PyPI Ecosystem

    Authors: Chayanid Termphaiboon, Raula Gaikovina Kula, Youmei Fan, Morakot Choetkiertikul, Chaiyong Ragkhitwetsagul, Thanwadee Sunetnanta, Kenichi Matsumoto

    Abstract: Security policies, such as SECURITY.md files, are now common in open-source projects. They help guide responsible vulnerability reporting and build trust among users and contributors. Despite their growing use, it is still unclear how these policies influence the structure and evolution of software dependencies. Software dependencies are external packages or libraries that a project relies on, and…

    Submitted 27 November, 2025; originally announced November 2025.

    Comments: 8 pages, 5 figures, accepted to ISE 2025 (International Workshop on Intelligent Software Engineering)

  24. arXiv:2511.22055

    cs.CV cs.MM

    OralGPT-Omni: A Versatile Dental Multimodal Large Language Model

    Authors: Jing Hao, Yuci Liang, Lizhuo Lin, Yuxuan Fan, Wenkai Zhou, Kaixin Guo, Zanting Ye, Yanpeng Sun, Xinyu Zhang, Yanqi Yang, Qiankun Li, Hao Tang, James Kit-Hon Tsoi, Linlin Shen, Kuo Feng Hung

    Abstract: Multimodal Large Language Models (MLLMs) have exhibited immense potential across numerous medical specialties; yet, dentistry remains underexplored, in part due to limited domain-specific data, scarce dental expert annotations, insufficient modality-specific modeling, and challenges in reliability. In this paper, we present OralGPT-Omni, the first dental-specialized MLLM designed for comprehensive…

    Submitted 26 November, 2025; originally announced November 2025.

    Comments: 47 pages, 42 figures, 13 tables

  25. arXiv:2511.20719

    cs.AI cs.IT eess.SP

    Learning Multi-Access Point Coordination in Agentic AI Wi-Fi with Large Language Models

    Authors: Yifan Fan, Le Liang, Peng Liu, Xiao Li, Ziyang Guo, Qiao Lan, Shi Jin, Wen Tong

    Abstract: Multi-access point coordination (MAPC) is a key technology for enhancing throughput in next-generation Wi-Fi within dense overlapping basic service sets. However, existing MAPC protocols rely on static, protocol-defined rules, which limits their ability to adapt to dynamic network conditions such as varying interference levels and topologies. To address this limitation, we propose a novel Agentic…

    Submitted 25 November, 2025; originally announced November 2025.

  26. arXiv:2511.20233

    cs.CL

    REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

    Authors: Chuyi Kong, Gao Wei, Jing Ma, Hongzhan Lin, Yaxin Fan

    Abstract: The prevalence of misinformation on social media threatens public trust, demanding automated fact-checking systems that provide accurate verdicts with interpretable explanations. However, existing large language model-based (LLM-based) approaches often rely heavily on external knowledge sources, introducing substantial latency and even hallucinations that undermine reliability, interpretability, a…

    Submitted 28 November, 2025; v1 submitted 25 November, 2025; originally announced November 2025.

  27. arXiv:2511.19694

    cs.LG cs.AI

    TiCT: A Synthetically Pre-Trained Foundation Model for Time Series Classification

    Authors: Chin-Chia Michael Yeh, Uday Singh Saini, Junpeng Wang, Xin Dai, Xiran Fan, Jiarui Sun, Yujie Fan, Yan Zheng

    Abstract: The ubiquity of time series data creates a strong demand for general-purpose foundation models, yet developing them for classification remains a significant challenge, largely due to the high cost of labeled data. Foundation models capable of in-context learning (ICL) offer a powerful solution, adapting to new tasks with minimal examples and reducing the need for extensive retraining. However, pri…

    Submitted 26 November, 2025; v1 submitted 24 November, 2025; originally announced November 2025.

  28. arXiv:2511.19693

    cs.LG cs.AI

    TREASURE: A Transformer-Based Foundation Model for High-Volume Transaction Understanding

    Authors: Chin-Chia Michael Yeh, Uday Singh Saini, Xin Dai, Xiran Fan, Shubham Jain, Yujie Fan, Jiarui Sun, Junpeng Wang, Menghai Pan, Yingtong Dou, Yuzhong Chen, Vineeth Rakesh, Liang Wang, Yan Zheng, Mahashweta Das

    Abstract: Payment networks form the backbone of modern commerce, generating high volumes of transaction records from daily activities. Properly modeling this data can enable applications such as abnormal behavior detection and consumer-level insights for hyper-personalized experiences, ultimately improving people's lives. In this paper, we present TREASURE, TRansformer Engine As Scalable Universal transacti…

    Submitted 26 November, 2025; v1 submitted 24 November, 2025; originally announced November 2025.

  29. arXiv:2511.18286

    cs.CV

    RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System

    Authors: Runwei Guan, Rongsheng Hu, Shangshu Chen, Ningyuan Xiao, Xue Xia, Jiayang Liu, Beibei Chen, Ziren Tang, Ningwei Ouyang, Shaofeng Liang, Yuxuan Fan, Wanjie Sun, Yutao Yue

    Abstract: Current roadside perception systems mainly focus on instance-level perception, which fall short in enabling interaction via natural language and reasoning about traffic behaviors in context. To bridge this gap, we introduce RoadSceneVQA, a large-scale and richly annotated visual question answering (VQA) dataset specifically tailored for roadside scenarios. The dataset comprises 34,736 diverse QA p…

    Submitted 22 November, 2025; originally announced November 2025.

    Comments: 9 pages, 6 figures, accepted by AAAI 2026. The model is also called Dream, to the other me in the world forever

  30. arXiv:2511.16825

    cs.CV cs.AI

    WorldGen: From Text to Traversable and Interactive 3D Worlds

    Authors: Dilin Wang, Hyunyoung Jung, Tom Monnier, Kihyuk Sohn, Chuhang Zou, Xiaoyu Xiang, Yu-Ying Yeh, Di Liu, Zixuan Huang, Thu Nguyen-Phuoc, Yuchen Fan, Sergiu Oprea, Ziyan Wang, Roman Shapovalov, Nikolaos Sarafianos, Thibault Groueix, Antoine Toisoul, Prithviraj Dhar, Xiao Chu, Minghao Chen, Geon Yeong Park, Mahima Gupta, Yassir Azziz, Rakesh Ranjan, Andrea Vedaldi

    Abstract: We introduce WorldGen, a system that enables the automatic creation of large-scale, interactive 3D worlds directly from text prompts. Our approach transforms natural language descriptions into traversable, fully textured environments that can be immediately explored or edited within standard game engines. By combining LLM-driven scene layout reasoning, procedural generation, diffusion-based 3D gen…

    Submitted 20 November, 2025; originally announced November 2025.

  31. arXiv:2511.14721

    cs.LG math.OC

    AdamHD: Decoupled Huber Decay Regularization for Language Model Pre-Training

    Authors: Fu-Ming Guo, Yingfang Fan

    Abstract: Adaptive optimizers with decoupled weight decay, such as AdamW, are the de facto standard for pre-training large transformer-based generative models. Yet the quadratic nature of the $\ell_2$ penalty embedded in weight decay drives all parameters toward the origin at the same rate, making the update vulnerable to rare but extreme gradient directions and often over-penalizing well-conditioned coordi…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: GPU-Accelerated and Scalable Optimization (ScaleOpt)

    MSC Class: 68Txx

    ACM Class: F.0; G.4

    Journal ref: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: GPU-Accelerated and Scalable Optimization (ScaleOpt)

  32. arXiv:2511.13735

    cs.NE eess.IV

    MS2Edge: Towards Energy-Efficient and Crisp Edge Detection with Multi-Scale Residual Learning in SNNs

    Authors: Yimeng Fan, Changsong Liu, Mingyang Li, Yuzhou Dai, Yanyan Liu, Wei Zhang

    Abstract: Edge detection with Artificial Neural Networks (ANNs) has achieved remarkable progress but faces two major challenges. First, it requires pre-training on large-scale extra data and complex designs for prior knowledge, leading to high energy consumption. Second, the predicted edges perform poorly in crispness and heavily rely on post-processing. Spiking Neural Networks (SNNs), as third generation…

    Submitted 5 November, 2025; originally announced November 2025.

  33. arXiv:2511.13211

    cs.CV

    3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale

    Authors: Yijia Fan, Jusheng Zhang, Kaitong Cai, Jing Yang, Jian Wang, Keze Wang

    Abstract: Despite recent advancements in 3D-text cross-modal alignment, existing state-of-the-art methods still struggle to align fine-grained textual semantics with detailed geometric structures, and their alignment performance degrades significantly when scaling to large-scale 3D databases. To overcome this limitation, we introduce 3DAlign-DAER, a unified framework designed to align text and 3D geometry v…

    Submitted 17 November, 2025; originally announced November 2025.

  34. arXiv:2511.13193

    cs.AI

    Cost-Effective Communication: An Auction-based Method for Language Agent Interaction

    Authors: Yijia Fan, Jusheng Zhang, Kaitong Cai, Jing Yang, Chengpei Tang, Jian Wang, Keze Wang

    Abstract: Multi-agent systems (MAS) built on large language models (LLMs) often suffer from inefficient "free-for-all" communication, leading to exponential token costs and low signal-to-noise ratios that hinder their practical deployment. We challenge the notion that more communication is always beneficial, hypothesizing instead that the core issue is the absence of resource rationality. We argue that "fre…

    Submitted 17 November, 2025; originally announced November 2025.

  35. arXiv:2511.12545

    cs.LG stat.ML

    Center-Outward q-Dominance: A Sample-Computable Proxy for Strong Stochastic Dominance in Multi-Objective Optimisation

    Authors: Robin van der Laag, Hao Wang, Thomas Bäck, Yingjie Fan

    Abstract: Stochastic multi-objective optimization (SMOOP) requires ranking multivariate distributions; yet, most empirical studies perform scalarization, which loses information and is unreliable. Based on the optimal transport theory, we introduce the center-outward q-dominance relation and prove it implies strong first-order stochastic dominance (FSD). Also, we develop an empirical test procedure based on…

    Submitted 16 November, 2025; originally announced November 2025.

    Comments: Extended version including appendix of a paper accepted at AAAI-26 main technical track (to appear)

  36. arXiv:2511.11025

    cs.CV cs.AI

    AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning

    Authors: Jirong Zha, Yuxuan Fan, Tianyu Zhang, Geng Chen, Yingfeng Chen, Chen Gao, Xinlei Chen

    Abstract: Multimodal Large Language Models (MLLMs) have shown promise in single-agent vision tasks, yet benchmarks for evaluating multi-agent collaborative perception remain scarce. This gap is critical, as multi-drone systems provide enhanced coverage, robustness, and collaboration compared to single-sensor setups. Existing multi-image benchmarks mainly target basic perception tasks using high-quality sing…

    Submitted 22 November, 2025; v1 submitted 14 November, 2025; originally announced November 2025.

  37. arXiv:2511.09272

    cs.CV

    GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow

    Authors: Rui Wan, Qi Zheng, Ruoyu Zhang, Bu Chen, Jiaming Liu, Min Li, Minge Jing, Jinjia Zhou, Yibo Fan

    Abstract: The Animation-based Generative Codec (AGC) is an emerging paradigm for talking-face video compression. However, deploying its intricate decoder on resource and power-constrained edge devices presents challenges due to numerous parameters, the inflexibility to adapt to dynamically evolving algorithms, and the high power consumption induced by extensive computations and data transmission. This paper…

    Submitted 12 November, 2025; originally announced November 2025.

  38. arXiv:2511.08939

    cs.LG cs.CL

    TransactionGPT

    Authors: Yingtong Dou, Zhimeng Jiang, Tianyi Zhang, Mingzhi Hu, Zhichao Xu, Shubham Jain, Uday Singh Saini, Xiran Fan, Jiarui Sun, Menghai Pan, Junpeng Wang, Xin Dai, Liang Wang, Chin-Chia Michael Yeh, Yujie Fan, Vineeth Rakesh, Huiyuan Chen, Mangesh Bendre, Zhongfang Zhuang, Xiaoting Li, Prince Aboagye, Vivian Lai, Minghua Xu, Hao Yang, Yiwei Cai, et al. (2 additional authors not shown)

    Abstract: We present TransactionGPT (TGPT), a foundation model for consumer transaction data within one of the world's largest payment networks. TGPT is designed to understand and generate transaction trajectories while simultaneously supporting a variety of downstream prediction and classification tasks. We introduce a novel 3D-Transformer architecture specifically tailored for capturing the complex dynamics i…

    Submitted 11 November, 2025; originally announced November 2025.

    Comments: Technical Report

  39. Robustness study of the bio-inspired musculoskeletal arm robot based on the data-driven iterative learning algorithm

    Authors: Jianbo Yuan, Jing Dai, Yerui Fan, Yaxiong Wu, Yunpeng Liang, Weixin Yan

    Abstract: The human arm exhibits remarkable capabilities, including both explosive power and precision, which demonstrate dexterity, compliance, and robustness in unstructured environments. Developing robotic systems that emulate human-like operational characteristics through musculoskeletal structures has long been a research focus. In this study, we designed a novel lightweight tendon-driven musculoskelet…

    Submitted 8 November, 2025; originally announced November 2025.

    Comments: 20 pages, 13 figures

    Journal ref: SCIENCE CHINA Information Sciences 2025, 68(12): 222203

  40. arXiv:2511.04952

    cs.CL

    LoPT: Lossless Parallel Tokenization Acceleration for Long Context Inference of Large Language Model

    Authors: Wei Shao, Lingchao Zheng, Pengyu Wang, Peizhen Zheng, Jun Li, Yuwei Fan

    Abstract: Long context inference scenarios have become increasingly important for large language models, yet they introduce significant computational latency. While prior research has optimized long-sequence inference through operators, model architectures, and system frameworks, tokenization remains an overlooked bottleneck. Existing parallel tokenization methods accelerate processing through text segmenta…

    Submitted 6 November, 2025; originally announced November 2025.

  41. How Natural Language Proficiency Shapes GenAI Code for Software Engineering Tasks

    Authors: Ruksit Rojpaisarnkit, Youmei Fan, Kenichi Matsumoto, Raula Gaikovina Kula

    Abstract: With the widespread adoption of Foundation Model (FM)-powered tools in software engineering, the natural language prompt has become a critical interface between developers and Large Language Models (LLMs). While much research has focused on prompt structure, natural language proficiency is an underexplored factor that can influence the quality of generated code. This paper investigates whether…

    Submitted 6 November, 2025; originally announced November 2025.

    Comments: 7 pages, 4 tables, 1 figure

  42. arXiv:2511.03378

    cs.SI cs.CL

    Beyond Citations: Measuring Idea-level Knowledge Diffusion from Research to Journalism and Policy-making

    Authors: Yangliu Fan, Kilian Buehling, Volker Stocker

    Abstract: Despite the importance of social science knowledge for various stakeholders, measuring its diffusion into different domains remains a challenge. This study uses a novel text-based approach to measure the idea-level diffusion of social science knowledge from the research domain to the journalism and policy-making domains. By doing so, we expand the detection of knowledge diffusion beyond the measur…

    Submitted 5 November, 2025; originally announced November 2025.

  43. arXiv:2511.02200

    cs.AI

    Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration

    Authors: Jingbo Wang, Sendong Zhao, Haochun Wang, Yuzheng Fan, Lizhe Zhang, Yan Liu, Ting Liu

    Abstract: The emergence of multi-agent systems powered by large language models (LLMs) has unlocked new frontiers in complex task-solving, enabling diverse agents to integrate unique expertise, collaborate flexibly, and address challenges unattainable for individual models. However, the full potential of such systems is hindered by rigid agent scheduling and inefficient coordination strategies that fail to…

    Submitted 3 November, 2025; originally announced November 2025.

  44. arXiv:2511.01409

    cs.CL

    LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge

    Authors: Heng Zhou, Ao Yu, Yuchen Fan, Jianing Shi, Li Kang, Hejia Geng, Yongting Zhang, Yutao Fan, Yuhao Wu, Tiancheng He, Yiran Qin, Lei Bai, Zhenfei Yin

    Abstract: Evaluating large language models (LLMs) on question answering often relies on static benchmarks that reward memorization and understate the role of retrieval, failing to capture the dynamic nature of world knowledge. We present LiveSearchBench, an automated pipeline for constructing retrieval-dependent benchmarks from recent knowledge updates. Our method computes deltas between successive Wikidata…

    Submitted 6 November, 2025; v1 submitted 3 November, 2025; originally announced November 2025.

  45. arXiv:2510.26692

    cs.CL cs.LG

    Kimi Linear: An Expressive, Efficient Attention Architecture

    Authors: Kimi Team, Yu Zhang, Zongyu Lin, Xingcheng Yao, Jiaxi Hu, Fanqing Meng, Chengyin Liu, Xin Men, Songlin Yang, Zhiyuan Li, Wentao Li, Enzhe Lu, Weizhou Liu, Yanru Chen, Weixin Xu, Longhui Yu, Yejie Wang, Yu Fan, Longguang Zhong, Enming Yuan, Dehao Zhang, Yizhi Zhang, T. Y. Liu, Haiming Wang, Shengjun Fang, et al. (35 additional authors not shown)

    Abstract: We introduce Kimi Linear, a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons across various scenarios -- including short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core lies Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet with a finer-grained gating mech…

    Submitted 1 November, 2025; v1 submitted 30 October, 2025; originally announced October 2025.

    Comments: Kimi Linear tech report

  46. arXiv:2510.26464

    cs.CV

    Towards Fine-Grained Vision-Language Alignment for Few-Shot Anomaly Detection

    Authors: Yuanting Fan, Jun Liu, Xiaochen Chen, Bin-Bin Gao, Jian Li, Yong Liu, Jinlong Peng, Chengjie Wang

    Abstract: Few-shot anomaly detection (FSAD) methods identify anomalous regions with few known normal samples. Most existing methods rely on the generalization ability of pre-trained vision-language models (VLMs) to recognize potentially anomalous regions through feature similarity between text descriptions and images. However, due to the lack of detailed textual descriptions, these methods can only pre-defi…

    Submitted 30 October, 2025; originally announced October 2025.

    Comments: 12 pages, 7 figures

  47. arXiv:2510.24668

    cs.CL cs.AI

    InteractComp: Evaluating Search Agents With Ambiguous Queries

    Authors: Mingyi Deng, Lijun Huang, Yani Fan, Jiayi Zhang, Fashen Ren, Jinyi Bai, Fuzhen Yang, Dayi Miao, Zhaoyang Yu, Yifan Wu, Yanfei Zhang, Fengwei Teng, Yingjia Wan, Song Hu, Yude Li, Xin Jin, Conghao Hu, Haoyu Li, Qirui Fu, Tai Zhong, Xinyu Wang, Xiangru Tang, Nan Tang, Chenglin Wu, Yuyu Luo

    Abstract: Language agents have demonstrated remarkable potential in web search and information retrieval. However, these search agents assume user queries are complete and unambiguous, an assumption that diverges from reality where users begin with incomplete queries requiring clarification through interaction. Yet most agents lack interactive mechanisms during the search process, and existing benchmarks ca…

    Submitted 28 October, 2025; originally announced October 2025.

  48. arXiv:2510.23668

    cs.LG cs.AI

    Traffic flow forecasting, STL decomposition, Hybrid model, LSTM, ARIMA, XGBoost, Intelligent transportation systems

    Authors: Fujiang Yuan, Yangrui Fan, Xiaohuan Bing, Zhen Tian, Chunhong Yuan, Yankang Li

    Abstract: Accurate traffic flow forecasting is essential for intelligent transportation systems and urban traffic management. However, single-model approaches often fail to capture the complex, nonlinear, and multi-scale temporal patterns in traffic flow data. This study proposes a decomposition-driven hybrid framework that integrates Seasonal-Trend decomposition using Loess (STL) with three complementary p…

    Submitted 26 October, 2025; originally announced October 2025.

  49. arXiv:2510.22710

    cs.AI

    RaCoT: Plug-and-Play Contrastive Example Generation Mechanism for Enhanced LLM Reasoning Reliability

    Authors: Kaitong Cai, Jusheng Zhang, Yijia Fan, Jing Yang, Keze Wang

    Abstract: Retrieval-Augmented Generation (RAG) faces a core bottleneck with knowledge-sparse and semantically ambiguous long-tail queries, where retrieval noise distorts reasoning and necessitates costly post-processing. To tackle this, we propose RaCoT (Retrieval-aware Contrastive-of-Thought), a novel framework that shifts contrastive thinking to the pre-retrieval stage. By automatically generating a seman…

    Submitted 26 October, 2025; originally announced October 2025.

  50. arXiv:2510.22477

    cs.MA cs.AI

    Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization

    Authors: Yijia Fan, Jusheng Zhang, Jing Yang, Keze Wang

    Abstract: To combat the prohibitive communication costs of "free-for-all" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO) algorithm to train agents on a communication-aware reward that explicitly penali…

    Submitted 25 October, 2025; originally announced October 2025.