
Showing 1–50 of 455 results for author: Xia, W

Searching in archive cs.
  1. arXiv:2512.18735  [pdf, ps, other]

    cs.CV cs.AI

    $M^3-Verse$: A "Spot the Difference" Challenge for Large Multimodal Models

    Authors: Kewei Wei, Bocheng Hu, Jie Cao, Xiaohan Chen, Zhengxi Lu, Wubing Xia, Weili Xu, Jiaao Wu, Junchen He, Mingyu Jia, Ciyun Zhao, Ye Sun, Yizhi Li, Zhonghan Zhao, Jian Zhang, Gaoang Wang

    Abstract: Modern Large Multimodal Models (LMMs) have demonstrated extraordinary ability in static image and single-state spatial-temporal understanding. However, their capacity to comprehend the dynamic changes of objects within a shared spatial context between two distinct video observations remains largely unexplored. This ability to reason about transformations within a consistent environment is particu… ▽ More

    Submitted 21 December, 2025; originally announced December 2025.

  2. arXiv:2512.15020  [pdf, ps, other]

    cs.RO

    ISS Policy: Scalable Diffusion Policy with Implicit Scene Supervision

    Authors: Wenlong Xia, Jinhao Zhang, Ce Zhang, Yaojia Wang, Youmin Gong, Jie Mei

    Abstract: Vision-based imitation learning has enabled impressive robotic manipulation skills, but its reliance on object appearance while ignoring the underlying 3D scene structure leads to low training efficiency and poor generalization. To address these challenges, we introduce \emph{Implicit Scene Supervision (ISS) Policy}, a 3D visuomotor DiT-based diffusion policy that predicts sequences of continuous… ▽ More

    Submitted 16 December, 2025; originally announced December 2025.

  3. arXiv:2512.14691  [pdf, ps, other]

    cs.CL cs.CV

    MMGR: Multi-Modal Generative Reasoning

    Authors: Zefan Cai, Haoyi Qiu, Tianyi Ma, Haozhe Zhao, Gengze Zhou, Kung-Hsiang Huang, Parisa Kordjamshidi, Minjia Zhang, Wen Xiao, Jiuxiang Gu, Nanyun Peng, Junjie Hu

    Abstract: Video foundation models generate visually realistic and temporally coherent content, but their reliability as world simulators depends on whether they capture physical, logical, and spatial constraints. Existing metrics such as Frechet Video Distance (FVD) emphasize perceptual quality and overlook reasoning failures, including violations of causality, physics, and global consistency. We introduce… ▽ More

    Submitted 17 December, 2025; v1 submitted 16 December, 2025; originally announced December 2025.

    Comments: work in progress
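
    For reference, the Fréchet Video Distance (FVD) mentioned in this abstract is the Fréchet distance between Gaussians fitted to features of real and generated videos, which is why it tracks perceptual statistics rather than reasoning failures:

        \mathrm{FVD} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

    where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature mean and covariance of the real and generated distributions, respectively.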

  4. arXiv:2512.11999  [pdf, ps, other]

    eess.SY cs.RO

    Taylor-Lagrange Control for Safety-Critical Systems

    Authors: Wei Xiao, Anni Li

    Abstract: This paper proposes a novel Taylor-Lagrange Control (TLC) method for nonlinear control systems to ensure the safety and stability through Taylor's theorem with Lagrange remainder. To achieve this, we expand a safety or stability function with respect to time along the system dynamics using the Lie derivative and Taylor's theorem. This expansion enables the control input to appear in the Taylor ser… ▽ More

    Submitted 12 December, 2025; originally announced December 2025.

    Comments: 13 pages
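
    As a rough illustration of the expansion this abstract describes (under assumptions the snippet does not state: control-affine dynamics $\dot{x} = f(x) + g(x)u$, a safety function $h$ of relative degree $r$, and $u$ held constant over the step), Taylor's theorem with the Lagrange remainder gives

        h(x(t+\Delta t)) = h(x(t)) + \sum_{k=1}^{r-1} \frac{\Delta t^k}{k!} L_f^k h(x(t)) + \frac{\Delta t^r}{r!} \Big( L_f^r h(x(\xi)) + L_g L_f^{r-1} h(x(\xi))\, u \Big), \quad \xi \in (t, t+\Delta t),

    so the control input $u$ enters through the remainder term, which is what allows a condition on $h$ at the next step to be imposed directly on $u$.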

  5. arXiv:2512.06973  [pdf, ps, other]

    eess.SY cs.LG

    Learning Robust and Correct Controllers Guided by Feasibility-Aware Signal Temporal Logic via BarrierNet

    Authors: Shuo Liu, Wenliang Liu, Wei Xiao, Calin A. Belta

    Abstract: Control Barrier Functions (CBFs) have emerged as a powerful tool for enforcing safety in optimization-based controllers, and their integration with Signal Temporal Logic (STL) has enabled the specification-driven synthesis of complex robotic behaviors. However, existing CBF-STL approaches typically rely on fixed hyperparameters and myopic, per-time step optimization, which can lead to overly conse… ▽ More

    Submitted 16 December, 2025; v1 submitted 7 December, 2025; originally announced December 2025.

    Comments: 16 pages, 11 figures
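
    The CBF-constrained, per-time-step optimization this abstract builds on can be illustrated with a generic single-constraint safety filter. The sketch below is the myopic baseline it refers to, not the paper's feasibility-aware STL/BarrierNet method; the example dynamics, alpha, and names are illustrative assumptions.

        # Generic single-constraint CBF safety filter (closed-form QP projection).
        # Not the paper's method; alpha and the example dynamics are assumptions.
        # Assumes Lgh != 0 so the constraint actually depends on u.
        import numpy as np

        def cbf_filter(u_nom, h, Lfh, Lgh, alpha=1.0):
            """Minimally modify u_nom so that Lfh + Lgh @ u + alpha * h >= 0."""
            a = np.asarray(Lgh, dtype=float)        # constraint row vector
            b = float(Lfh + alpha * h)              # constraint offset
            slack = a @ u_nom + b                   # constraint value at u_nom
            if slack >= 0.0:
                return u_nom                        # nominal input is already safe
            return u_nom - (slack / (a @ a)) * a    # project onto the safe half-space

        # Example: single integrator x' = u with safe set {x <= 1}, h(x) = 1 - x.
        x = 0.9
        u_safe = cbf_filter(u_nom=np.array([2.0]), h=1.0 - x,
                            Lfh=0.0, Lgh=np.array([-1.0]), alpha=5.0)
        # u_safe is approximately [0.5]: the filter slows the approach to the boundary.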

  6. arXiv:2512.04515  [pdf, ps, other]

    cs.CV

    EgoLCD: Egocentric Video Generation with Long Context Diffusion

    Authors: Liuzhou Zhang, Jiarui Ye, Yuanlei Wang, Ming Zhong, Mingju Cao, Wanke Xia, Bowen Zeng, Zeyu Zhang, Hao Tang

    Abstract: Generating long, coherent egocentric videos is difficult, as hand-object interactions and procedural tasks require reliable long-term memory. Existing autoregressive models suffer from content drift, where object identity and scene semantics degrade over time. To address this challenge, we introduce EgoLCD, an end-to-end framework for egocentric long-context video generation that treats long video… ▽ More

    Submitted 4 December, 2025; originally announced December 2025.

  7. arXiv:2512.02918  [pdf, ps, other]

    cs.CR cs.PL cs.SE

    Belobog: Move Language Fuzzing Framework For Real-World Smart Contracts

    Authors: Wanxu Xia, Ziqiao Kong, Zhengwei Li, Yi Lu, Pan Li, Liqun Yang, Yang Liu, Xiapu Luo, Shaohua Li

    Abstract: Move is a research-oriented programming language designed for secure and verifiable smart contract development and has been widely used in managing billions of digital assets in blockchains, such as Sui and Aptos. Move features a strong static type system and explicit resource semantics to enforce safety properties such as the prevention of data races, invalid asset transfers, and entry vulnerabilit… ▽ More

    Submitted 2 December, 2025; originally announced December 2025.

    Comments: Slight revision and under review

  8. arXiv:2512.02720  [pdf, ps, other]

    cs.AI cs.LG

    StockMem: An Event-Reflection Memory Framework for Stock Forecasting

    Authors: He Wang, Wenyilin Xiao, Songqiao Han, Hailiang Huang

    Abstract: Stock price prediction is challenging due to market volatility and its sensitivity to real-time events. While large language models (LLMs) offer new avenues for text-based forecasting, their application in finance is hindered by noisy news data and the lack of explicit answers in text. General-purpose memory architectures struggle to identify the key drivers of price movements. To address this, we… ▽ More

    Submitted 2 December, 2025; originally announced December 2025.

  9. arXiv:2512.02556  [pdf, ps, other]

    cs.CL

    DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

    Authors: DeepSeek-AI, Aixin Liu, Aoxue Mei, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenhao Xu, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Erhang Li, Fangqi Zhou, Fangyun Lin, Fucong Dai, Guangbo Hao , et al. (239 additional authors not shown)

    Abstract: We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios. (2)… ▽ More

    Submitted 2 December, 2025; originally announced December 2025.

  10. arXiv:2512.01061  [pdf, ps, other]

    cs.RO cs.CV

    Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer

    Authors: Haoru Xue, Tairan He, Zi Wang, Qingwei Ben, Wenli Xiao, Zhengyi Luo, Xingye Da, Fernando Castañeda, Guanya Shi, Shankar Sastry, Linxi "Jim" Fan, Yuke Zhu

    Abstract: Recent progress in GPU-accelerated, photorealistic simulation has opened a scalable data-generation path for robot learning, where massive physics and visual randomization allow policies to generalize beyond curated environments. Building on these advances, we develop a teacher-student-bootstrap learning framework for vision-based humanoid loco-manipulation, using articulated-object interaction as… ▽ More

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: https://doorman-humanoid.github.io/

  11. arXiv:2512.00591  [pdf, ps, other]

    cs.CR

    TrojanLoC: LLM-based Framework for RTL Trojan Localization

    Authors: Weihua Xiao, Zeng Wang, Minghao Shao, Raghu Vamshi Hemadri, Ozgur Sinanoglu, Muhammad Shafique, Johann Knechtel, Siddharth Garg, Ramesh Karri

    Abstract: Hardware Trojans (HTs) are a persistent threat to integrated circuits, especially when inserted at the register-transfer level (RTL). Existing methods typically first convert the design into a graph, such as a gate-level netlist or an RTL-derived dataflow graph (DFG), and then use a graph neural network (GNN) to obtain an embedding of that graph, which (i) loses compact RTL semantics, (ii) relie… ▽ More

    Submitted 29 November, 2025; originally announced December 2025.

  12. arXiv:2512.00470  [pdf, ps, other]

    cs.RO

    LAP: Fast LAtent Diffusion Planner with Fine-Grained Feature Distillation for Autonomous Driving

    Authors: Jinhao Zhang, Wenlong Xia, Zhexuan Zhou, Youmin Gong, Jie Mei

    Abstract: Diffusion models have demonstrated strong capabilities for modeling human-like driving behaviors in autonomous driving, but their iterative sampling process induces substantial latency, and operating directly on raw trajectory points forces the model to spend capacity on low-level kinematics, rather than high-level multi-modal semantics. To address these limitations, we propose LAtent Planner (LAP… ▽ More

    Submitted 2 December, 2025; v1 submitted 29 November, 2025; originally announced December 2025.

  13. arXiv:2511.22749  [pdf, ps, other]

    cs.LG cs.AI

    VeriDispatcher: Multi-Model Dispatching through Pre-Inference Difficulty Prediction for RTL Generation Optimization

    Authors: Zeng Wang, Weihua Xiao, Minghao Shao, Raghu Vamshi Hemadri, Ozgur Sinanoglu, Muhammad Shafique, Ramesh Karri

    Abstract: Large Language Models (LLMs) show strong performance in RTL generation, but different models excel on different tasks because of architecture and training differences. Prior work mainly prompts or finetunes a single model. What remains not well studied is how to coordinate multiple different LLMs so they jointly improve RTL quality while also reducing cost, instead of running all models and choosi… ▽ More

    Submitted 27 November, 2025; originally announced November 2025.

  14. arXiv:2511.21016  [pdf, ps, other]

    cs.LG cs.CL

    Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression

    Authors: Liangzu Peng, Aditya Chattopadhyay, Luca Zancato, Elvis Nunez, Wei Xia, Stefano Soatto

    Abstract: As efficient alternatives to softmax Attention, linear State-Space Models (SSMs) achieve constant memory and linear compute, but maintain only a lossy, fading summary of the past, often leading to inferior performance in recall-oriented tasks. We propose Gated KalmaNet (GKA), a layer that accounts for the full past while maintaining SSM-style efficiency. We ground our approach in the Kalman Filter… ▽ More

    Submitted 18 December, 2025; v1 submitted 25 November, 2025; originally announced November 2025.

    Comments: 30 pages, 10 figures
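
    The "test-time ridge regression" in this title can be illustrated with a generic associative-recall sketch: fit a linear read-out from past keys to past values by ridge regression, then query it. This is only an illustration of the idea (the snippet does not give the GKA layer's equations); the shapes and lam below are assumptions.

        # Test-time ridge regression as associative recall (generic illustration,
        # not the GKA layer itself).
        import numpy as np

        def ridge_recall(K, V, q, lam=1e-2):
            """Solve W = argmin_W ||K W - V||^2 + lam ||W||^2, then read out q @ W."""
            d = K.shape[1]
            W = np.linalg.solve(K.T @ K + lam * np.eye(d), K.T @ V)
            return q @ W

        rng = np.random.default_rng(0)
        K = rng.normal(size=(128, 16))        # past keys
        W_true = rng.normal(size=(16, 32))
        V = K @ W_true                        # values that depend linearly on the keys
        out = ridge_recall(K, V, K[7])        # close to V[7], the stored association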

  15. arXiv:2511.18600  [pdf, ps, other]

    cs.CV

    NeAR: Coupled Neural Asset-Renderer Stack

    Authors: Hong Li, Chongjie Ye, Houyuan Chen, Weiqing Xiao, Ziyang Yan, Lixing Xiao, Zhaoxi Chen, Jianfeng Xiang, Shaocong Xu, Xuhui Liu, Yikai Wang, Baochang Zhang, Xiaoguang Han, Jiaolong Yang, Hao Zhao

    Abstract: Neural asset authoring and neural rendering have traditionally evolved as disjoint paradigms: one generates digital assets for fixed graphics pipelines, while the other maps conventional assets to images. However, treating them as independent entities limits the potential for end-to-end optimization in fidelity and consistency. In this paper, we bridge this gap with NeAR, a Coupled Neural Asset--R… ▽ More

    Submitted 18 December, 2025; v1 submitted 23 November, 2025; originally announced November 2025.

    Comments: 20 pages, 19 figures. The project page: https://near-project.github.io/

  16. arXiv:2511.16324  [pdf, ps, other]

    cs.CL cs.AI

    SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning

    Authors: Wei Xia, Zhi-Hong Deng

    Abstract: With the rapid advancement of large language models (LLMs), their deployment in real-world applications has become increasingly widespread. LLMs are expected to deliver robust performance across diverse tasks, user preferences, and practical scenarios. However, as demands grow, ensuring that LLMs produce responses aligned with human intent remains a foundational challenge. In particular, aligning… ▽ More

    Submitted 20 November, 2025; originally announced November 2025.

  17. arXiv:2511.16193  [pdf, ps, other]

    cs.DC cs.AI

    Fast LLM Post-training via Decoupled and Fastest-of-N Speculation

    Authors: Rongxin Cheng, Kai Zhou, Xingda Wei, Siyuan Liu, Mingcong Han, Mingjing Ai, Yeju Zhou, Baoquan Zhong, Wencong Xiao, Rong Chen, Haibo Chen

    Abstract: Rollout dominates the training time in large language model (LLM) post-training, where the trained model is used to generate tokens given a batch of prompts. This work, SpecActor, achieves fast rollout with speculative decoding that deploys a fast draft path to accelerate the unparallelizable generation, while the correctness is guaranteed by fast parallel verification of the outputs with the orig… ▽ More

    Submitted 23 December, 2025; v1 submitted 20 November, 2025; originally announced November 2025.
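
    The rollout acceleration this abstract describes builds on speculative decoding. Below is a simplified sketch of the standard greedy variant (a small draft model proposes k tokens, the target model verifies them in one parallel pass, and the longest agreeing prefix is kept); it is not SpecActor's decoupled or fastest-of-N scheme, and draft_next / target_logits_batch are hypothetical model interfaces.

        # Simplified greedy speculative decoding (standard scheme, not SpecActor's).
        # `draft_next` and `target_logits_batch` are hypothetical model interfaces.
        def speculative_step(prompt, draft_next, target_logits_batch, k=4):
            # 1) Draft k tokens autoregressively with the cheap model.
            drafted, seq = [], list(prompt)
            for _ in range(k):
                tok = draft_next(seq)              # greedy draft token
                drafted.append(tok)
                seq.append(tok)
            # 2) One parallel target pass scores every drafted position.
            logits = target_logits_batch(list(prompt), drafted)   # shape (k, vocab)
            target_choice = logits.argmax(axis=-1)
            # 3) Keep the longest prefix on which draft and target agree.
            n_accept = 0
            for i, tok in enumerate(drafted):
                if target_choice[i] != tok:
                    break
                n_accept += 1
            accepted = drafted[:n_accept]
            # 4) If a draft token was rejected, substitute the target's own token.
            correction = int(target_choice[n_accept]) if n_accept < k else None
            return accepted, correction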

  18. arXiv:2511.15200  [pdf, ps, other]

    cs.RO

    VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation

    Authors: Tairan He, Zi Wang, Haoru Xue, Qingwei Ben, Zhengyi Luo, Wenli Xiao, Ye Yuan, Xingye Da, Fernando Castañeda, Shankar Sastry, Changliu Liu, Guanya Shi, Linxi Fan, Yuke Zhu

    Abstract: A key barrier to the real-world deployment of humanoid robots is the lack of autonomous loco-manipulation skills. We introduce VIRAL, a visual sim-to-real framework that learns humanoid loco-manipulation entirely in simulation and deploys it zero-shot to real hardware. VIRAL follows a teacher-student design: a privileged RL teacher, operating on full state, learns long-horizon loco-manipulation us… ▽ More

    Submitted 27 November, 2025; v1 submitted 19 November, 2025; originally announced November 2025.

    Comments: Project website: https://viral-humanoid.github.io/

  19. arXiv:2511.11519  [pdf, ps, other]

    cs.AI cs.LG

    Experience-Guided Adaptation of Inference-Time Reasoning Strategies

    Authors: Adam Stein, Matthew Trager, Benjamin Bowman, Michael Kleinman, Aditya Chattopadhyay, Wei Xia, Stefano Soatto

    Abstract: Enabling agentic AI systems to adapt their problem-solving approaches based on post-training interactions remains a fundamental challenge. While systems that update and maintain a memory at inference time have been proposed, existing designs only steer the system by modifying textual input to a language model or agent, which means that they cannot change sampling parameters, remove tools, modify s… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

    Comments: 29 pages, 5 figures

  20. arXiv:2511.07820  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.GR eess.SY

    SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control

    Authors: Zhengyi Luo, Ye Yuan, Tingwu Wang, Chenran Li, Sirui Chen, Fernando Castañeda, Zi-Ang Cao, Jiefeng Li, David Minor, Qingwei Ben, Xingye Da, Runyu Ding, Cyrus Hogg, Lina Song, Edy Lim, Eugene Jeong, Tairan He, Haoru Xue, Wenli Xiao, Zi Wang, Simon Yuen, Jan Kautz, Yan Chang, Umar Iqbal, Linxi "Jim" Fan , et al. (1 additional authors not shown)

    Abstract: Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs over several days. We show that scaling up model capacity, data, and compute yields a generalist humanoid cont… ▽ More

    Submitted 4 December, 2025; v1 submitted 10 November, 2025; originally announced November 2025.

    Comments: Project page: https://nvlabs.github.io/SONIC/

  21. arXiv:2511.07412  [pdf, ps, other]

    cs.CV cs.RO

    TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research

    Authors: Han Zhang, Yiqing Shen, Roger D. Soberanis-Mukul, Ankita Ghosh, Hao Ding, Lalithkumar Seenivasan, Jose L. Porras, Zhekai Mao, Chenjia Li, Wenjie Xiao, Lonny Yarmus, Angela Christine Argento, Masaru Ishii, Mathias Unberath

    Abstract: Developing embodied AI for intelligent surgical systems requires safe, controllable environments for continual learning and evaluation. However, safety regulations and operational constraints in operating rooms (ORs) limit embodied agents from freely perceiving and interacting in realistic settings. Digital twins provide high-fidelity, risk-free environments for exploration and training. How we ma… ▽ More

    Submitted 10 November, 2025; originally announced November 2025.

  22. arXiv:2511.06818  [pdf, ps, other]

    cs.CL cs.LG

    Learning to Focus: Focal Attention for Selective and Scalable Transformers

    Authors: Dhananjay Ram, Wei Xia, Stefano Soatto

    Abstract: Attention is a core component of the transformer architecture, whether in encoder-only, decoder-only, or encoder-decoder models. However, the standard softmax attention often produces noisy probability distributions, which can impair effective feature selection at every layer of these models, particularly for long contexts. We propose Focal Attention, a simple yet effective modification that sharpens the a… ▽ More

    Submitted 10 November, 2025; originally announced November 2025.
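
    The snippet says Focal Attention sharpens the attention distribution but does not spell out the mechanism, so the sketch below uses a plain temperature knob on scaled dot-product attention as an illustrative stand-in rather than the paper's method.

        # Sharpened softmax attention (illustrative stand-in, not Focal Attention's
        # actual rule). temperature < 1 concentrates each row of the attention
        # matrix on its largest scores, suppressing the noisy tail.
        import numpy as np

        def sharpened_attention(Q, K, V, temperature=0.5):
            d = Q.shape[-1]
            scores = (Q @ K.T) / (np.sqrt(d) * temperature)
            scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
            probs = np.exp(scores)
            probs /= probs.sum(axis=-1, keepdims=True)
            return probs @ V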

  23. DMSORT: An efficient parallel maritime multi-object tracking architecture for unmanned vessel platforms

    Authors: Shengyu Tang, Zeyuan Lu, Jiazhi Dong, Changdong Yu, Xiaoyu Wang, Yaohui Lyu, Weihao Xia

    Abstract: Accurate perception of the marine environment through robust multi-object tracking (MOT) is essential for ensuring safe vessel navigation and effective maritime surveillance. However, the complicated maritime environment often causes camera motion and subsequent visual degradation, posing significant challenges to MOT. To address this challenge, we propose an efficient Dual-branch Maritime SORT (D… ▽ More

    Submitted 15 November, 2025; v1 submitted 6 November, 2025; originally announced November 2025.

    Comments: This version clarifies several citation formatting inconsistencies caused by a technical issue in the reference management software used during manuscript preparation. All scientific data, experiments, and conclusions remain fully valid and unaffected. The clarification is provided to maintain transparency and consistency in the scholarly record

  24. arXiv:2511.02212  [pdf]

    physics.med-ph cs.CV eess.IV

    High-Resolution Magnetic Particle Imaging System Matrix Recovery Using a Vision Transformer with Residual Feature Network

    Authors: Abuobaida M. Khair, Wenjing Jiang, Yousuf Babiker M. Osman, Wenjun Xia, Xiaopeng Ma

    Abstract: This study presents a hybrid deep learning framework, the Vision Transformer with Residual Feature Network (VRF-Net), for recovering high-resolution system matrices in Magnetic Particle Imaging (MPI). MPI resolution often suffers from downsampling and coil sensitivity variations. VRF-Net addresses these challenges by combining transformer-based global attention with residual convolutional refineme… ▽ More

    Submitted 3 November, 2025; originally announced November 2025.

    Journal ref: Biomedical Signal Processing and Control 113 (2026) 108990

  25. arXiv:2511.02130  [pdf, ps, other]

    cs.AI cs.LG

    Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning

    Authors: Renos Zabounidis, Aditya Golatkar, Michael Kleinman, Alessandro Achille, Wei Xia, Stefano Soatto

    Abstract: We propose Re-FORC, an adaptive reward prediction method that, given a context, enables prediction of the expected future rewards as a function of the number of future thinking tokens. Re-FORC trains a lightweight adapter on reasoning models, demonstrating improved prediction with longer reasoning and larger models. Re-FORC enables: 1) early stopping of unpromising reasoning chains, reducing compu… ▽ More

    Submitted 3 November, 2025; originally announced November 2025.

    Comments: Accepted at Efficient Reasoning Workshop at NeurIPS 2025
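
    A minimal sketch of how a reward predictor like the one described here could drive early stopping: keep thinking only while the predicted gain from additional thinking tokens exceeds their cost. predict_reward is a hypothetical stand-in for the paper's adapter, and the linear token-cost rule is an assumption, not the paper's criterion.

        # Early stopping of a reasoning chain driven by predicted future reward.
        # `predict_reward(context, extra_tokens)` is a hypothetical stand-in for the
        # paper's lightweight adapter; the linear cost model is an assumption.
        def should_keep_thinking(context, predict_reward, step=128,
                                 budget_left=1024, cost_per_token=1e-4):
            r_stop = predict_reward(context, 0)       # expected reward if we answer now
            r_more = predict_reward(context, step)    # expected reward after `step` more tokens
            return budget_left >= step and (r_more - r_stop) > cost_per_token * step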

  26. arXiv:2511.00091  [pdf, ps, other]

    cs.CV cs.RO

    Self-Improving Vision-Language-Action Models with Data Generation via Residual RL

    Authors: Wenli Xiao, Haotian Lin, Andy Peng, Haoru Xue, Tairan He, Yuqi Xie, Fengyuan Hu, Jimmy Wu, Zhengyi Luo, Linxi "Jim" Fan, Guanya Shi, Yuke Zhu

    Abstract: Supervised fine-tuning (SFT) has become the de facto post-training strategy for large vision-language-action (VLA) models, but its reliance on costly human demonstrations limits scalability and generalization. We propose Probe, Learn, Distill (PLD), a three-stage plug-and-play framework that improves VLAs through residual reinforcement learning (RL) and distribution-aware data collection. In Stage… ▽ More

    Submitted 30 October, 2025; originally announced November 2025.

    Comments: 26 pages

  27. arXiv:2510.27042  [pdf, ps, other]

    cs.AI cs.LG

    e1: Learning Adaptive Control of Reasoning Effort

    Authors: Michael Kleinman, Matthew Trager, Alessandro Achille, Wei Xia, Stefano Soatto

    Abstract: Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of reasoning effort depending on how they value output quality versus latency and cost. To leverage this tradeoff effectively, users need fine-grained control over the amount of thinking used for a particular quer… ▽ More

    Submitted 11 November, 2025; v1 submitted 30 October, 2025; originally announced October 2025.

  28. arXiv:2510.23650  [pdf, ps, other]

    cs.LG cs.AI

    Beyond Hidden-Layer Manipulation: Semantically-Aware Logit Interventions for Debiasing LLMs

    Authors: Wei Xia

    Abstract: We propose Static and Dynamic -- two zero-shot logits-layer debiasing methods. Dynamic reduces bias by up to 70% with minimal fluency loss. Logits intervention outperforms hidden-layer approaches. We show that semantic-aware logits intervention is stable and effective for debiasing aligned LLMs.

    Submitted 25 October, 2025; originally announced October 2025.
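
    A minimal sketch of a zero-shot logits-layer intervention in the spirit of this abstract: subtract a fixed penalty from the logits of tokens in a bias-associated set before sampling. This is a generic illustration, not the paper's Static or Dynamic rule; biased_token_ids and penalty are assumptions.

        # Generic zero-shot logits-layer debiasing (illustration only).
        import numpy as np

        def debias_logits(logits, biased_token_ids, penalty=2.0):
            adjusted = logits.copy()
            adjusted[biased_token_ids] -= penalty     # down-weight flagged tokens
            return adjusted

        vocab_logits = np.random.randn(50_000)
        safe_logits = debias_logits(vocab_logits, biased_token_ids=[101, 2045, 7020])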

  29. arXiv:2510.21857  [pdf, ps, other]

    cs.CV cs.AI

    Poisson Flow Consistency Training

    Authors: Anthony Zhang, Mahmut Gokmen, Dennis Hein, Rongjun Ge, Wenjun Xia, Ge Wang, Jin Chen

    Abstract: The Poisson Flow Consistency Model (PFCM) is a consistency-style model based on the robust Poisson Flow Generative Model++ (PFGM++) which has achieved success in unconditional image generation and CT image denoising. Yet the PFCM can only be trained in distillation which limits the potential of the PFCM in many data modalities. The objective of this research was to create a method to train the PFC… ▽ More

    Submitted 22 October, 2025; originally announced October 2025.

    Comments: 5 pages, 3 figures, 1 table

    MSC Class: 68T07 (Primary); 68T45 (Secondary)

  30. arXiv:2510.21830  [pdf, ps, other]

    cs.LG cs.AI

    GAPO: Robust Advantage Estimation for Real-World Code LLMs

    Authors: Jianqing Zhang, Zhezheng Hao, Wei Xia, Hande Dong, Hong Wang, Chenxing Wei, Yuyan Zhou, Yubin Qi, Qiang Lin, Jian Cao

    Abstract: Reinforcement learning (RL) is widely used for post-training large language models (LLMs) in code editing, where group-relative methods like GRPO are popular for their critic-free, normalized advantage estimation. However, in real-world code-editing scenarios, reward distributions are often skewed with unpredictable outliers, leading to distorted advantage computation and increased noise. To addre… ▽ More

    Submitted 2 December, 2025; v1 submitted 21 October, 2025; originally announced October 2025.
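
    For context, the group-relative advantage estimation this abstract refers to normalizes each rollout's reward within its group; the sketch below shows the standard mean/std form alongside a median/IQR variant that illustrates why heavy-tailed outliers distort the former. The robust variant is illustrative only and is not necessarily GAPO's estimator.

        # Group-relative advantage estimation: standard mean/std normalization and a
        # robust median/IQR variant (illustrative, not necessarily GAPO's estimator).
        import numpy as np

        def grpo_advantage(rewards, eps=1e-8):
            r = np.asarray(rewards, dtype=float)
            return (r - r.mean()) / (r.std() + eps)

        def robust_advantage(rewards, eps=1e-8):
            r = np.asarray(rewards, dtype=float)
            iqr = np.percentile(r, 75) - np.percentile(r, 25)
            return (r - np.median(r)) / (iqr + eps)

        rewards = [0.1, 0.2, 0.15, 0.18, 25.0]     # one outlier rollout in the group
        print(grpo_advantage(rewards))             # outlier compresses the others toward a single value
        print(robust_advantage(rewards))           # ordinary rollouts keep a usable spread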

  31. arXiv:2510.19470  [pdf, ps, other]

    cs.DC cs.AI cs.LG

    HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission

    Authors: Weihao Yang, Hao Huang, Donglei Wu, Ningke Li, Yanqi Pan, Qiyang Zheng, Wen Xia, Shiyi Li, Qiang Wang

    Abstract: Mixture-of-Experts (MoE) has become a popular architecture for scaling large models. However, the rapidly growing scale outpaces model training on a single DC, driving a shift toward a more flexible, cross-DC training paradigm. Under this, Expert Parallelism (EP) of MoE faces significant scalability issues due to the limited cross-DC bandwidth. Specifically, existing EP optimizations attempt to ov… ▽ More

    Submitted 22 October, 2025; originally announced October 2025.

  32. arXiv:2510.17247  [pdf, ps, other]

    cs.CL cs.CV

    From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models

    Authors: Zefan Cai, Haoyi Qiu, Haozhe Zhao, Ke Wan, Jiachen Li, Jiuxiang Gu, Wen Xiao, Nanyun Peng, Junjie Hu

    Abstract: Recent advances in video diffusion models have significantly enhanced text-to-video generation, particularly through alignment tuning using reward models trained on human preferences. While these methods improve visual quality, they can unintentionally encode and amplify social biases. To systematically trace how such biases evolve throughout the alignment pipeline, we introduce VideoBiasEval, a c… ▽ More

    Submitted 20 October, 2025; originally announced October 2025.

  33. arXiv:2510.12842  [pdf, ps, other]

    q-bio.QM cs.LG

    Protenix-Mini+: efficient structure prediction model with scalable pairformer

    Authors: Bo Qiang, Chengyue Gong, Xinshi Chen, Yuxuan Zhang, Wenzhi Xiao

    Abstract: Lightweight inference is critical for biomolecular structure prediction and downstream tasks, enabling efficient real-world deployment and inference-time scaling for large-scale applications. While AF3 and its variants (e.g., Protenix, Chai-1) have advanced structure prediction results, they suffer from critical limitations: high inference latency and cubic time complexity with respect to token co… ▽ More

    Submitted 15 October, 2025; v1 submitted 13 October, 2025; originally announced October 2025.

  34. arXiv:2510.12157  [pdf, ps, other]

    cs.LG

    Self-Verifying Reflection Helps Transformers with CoT Reasoning

    Authors: Zhongwei Yu, Wannian Xia, Xue Yan, Bo Xu, Haifeng Zhang, Yali Du, Jun Wang

    Abstract: Advanced large language models (LLMs) frequently reflect in reasoning chain-of-thoughts (CoTs), where they self-verify the correctness of current solutions and explore alternatives. However, given recent findings that LLMs detect limited errors in CoTs, how reflection contributes to empirical improvements remains unclear. To analyze this issue, in this paper, we present a minimalistic reasoning fr… ▽ More

    Submitted 14 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS2025

  35. arXiv:2510.08263  [pdf, ps, other]

    cs.AI

    Co-TAP: Three-Layer Agent Interaction Protocol Technical Report

    Authors: Shunyu An, Miao Wang, Yongchao Li, Dong Wan, Lina Wang, Ling Qin, Liqin Gao, Congyao Fan, Zhiyong Mao, Jiange Pu, Wenji Xia, Dong Zhao, Zhaohui Hao, Rui Hu, Ji Lu, Guiyue Zhou, Baoyu Tang, Yanqin Gao, Yongsheng Du, Daigang Xu, Lingjun Huang, Baoli Wang, Xiwen Zhang, Luyao Wang, Shilong Liu

    Abstract: This paper proposes Co-TAP (T: Triple, A: Agent, P: Protocol), a three-layer agent interaction protocol designed to address the challenges faced by multi-agent systems across the three core dimensions of Interoperability, Interaction and Collaboration, and Knowledge Sharing. We have designed and proposed a layered solution composed of three core protocols: the Human-Agent Interaction Protocol (HAI… ▽ More

    Submitted 28 October, 2025; v1 submitted 9 October, 2025; originally announced October 2025.

  36. arXiv:2510.05069  [pdf, ps, other]

    cs.CL cs.AI

    SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

    Authors: Dachuan Shi, Abedelkadir Asi, Keying Li, Xiangchi Yuan, Leyan Pan, Wenke Lee, Wen Xiao

    Abstract: Recent work shows that, beyond discrete reasoning through explicit chain-of-thought steps, which are limited by the boundaries of natural languages, large language models (LLMs) can also reason continuously in latent space, allowing richer information per step and thereby improving token efficiency. Despite this promise, latent reasoning still faces two challenges, especially in training-free sett… ▽ More

    Submitted 6 December, 2025; v1 submitted 6 October, 2025; originally announced October 2025.

    Comments: Code: https://github.com/sdc17/SwiReasoning, Website: https://swireasoning.github.io/

  37. arXiv:2510.03950  [pdf, ps, other]

    cs.LG

    What Is The Performance Ceiling of My Classifier? Utilizing Category-Wise Influence Functions for Pareto Frontier Analysis

    Authors: Shahriar Kabir Nahin, Wenxiao Xiao, Joshua Liu, Anshuman Chhabra, Hongfu Liu

    Abstract: Data-centric learning seeks to improve model performance from the perspective of data quality, and has been drawing increasing attention in the machine learning community. Among its key tools, influence functions provide a powerful framework to quantify the impact of individual training samples on model predictions, enabling practitioners to identify detrimental samples and retrain models on a cle… ▽ More

    Submitted 4 October, 2025; originally announced October 2025.
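
    For context, the sample-wise influence function that category-wise scores build on approximates the effect of upweighting a training point $z$ on the loss at a test point; the category-wise form used in the paper is not given in the snippet:

        \mathcal{I}(z, z_{\text{test}}) = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta}), \qquad H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat{\theta}).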

  38. arXiv:2510.02393  [pdf, ps, other]

    cs.SE

    AP2O-Coder: Human-Inspired Progressive Optimization to Fix LLM Code Errors

    Authors: Jianqing Zhang, Wei Xia, Hande Dong, Qiang Lin, Jian Cao

    Abstract: LLMs' code generation capabilities have yielded substantial improvements in the effectiveness of programming tasks. However, LLM-generated code still suffers from compilation and runtime errors. Existing offline preference optimization methods primarily focus on enhancing LLMs' coding abilities using pass/fail signals in the preference data, overlooking the deep-level error types in the failed cod… ▽ More

    Submitted 23 November, 2025; v1 submitted 30 September, 2025; originally announced October 2025.

    Comments: Accepted by AAAI2026

  39. arXiv:2510.01357  [pdf, ps, other]

    cs.RO

    Safe Motion Planning and Control Using Predictive and Adaptive Barrier Methods for Autonomous Surface Vessels

    Authors: Alejandro Gonzalez-Garcia, Wei Xiao, Wei Wang, Alejandro Astudillo, Wilm Decré, Jan Swevers, Carlo Ratti, Daniela Rus

    Abstract: Safe motion planning is essential for autonomous vessel operations, especially in challenging spaces such as narrow inland waterways. However, conventional motion planning approaches are often computationally intensive or overly conservative. This paper proposes a safe motion planning strategy combining Model Predictive Control (MPC) and Control Barrier Functions (CBFs). We introduce a time-varyin… ▽ More

    Submitted 1 October, 2025; originally announced October 2025.

    Comments: IROS 2025

  40. arXiv:2509.24804  [pdf, ps, other]

    cs.LG

    DyMoDreamer: World Modeling with Dynamic Modulation

    Authors: Boxuan Zhang, Runqing Wang, Wei Xiao, Weipu Zhang, Jian Sun, Gao Huang, Jie Chen, Gang Wang

    Abstract: A critical bottleneck in deep reinforcement learning (DRL) is sample inefficiency, as training high-performance agents often demands extensive environmental interactions. Model-based reinforcement learning (MBRL) mitigates this by building world models that simulate environmental dynamics and generate synthetic experience, improving sample efficiency. However, conventional world models process obs… ▽ More

    Submitted 29 September, 2025; originally announced September 2025.

  41. arXiv:2509.20368  [pdf, ps, other]

    cs.AI

    LATTS: Locally Adaptive Test-Time Scaling

    Authors: Theo Uscidda, Matthew Trager, Michael Kleinman, Aditya Chattopadhyay, Wei Xia, Stefano Soatto

    Abstract: One common strategy for improving the performance of Large Language Models (LLMs) on downstream tasks involves using a \emph{verifier model} to either select the best answer from a pool of candidates or to steer the auto-regressive generation process towards better outputs. This class of methods typically results in improved accuracy at the cost of increased computation at test-time, a paradigm kn… ▽ More

    Submitted 16 September, 2025; originally announced September 2025.
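
    The "select the best answer from a pool of candidates" strategy this abstract starts from is plain verifier-guided best-of-N sampling; a minimal sketch follows, where generate and verify are hypothetical model interfaces and LATTS's locally adaptive allocation of compute is not shown.

        # Baseline verifier-guided best-of-N selection (the fixed-budget recipe the
        # abstract starts from; LATTS itself adapts the budget per query).
        def best_of_n(prompt, generate, verify, n=8):
            candidates = [generate(prompt) for _ in range(n)]    # sample N answers
            scores = [verify(prompt, c) for c in candidates]     # verifier scores
            best = max(range(n), key=lambda i: scores[i])
            return candidates[best]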

  42. arXiv:2509.17040  [pdf, ps, other]

    cs.CV cs.AI

    From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning

    Authors: Hang Du, Jiayang Zhang, Guoshun Nan, Wendi Deng, Zhenyan Chen, Chenyang Zhang, Wang Xiao, Shan Huang, Yuqi Pan, Tao Qi, Sicong Leng

    Abstract: Multi-image Interleaved Reasoning aims to improve Multi-modal Large Language Models (MLLMs) ability to jointly comprehend and reason across multiple images and their associated textual contexts, introducing unique challenges beyond single-image or non-interleaved multi-image tasks. While current multi-image benchmarks overlook interleaved textual contexts and neglect distinct relationships between… ▽ More

    Submitted 15 October, 2025; v1 submitted 21 September, 2025; originally announced September 2025.

    Comments: Accepted by ICCV 2025

  43. arXiv:2509.16293  [pdf, ps, other]

    cs.LG cs.AI cs.DC

    Robust LLM Training Infrastructure at ByteDance

    Authors: Borui Wan, Gaohong Liu, Zuquan Song, Jun Wang, Yun Zhang, Guangming Sheng, Shuguang Wang, Houmin Wei, Chenyuan Wang, Weiqiang Lou, Xi Yang, Mofan Zhang, Kaihua Jiang, Cheng Ren, Xiaoyun Zhi, Menghan Yu, Zhe Nan, Zhuolin Zheng, Baoquan Zhong, Qinlong Wang, Huan Yu, Jinxin Chi, Wang Zhang, Yuhan Li, Zixian Du , et al. (10 additional authors not shown)

    Abstract: The training scale of large language models (LLMs) has reached tens of thousands of GPUs and is still continuously expanding, enabling faster learning of larger models. Accompanying the expansion of the resource scale is the prevalence of failures (CUDA error, NaN values, job hang, etc.), which poses significant challenges to training stability. Any large-scale LLM training infrastructure should s… ▽ More

    Submitted 20 October, 2025; v1 submitted 19 September, 2025; originally announced September 2025.

  44. arXiv:2509.15940  [pdf, ps, other]

    cs.DC

    Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs

    Authors: Guoliang He, Youhe Jiang, Wencong Xiao, Kaihua Jiang, Shuguang Wang, Jun Wang, Zixian Du, Zhuo Jiang, Xinlei Zhang, Binhang Yuan, Eiko Yoneki

    Abstract: The scaling law for large language models (LLMs) depicts that the path towards machine intelligence necessitates training at large scale. Thus, companies continuously build large-scale GPU clusters, and launch training jobs that span over thousands of computing nodes. However, LLM pre-training presents unique challenges due to its complex communication patterns, where GPUs exchange data in sparse… ▽ More

    Submitted 19 September, 2025; originally announced September 2025.

    Comments: NeurIPS 2025

  45. arXiv:2509.15473  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    Breathing and Semantic Pause Detection and Exertion-Level Classification in Post-Exercise Speech

    Authors: Yuyu Wang, Wuyue Xia, Huaxiu Yao, Jingping Nie

    Abstract: Post-exercise speech contains rich physiological and linguistic cues, often marked by semantic pauses, breathing pauses, and combined breathing-semantic pauses. Detecting these events enables assessment of recovery rate, lung function, and exertion-related abnormalities. However, existing works on identifying and distinguishing different types of pauses in this context are limited. In this work, b… ▽ More

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: 6 pages, 3rd ACM International Workshop on Intelligent Acoustic Systems and Applications (IASA 25)

  46. arXiv:2509.12776  [pdf, ps, other]

    cs.RO

    Integrating Trajectory Optimization and Reinforcement Learning for Quadrupedal Jumping with Terrain-Adaptive Landing

    Authors: Renjie Wang, Shangke Lyu, Xin Lang, Wei Xiao, Donglin Wang

    Abstract: Jumping constitutes an essential component of quadruped robots' locomotion capabilities, which includes dynamic take-off and adaptive landing. Existing quadrupedal jumping studies mainly focused on the stance and flight phase by assuming a flat landing ground, which is impractical in many real world cases. This work proposes a safe landing framework that achieves adaptive landing on rough terrains… ▽ More

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: Accepted by IROS 2025

  47. arXiv:2509.12562  [pdf, ps, other]

    cs.RO

    Robust Online Residual Refinement via Koopman-Guided Dynamics Modeling

    Authors: Zhefei Gong, Shangke Lyu, Pengxiang Ding, Wei Xiao, Donglin Wang

    Abstract: Imitation learning (IL) enables efficient skill acquisition from demonstrations but often struggles with long-horizon tasks and high-precision control due to compounding errors. Residual policy learning offers a promising, model-agnostic solution by refining a base policy through closed-loop corrections. However, existing approaches primarily focus on local corrections to the base policy, lacking… ▽ More

    Submitted 15 September, 2025; originally announced September 2025.

  48. arXiv:2509.11839  [pdf, ps, other]

    cs.RO cs.CV

    TrajBooster: Boosting Humanoid Whole-Body Manipulation via Trajectory-Centric Learning

    Authors: Jiacheng Liu, Pengxiang Ding, Qihang Zhou, Yuxuan Wu, Da Huang, Zimian Peng, Wei Xiao, Weinan Zhang, Lixin Yang, Cewu Lu, Donglin Wang

    Abstract: Recent Vision-Language-Action models show potential to generalize across embodiments but struggle to quickly align with a new robot's action space when high-quality demonstrations are scarce, especially for bipedal humanoids. We present TrajBooster, a cross-embodiment framework that leverages abundant wheeled-humanoid data to boost bipedal VLA. Our key idea is to use end-effector trajectories as a… ▽ More

    Submitted 16 September, 2025; v1 submitted 15 September, 2025; originally announced September 2025.

  49. arXiv:2509.03018  [pdf]

    cs.DC cs.LG

    Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training

    Authors: Yangtao Deng, Lei Zhang, Qinlong Wang, Xiaoyun Zhi, Xinlei Zhang, Zhuo Jiang, Haohan Xu, Lei Wang, Zuquan Song, Gaohong Liu, Yang Bai, Shuguang Wang, Wencong Xiao, Jianxi Ye, Minlan Yu, Hong Xu

    Abstract: Reliability is essential for ensuring efficiency in LLM training. However, many real-world reliability issues remain difficult to resolve, resulting in wasted resources and degraded model performance. Unfortunately, today's collective communication libraries operate as black boxes, hiding critical information needed for effective root cause analysis. We propose Mycroft, a lightweight distributed t… ▽ More

    Submitted 3 September, 2025; originally announced September 2025.

  50. A-MHA*: Anytime Multi-Heuristic A*

    Authors: Ramkumar Natarajan, Muhammad Suhail Saleem, William Xiao, Sandip Aine, Howie Choset, Maxim Likhachev

    Abstract: Designing good heuristic functions for graph search requires adequate domain knowledge. It is often easy to design heuristics that perform well and correlate with the underlying true cost-to-go values in certain parts of the search space but these may not be admissible throughout the domain thereby affecting the optimality guarantees of the search. Bounded suboptimal search using several such part… ▽ More

    Submitted 29 August, 2025; originally announced August 2025.
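
    For context, the MHA* framework this work extends typically keeps one anchor queue ordered by $key_0(s) = g(s) + w_1 h_0(s)$ with an admissible, consistent $h_0$, and expands a state from an inadmissible queue $i$ only while

        \min_{s \in \mathrm{OPEN}_i} \big( g(s) + w_1 h_i(s) \big) \;\le\; w_2 \cdot \min_{s' \in \mathrm{OPEN}_0} \big( g(s') + w_1 h_0(s') \big),

    which bounds the returned solution cost by $w_1 w_2$ times the optimal cost. The anytime behavior added by A-MHA* goes beyond this sketch.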