
Showing 1–50 of 121 results for author: Ni, C

Searching in archive cs.
  1. arXiv:2604.09330  [pdf, ps, other]

    cs.RO cs.CV

    VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis

    Authors: Xiaolei Lang, Yang Wang, Yukun Zhou, Chaojun Ni, Kerui Li, Jiagang Zhu, Tianze Liu, Jiajun Lv, Xingxing Zuo, Yun Ye, Guan Huang, Xiaofeng Wang, Zheng Zhu

    Abstract: Recent advances in robot foundation models trained on large-scale human teleoperation data have enabled robots to perform increasingly complex real-world tasks. However, scaling these systems remains difficult because collecting task-specific demonstrations is expensive and labor-intensive. Synthetic data, especially generated videos, offer a promising direction, but existing World Models (WMs) ar…

    Submitted 10 April, 2026; originally announced April 2026.

  2. arXiv:2604.08168  [pdf, ps, other]

    cs.RO cs.AI

    ViVa: A Video-Generative Value Model for Robot Reinforcement Learning

    Authors: Jindi Lv, Hao Li, Jie Li, Yifei Nie, Fankun Kong, Yang Wang, Xiaofeng Wang, Zheng Zhu, Chaojun Ni, Qiuping Deng, Hengtao Li, Jiancheng Lv, Guan Huang

    Abstract: Vision-language-action (VLA) models have advanced robot manipulation through large-scale pretraining, but real-world deployment remains challenging due to partial observability and delayed feedback. Reinforcement learning addresses this via value functions, which assess task progress and guide policy improvement. However, existing value models built on vision-language models (VLMs) struggle to cap…

    Submitted 9 April, 2026; originally announced April 2026.

  3. arXiv:2604.07882  [pdf, ps, other]

    cs.CV

    ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video

    Authors: Boyuan Wang, Xiaofeng Wang, Yongkang Li, Zheng Zhu, Yifan Chang, Angen Ye, Guosheng Zhao, Chaojun Ni, Guan Huang, Yijie Ren, Yueqi Duan, Xingang Wang

    Abstract: Reconstructing non-rigid objects with physical plausibility remains a significant challenge. Existing approaches leverage differentiable rendering for per-scene optimization, recovering geometry and dynamics but requiring expensive tuning or manual annotation, which limits practicality and generalizability. To address this, we propose ReconPhys, the first feedforward framework that jointly learns…

    Submitted 9 April, 2026; originally announced April 2026.

  4. arXiv:2604.00014  [pdf, ps, other]

    cs.CL cs.HC

    Disentangling Prompt Element Level Risk Factors for Hallucinations and Omissions in Mental Health LLM Responses

    Authors: Congning Ni, Sarvech Qadir, Bryan Steitz, Mihir Sachin Vaidya, Qingyuan Song, Lantian Xia, Shelagh Mulvaney, Siru Liu, Hyeyoung Ryu, Leah Hecht, Amy Bucher, Christopher Symons, Laurie Novak, Susannah L. Rose, Murat Kantarcioglu, Bradley Malin, Zhijun Yin

    Abstract: Mental health concerns are often expressed outside clinical settings, including in high-distress help seeking, where safety-critical guidance may be needed. Consumer health informatics systems increasingly incorporate large language models (LLMs) for mental health question answering, yet many evaluations underrepresent narrative, high-distress inquiries. We introduce UTCO (User, Topic, Context, To…

    Submitted 10 March, 2026; originally announced April 2026.

    Comments: Submitted to AMIA 2026 Annual Symposium (under review)

  5. arXiv:2603.29045  [pdf, ps, other]

    cs.CV

    Let the Abyss Stare Back: Adaptive Falsification for Autonomous Scientific Discovery

    Authors: Peiran Li, Fangzhou Lin, Shuo Xing, Jiashuo Sun, Dylan Zhang, Siyuan Yang, Chaoqun Ni, Zhengzhong Tu

    Abstract: Autonomous scientific discovery is entering a more dangerous regime: once the evaluator is frozen, a sufficiently strong search process can learn to win the exam without learning the mechanism the task was meant to reveal. This is the idea behind our title. To let the abyss stare back is to make evaluation actively push against the candidate through adaptive falsification, rather than passively ce…

    Submitted 30 March, 2026; originally announced March 2026.

    Comments: 15 pages, 1 figure, 4 tables

  6. arXiv:2603.17240  [pdf, ps, other]

    cs.CV

    GigaWorld-Policy: An Efficient Action-Centered World-Action Model

    Authors: Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Hao Li, Hengtao Li, Jie Li, Jindi Lv, Jingyu Liu, Min Cao, Peng Li, Qiuping Deng, Wenjun Mei, Xiaofeng Wang, Xinze Chen, Xinyu Zhou, Yang Wang, Yifan Chang, Yifan Li, Yukun Zhou, Yun Ye, Zhichao Liu, Zheng Zhu

    Abstract: World-Action Models (WAM) initialized from pre-trained video generation backbones have demonstrated remarkable potential for robot policy learning. However, existing approaches face two critical bottlenecks that hinder performance and deployment. First, jointly reasoning over future visual dynamics and corresponding actions incurs substantial inference overhead. Second, joint modeling often entang…

    Submitted 21 March, 2026; v1 submitted 17 March, 2026; originally announced March 2026.

    Comments: Added references

  7. arXiv:2603.10494  [pdf, ps, other]

    cs.CL cs.LG

    VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization

    Authors: Weixin Liu, Congning Ni, Qingyuan Song, Susannah L. Rose, Christopher Symons, Murat Kantarcioglu, Bradley A. Malin, Zhijun Yin

    Abstract: Brief Hospital Course (BHC) narratives must be clinically useful yet faithful to fragmented EHR evidence. LLM-based clinical summarizers still introduce unsupported statements, and alignment can encourage omissions ("say-less" degeneration). We introduce VERI-DPO, which uses claim verification to mine preferences and distill them into the summarizer with Direct Preference Optimization (DPO). On MI…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: Paper submitted to AMIA 2026 Annual Symposium

  8. arXiv:2603.08999  [pdf, ps, other]

    cs.CL

    Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning

    Authors: Juming Xiong, Kevin Guo, Congning Ni, Chao Yan, Katherine Brown, Avinash Baidya, Xiang Gao, Bradley Malin, Zhijun Yin

    Abstract: Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating multiple reasoning trajectories, leading to substantial additional computational overhead. This paper i…

    Submitted 17 March, 2026; v1 submitted 9 March, 2026; originally announced March 2026.

  9. arXiv:2603.05517  [pdf, ps, other]

    cs.LG cs.AI cs.CR cs.SE

    Traversal-as-Policy: Log-Distilled Gated Behavior Trees as Externalized, Verifiable Policies for Safe, Robust, and Efficient Agents

    Authors: Peiran Li, Jiashuo Sun, Fangzhou Lin, Shuo Xing, Tianfu Fu, Suofei Feng, Chaoqun Ni, Zhengzhong Tu

    Abstract: Autonomous LLM agents fail because long-horizon policy remains implicit in model weights and transcripts, while safety is retrofitted post hoc. We propose Traversal-as-Policy: distill sandboxed OpenHands execution logs into a single executable Gated Behavior Tree (GBT) and treat tree traversal, rather than unconstrained generation, as the control policy whenever a task is in coverage. Each nod…

    Submitted 30 January, 2026; originally announced March 2026.

    Comments: 30 pages, 1 figure, 23 tables

  10. arXiv:2603.01449  [pdf, ps, other]

    eess.IV cs.CV

    Revisiting Global Token Mixing in Task-Dependent MRI Restoration: Insights from Minimal Gated CNN Baselines

    Authors: Xiangjian Hou, Chao Qin, Chang Ni, Xin Wang, Chun Yuan, Xiaodong Ma

    Abstract: Global token mixing, implemented via self-attention or state-space sequence models, has become a popular model design choice for MRI restoration. However, MRI restoration tasks differ substantially in how their degradations vary over image and k-space domains, and in the degree to which global coupling is already imposed by physics-driven data consistency terms. In this work, we ask the question w…

    Submitted 1 March, 2026; originally announced March 2026.

  11. arXiv:2602.20167  [pdf, ps, other]

    cs.CY

    Playsemble: Learning Low-Level Programming Through Interactive Games

    Authors: Elliott Wen, Paul Denny, Andrew Luxton-Reilly, Sean Ma, Bruce Sham, Chenye Ni, Jun Seo, Yu Yang

    Abstract: Teaching assembly programming is a fundamental component of undergraduate computer science education, yet many students struggle with its abstract and low-level concepts. Existing learning tools, such as simulators and visualisers, support understanding by exposing machine states. However, they often limit students to passive observation and provide few opportunities for meaningful interaction. To…

    Submitted 27 February, 2026; v1 submitted 9 February, 2026; originally announced February 2026.

  12. arXiv:2602.12099  [pdf, ps, other]

    cs.CV

    GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

    Authors: GigaBrain Team, Boyuan Wang, Bohan Li, Chaojun Ni, Guan Huang, Guosheng Zhao, Hao Li, Jie Li, Jindi Lv, Jingyu Liu, Lv Feng, Mingming Yu, Peng Li, Qiuping Deng, Tianze Liu, Xinyu Zhou, Xinze Chen, Xiaofeng Wang, Yang Wang, Yifan Li, Yifei Nie, Yilong Li, Yukun Zhou, Yun Ye, Zhichao Liu , et al. (1 additional author not shown)

    Abstract: Vision-language-action (VLA) models that directly predict multi-step action chunks from current observations face inherent limitations due to constrained scene understanding and weak future anticipation capabilities. In contrast, video world models pre-trained on web-scale video corpora exhibit robust spatiotemporal reasoning and accurate future prediction, making them a natural foundation for enh…

    Submitted 26 February, 2026; v1 submitted 12 February, 2026; originally announced February 2026.

    Comments: https://gigabrain05m.github.io/

  13. arXiv:2601.23009  [pdf, ps, other]

    cs.SE

    SolAgent: A Specialized Multi-Agent Framework for Solidity Code Generation

    Authors: Wei Chen, Zhiyuan Peng, Xin Yin, Chao Ni, Chenhao Ying, Bang Xie, Yuan Luo

    Abstract: Smart contracts are the backbone of the decentralized web, yet ensuring their functional correctness and security remains a critical challenge. While Large Language Models (LLMs) have shown promise in code generation, they often struggle with the rigorous requirements of smart contracts, frequently producing code that is buggy or vulnerable. To address this, we propose SolAgent, a novel tool-augme…

    Submitted 30 January, 2026; originally announced January 2026.

  14. arXiv:2601.16993  [pdf, ps, other]

    cs.DL cs.AI

    BibAgent: An Agentic Framework for Traceable Miscitation Detection in Scientific Literature

    Authors: Peiran Li, Fangzhou Lin, Shuo Xing, Xiang Zheng, Xi Hong, Siyuan Yang, Jiashuo Sun, Zhengzhong Tu, Chaoqun Ni

    Abstract: Citations are the bedrock of scientific authority, yet their integrity is compromised by widespread miscitations, ranging from nuanced distortions to fabricated references. Systematic citation verification is currently unfeasible; manual review cannot scale to modern publishing volumes, while existing automated tools are restricted by abstract-only analysis or small-scale, domain-specific datasets…

    Submitted 30 January, 2026; v1 submitted 12 January, 2026; originally announced January 2026.

  15. arXiv:2512.04111  [pdf, ps, other]

    cs.SE cs.AI cs.HC

    HAI-Eval: Measuring Human-AI Synergy in Collaborative Coding

    Authors: Hanjun Luo, Chiming Ni, Jiaheng Wen, Zhimu Huang, Yiran Wang, Bingduo Liao, Sylvia Chung, Yingbin Jin, Xinfeng Li, Wenyuan Xu, XiaoFeng Wang, Hanan Salam

    Abstract: LLM-powered coding agents are reshaping the development paradigm. However, existing evaluation systems, whether traditional tests for humans or benchmarks for LLMs, fail to capture this shift. They remain focused on well-defined algorithmic problems, which excludes problems where success depends on human-AI collaboration. Such collaborative problems not only require human reasoning to interpret c…

    Submitted 30 November, 2025; originally announced December 2025.

  16. arXiv:2512.02284  [pdf, ps, other]

    quant-ph cs.ET

    Quantum-Classical Separation in Bounded-Resource Tasks Arising from Measurement Contextuality

    Authors: Shashwat Kumar, Eliott Rosenberg, Alejandro Grajales Dau, Rodrigo Cortinas, Dmitri Maslov, Richard Oliver, Adam Zalcman, Matthew Neeley, Alice Pagano, Aaron Szasz, Ilya Drozdov, Zlatko Minev, Craig Gidney, Noureldin Yosri, Stijn J. de Graaf, Aniket Maiti, Dmitry Abanin, Rajeev Acharya, Laleh Aghababaie Beni, Georg Aigeldinger, Ross Alcaraz, Sayra Alcaraz, Trond I. Andersen, Markus Ansmann, Frank Arute , et al. (258 additional authors not shown)

    Abstract: The prevailing view is that quantum phenomena can be harnessed to tackle certain problems beyond the reach of classical approaches. Quantifying this capability as a quantum-classical separation and demonstrating it on current quantum processors has remained elusive. Using a superconducting qubit processor, we show that quantum contextuality enables certain tasks to be performed with success probab…

    Submitted 1 December, 2025; originally announced December 2025.

  17. arXiv:2512.00903  [pdf, ps, other]

    cs.CV cs.RO

    SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead

    Authors: Chaojun Ni, Cheng Chen, Xiaofeng Wang, Zheng Zhu, Wenzhao Zheng, Boyuan Wang, Tianrun Chen, Guosheng Zhao, Haoyun Li, Zhehao Dong, Qiang Zhang, Yun Ye, Yang Wang, Guan Huang, Wenjun Mei

    Abstract: Vision-Language-Action (VLA) models built on pretrained Vision-Language Models (VLMs) show strong potential but are limited in practicality due to their large parameter counts. To mitigate this issue, using a lightweight VLM has been explored, but it compromises spatiotemporal reasoning. Although some methods suggest that incorporating additional 3D inputs can help, they usually rely on large VLMs…

    Submitted 30 November, 2025; originally announced December 2025.

  18. arXiv:2511.19861  [pdf, ps, other]

    cs.CV cs.RO

    GigaWorld-0: World Models as Data Engine to Empower Embodied AI

    Authors: GigaWorld Team, Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Haoyun Li, Jiagang Zhu, Kerui Li, Mengyuan Xu, Qiuping Deng, Siting Wang, Wenkang Qin, Xinze Chen, Xiaofeng Wang, Yankai Wang, Yu Cao, Yifan Chang, Yuan Xu, Yun Ye, Yang Wang, Yukun Zhou, Zhengyuan Zhang, Zhehao Dong, Zheng Zhu

    Abstract: World models are emerging as a foundational paradigm for scalable, data-efficient embodied AI. In this work, we present GigaWorld-0, a unified world model framework designed explicitly as a data engine for Vision-Language-Action (VLA) learning. GigaWorld-0 integrates two synergistic components: GigaWorld-0-Video, which leverages large-scale video generation to produce diverse, texture-rich, and te…

    Submitted 30 November, 2025; v1 submitted 24 November, 2025; originally announced November 2025.

    Comments: Project Page: https://giga-world-0.github.io/

  19. arXiv:2511.15872  [pdf]

    cs.DL cs.CY physics.soc-ph

    AI-Assisted Writing Is Growing Fastest Among Non-English-Speaking and Less Established Scientists

    Authors: Jialin Liu, Yongyuan He, Zhihan Zheng, Yi Bu, Chaoqun Ni

    Abstract: The dominance of English in global science has long created significant barriers for non-native speakers. The recent emergence of generative artificial intelligence (GenAI) dramatically reduces drafting and revision costs, but, simultaneously, raises a critical question: how is the technology being adopted by the global scientific community, and is it mitigating existing inequities? This study pro…

    Submitted 19 November, 2025; originally announced November 2025.

  20. arXiv:2511.11332  [pdf, ps, other]

    cs.DC cs.MA

    UFO3: Weaving the Digital Agent Galaxy

    Authors: Chaoyun Zhang, Liqun Li, He Huang, Chiming Ni, Bo Qiao, Si Qin, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: Large language model (LLM)-powered agents are transforming digital devices from passive tools into proactive intelligent collaborators. However, most existing frameworks remain confined to a single OS or device, making cross-device workflows brittle and largely manual. We present UFO$^3$, a system that unifies heterogeneous endpoints (desktops, servers, mobile devices, and edge) into a single orch…

    Submitted 1 March, 2026; v1 submitted 14 November, 2025; originally announced November 2025.

    Comments: We developed UFO$^3$ as a fully engineered system with over 73K lines of code, encompassing agent implementations and integrations for Windows, Linux, and Android mobile devices. The entire project is open-sourced at https://github.com/microsoft/UFO/, accompanied by detailed documentation and tutorials at https://microsoft.github.io/UFO/

  21. arXiv:2511.04307  [pdf, ps, other]

    cs.AI

    GUI-360$^\circ$: A Comprehensive Dataset and Benchmark for Computer-Using Agents

    Authors: Jian Mu, Chaoyun Zhang, Chiming Ni, Lu Wang, Bo Qiao, Kartik Mathur, Qianhui Wu, Yuhang Xie, Xiaojun Ma, Mengyu Zhou, Si Qin, Liqun Li, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: We introduce GUI-360$^\circ$, a large-scale, comprehensive dataset and benchmark suite designed to advance computer-using agents (CUAs). CUAs present unique challenges and are constrained by three persistent gaps: a scarcity of real-world CUA tasks, the lack of automated collection-and-annotation pipelines for multi-modal trajectories, and the absence of a unified benchmark that jointly evaluates G…

    Submitted 10 November, 2025; v1 submitted 6 November, 2025; originally announced November 2025.

  22. Using language models to label clusters of scientific documents

    Authors: Dakota Murray, Chaoqun Ni, Weiye Gu, Trevor Hubbard

    Abstract: Automated label generation for clusters of scientific documents is a common task in bibliometric workflows. Traditionally, labels were formed by concatenating distinguishing characteristics of a cluster's documents; while straightforward, this approach often produces labels that are terse and difficult to interpret. The advent and widespread accessibility of generative language models, such as Cha…

    Submitted 4 November, 2025; originally announced November 2025.

    Comments: 36 pages, 2 figures

  23. arXiv:2510.19430  [pdf, ps, other]

    cs.RO cs.CV

    GigaBrain-0: A World Model-Powered Vision-Language-Action Model

    Authors: GigaBrain Team, Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Haoyun Li, Jie Li, Jiagang Zhu, Lv Feng, Peng Li, Qiuping Deng, Runqi Ouyang, Wenkang Qin, Xinze Chen, Xiaofeng Wang, Yang Wang, Yifan Li, Yilong Li, Yiran Ding, Yuan Xu, Yun Ye, Yukun Zhou, Zhehao Dong, Zhenan Wang , et al. (2 additional authors not shown)

    Abstract: Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by worl…

    Submitted 4 December, 2025; v1 submitted 22 October, 2025; originally announced October 2025.

    Comments: https://gigabrain0.github.io/

  24. arXiv:2510.15264  [pdf, ps, other]

    cs.CV

    DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion

    Authors: Weijie Wang, Jiagang Zhu, Zeyu Zhang, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Chaojun Ni, Haoxiao Wang, Guan Huang, Xinze Chen, Yukun Zhou, Wenkang Qin, Duochao Shi, Haoyun Li, Yicheng Xiao, Donny Y. Chen, Jiwen Lu

    Abstract: We present DriveGen3D, a novel framework for generating high-quality and highly controllable dynamic 3D driving scenes that addresses critical limitations in existing methodologies. Current approaches to driving scene synthesis either suffer from prohibitive computational demands for extended temporal generation, focus exclusively on prolonged video synthesis without 3D representation, or restrict…

    Submitted 29 December, 2025; v1 submitted 16 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS Workshop on Next Practices in Video Generation and Evaluation (Short Paper Track), Project Page: https://lhmd.top/drivegen3d

  25. arXiv:2510.13293  [pdf, ps, other]

    cs.CL

    Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models

    Authors: Yizhou Peng, Yukun Ma, Chong Zhang, Yi-Wen Chao, Chongjia Ni, Bin Ma

    Abstract: While Text-to-Speech (TTS) systems can achieve fine-grained control over emotional expression via natural language prompts, a significant challenge emerges when the desired emotion (style prompt) conflicts with the semantic content of the text. This mismatch often results in unnatural-sounding speech, undermining the goal of achieving fine-grained emotional control. Classifier-Free Guidance (CFG)…

    Submitted 8 April, 2026; v1 submitted 15 October, 2025; originally announced October 2025.

    Comments: Re-submitted to Interspeech 2026 (with updates) -- Updates to be released upon approval

  26. arXiv:2509.25149  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Pretraining Large Language Models with NVFP4

    Authors: NVIDIA, Felix Abecassis, Anjulie Agrusa, Dong Ahn, Jonah Alben, Stefania Alborghetti, Michael Andersch, Sivakumar Arayandi, Alexis Bjorlin, Aaron Blakeman, Evan Briones, Ian Buck, Bryan Catanzaro, Muya Chang, Jinhang Choi, Mike Chrzanowski, Eric Chung, Victor Cui, Steve Dai, Bita Darvish Rouhani, Carlo del Mundo, Deena Donia, Burc Eryilmaz, Henry Estela, Abhinav Goel , et al. (65 additional authors not shown)

    Abstract: Large Language Models (LLMs) today are powerful problem solvers across many domains, and they continue to get stronger as they scale in model size, training set size, and training set quality, as shown by extensive research and experimentation across the industry. Training a frontier model today requires on the order of tens to hundreds of yottaflops, which is a massive investment of time, compute…

    Submitted 4 March, 2026; v1 submitted 29 September, 2025; originally announced September 2025.

    Comments: Update includes: (1) fixing a typo in eq. 2 (2) updating author list, and (3) adding a related work

  27. arXiv:2509.23812  [pdf, ps, other]

    cs.SE cs.AI

    Navigating the Labyrinth: Path-Sensitive Unit Test Generation with Large Language Models

    Authors: Dianshu Liao, Xin Yin, Shidong Pan, Chao Ni, Zhenchang Xing, Xiaoyu Sun

    Abstract: Unit testing is essential for software quality assurance, yet writing and maintaining tests remains time-consuming and error-prone. To address this challenge, researchers have proposed various techniques for automating unit test generation, including traditional heuristic-based methods and more recent approaches that leverage large language models (LLMs). However, these existing approaches are inh…

    Submitted 11 October, 2025; v1 submitted 28 September, 2025; originally announced September 2025.

  28. arXiv:2509.22407  [pdf, ps, other]

    cs.AI cs.RO

    EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer

    Authors: Zhehao Dong, Xiaofeng Wang, Zheng Zhu, Yirui Wang, Yang Wang, Yukun Zhou, Boyuan Wang, Chaojun Ni, Runqi Ouyang, Wenkang Qin, Xinze Chen, Yun Ye, Guan Huang, Zhen Lu, Yue Yang

    Abstract: The generalization of vision-language-action (VLA) models heavily relies on diverse training data. However, acquiring large-scale data for robot manipulation across varied object appearances is costly and labor-intensive. To address this limitation, we introduce Embodied Manipulation Media Adaptation (EMMA), a framework for augmenting VLA policies that combines a generative data engine with an eff…

    Submitted 16 March, 2026; v1 submitted 26 September, 2025; originally announced September 2025.

  29. arXiv:2509.22199  [pdf, ps, other]

    cs.RO cs.AI

    MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training

    Authors: Haoyun Li, Ivan Zhang, Runqi Ouyang, Xiaofeng Wang, Zheng Zhu, Zhiqin Yang, Zhentao Zhang, Boyuan Wang, Chaojun Ni, Wenkang Qin, Xinze Chen, Yun Ye, Guan Huang, Zhenbo Song, Xingang Wang

    Abstract: Vision Language Action (VLA) models derive their generalization capability from diverse training data, yet collecting embodied robot interaction data remains prohibitively expensive. In contrast, human demonstration videos are far more scalable and cost-efficient to collect, and recent studies confirm their effectiveness in training VLA models. However, a significant domain gap persists between hu…

    Submitted 29 September, 2025; v1 submitted 26 September, 2025; originally announced September 2025.

  30. arXiv:2509.12508  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Fun-ASR Technical Report

    Authors: Keyu An, Yanni Chen, Zhigao Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Bo Gong, Xiangang Li, Yabin Li, Ying Liu, Xiang Lv, Yunjie Ji, Yiheng Jiang, Bin Ma, Haoneng Luo, Chongjia Ni, Zexu Pan, Yiping Peng, Zhendong Peng, Peiyao Wang, Hao Wang, Haoxu Wang, Wen Wang, Wupeng Wang , et al. (13 additional authors not shown)

    Abstract: In recent years, automatic speech recognition (ASR) has witnessed transformative advancements driven by three complementary paradigms: data scaling, model size scaling, and deep integration with large language models (LLMs). However, LLMs are prone to hallucination, which can significantly degrade user experience in real-world ASR applications. In this paper, we present Fun-ASR, a large-scale, LLM…

    Submitted 19 December, 2025; v1 submitted 15 September, 2025; originally announced September 2025.

    Comments: Authors are listed in alphabetical order. Work in progress

  31. arXiv:2508.17720  [pdf, ps, other]

    cs.SE

    RepoTransAgent: Multi-Agent LLM Framework for Repository-Aware Code Translation

    Authors: Ziqi Guan, Xin Yin, Zhiyuan Peng, Chao Ni

    Abstract: Repository-aware code translation is critical for modernizing legacy systems, enhancing maintainability, and enabling interoperability across diverse programming languages. While recent advances in large language models (LLMs) have improved code translation quality, existing approaches face significant challenges in practical scenarios: insufficient contextual understanding, inflexible prompt desi…

    Submitted 25 August, 2025; originally announced August 2025.

  32. arXiv:2508.08170  [pdf, ps, other]

    cs.CV

    ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction

    Authors: Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Xinze Chen, Guanghong Jia, Guan Huang, Wenjun Mei

    Abstract: Reinforcement learning for training end-to-end autonomous driving models in closed-loop simulations is gaining growing attention. However, most simulation environments differ significantly from real-world conditions, creating a substantial simulation-to-reality (sim2real) gap. To bridge this gap, some approaches utilize scene reconstruction techniques to create photorealistic environments as a sim…

    Submitted 21 August, 2025; v1 submitted 11 August, 2025; originally announced August 2025.

  33. arXiv:2507.20888  [pdf, ps, other]

    cs.SE cs.CL

    Enhancing Project-Specific Code Completion by Inferring Internal API Information

    Authors: Le Deng, Xiaoxue Ren, Chao Ni, Ming Liang, David Lo, Zhongxin Liu

    Abstract: Project-specific code completion is a critical task that leverages context from a project to generate accurate code. State-of-the-art methods use retrieval-augmented generation (RAG) with large language models (LLMs) and project information for code completion. However, they often struggle to incorporate internal API information, which is crucial for accuracy, especially when APIs are not explicit…

    Submitted 28 July, 2025; originally announced July 2025.

  34. arXiv:2507.20109  [pdf, ps, other]

    cs.SE cs.AI

    Learning to Align Human Code Preferences

    Authors: Xin Yin, Chao Ni, Xiaohu Yang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in automating software development tasks. While recent advances leverage Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to align models with human preferences, the optimal training strategy remains unclear across diverse code preference scenarios. This paper systematically investigates the roles of SFT and D…

    Submitted 8 December, 2025; v1 submitted 26 July, 2025; originally announced July 2025.

  35. arXiv:2507.19040  [pdf, ps, other]

    eess.AS cs.CL

    FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems

    Authors: Yizhou Peng, Yi-Wen Chao, Dianwen Ng, Yukun Ma, Chongjia Ni, Bin Ma, Eng Siong Chng

    Abstract: Full-duplex spoken dialogue systems (FDSDS) enable more natural human-machine interactions by allowing real-time user interruptions and backchanneling, compared to traditional SDS that rely on turn-taking. However, existing benchmarks lack metrics for FD scenes, e.g., evaluating model performance during user interruptions. In this paper, we present a comprehensive FD benchmarking pipeline utilizin…

    Submitted 25 July, 2025; originally announced July 2025.

    Comments: Accepted to Interspeech 2025. 5 pages

  36. arXiv:2507.13123  [pdf, ps, other]

    cs.SE

    Detecting LLM-generated Code with Subtle Modification by Adversarial Training

    Authors: Xin Yin, Xinrui Li, Chao Ni, Xiaodan Xu, Xiaohu Yang

    Abstract: With the rapid development of Large Language Models (LLMs), their powerful code-generation capabilities have been widely applied in tasks like code completion and automated development, demonstrating the value of improving coding efficiency. However, the extensive use of LLM-generated code also raises several new challenges. On the one hand, issues such as the regulation of code provenance, copyri…

    Submitted 17 July, 2025; originally announced July 2025.

  37. arXiv:2507.12366  [pdf, ps, other]

    cs.SC cs.AI cs.CV

    FactorHD: A Hyperdimensional Computing Model for Multi-Object Multi-Class Representation and Factorization

    Authors: Yifei Zhou, Xuchu Huang, Chenyu Ni, Min Zhou, Zheyu Yan, Xunzhao Yin, Cheng Zhuo

    Abstract: Neuro-symbolic artificial intelligence (neuro-symbolic AI) excels in logical analysis and reasoning. Hyperdimensional Computing (HDC), a promising brain-inspired computational model, is integral to neuro-symbolic AI. Various HDC models have been proposed to represent class-instance and class-class relations, but when representing the more complex class-subclass relation, where multiple objects ass…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: 7 pages, 5 figures, 2 tables, to be published in the 62nd DAC (Design Automation Conference) proceedings

  38. arXiv:2507.05198  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling

    Authors: Boyuan Wang, Xinpan Meng, Xiaofeng Wang, Zheng Zhu, Angen Ye, Yang Wang, Zhiqin Yang, Chaojun Ni, Guan Huang, Xingang Wang

    Abstract: The rapid advancement of Embodied AI has led to an increasing demand for large-scale, high-quality real-world data. However, collecting such embodied data remains costly and inefficient. As a result, simulation environments have become a crucial surrogate for training robot policies. Yet, the significant Real2Sim2Real gap remains a critical bottleneck, particularly in terms of physical dynamics an…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Project Page: https://embodiedreamer.github.io/

  39. arXiv:2506.20590  [pdf, ps, other

    cs.CV

    WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration

    Authors: Chaojun Ni, Jie Li, Haoyun Li, Hengyu Liu, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Boyuan Wang, Chenxin Li, Guan Huang, Wenjun Mei

    Abstract: Interactive 3D scene generation from a single image has gained significant attention due to its potential to create immersive virtual worlds. However, a key challenge in current 3D generation methods is the limited explorability, which cannot render high-quality images during larger maneuvers beyond the original viewpoint, particularly when attempting to move forward into unseen areas. To address…

    Submitted 25 June, 2025; originally announced June 2025.

  40. arXiv:2506.17211  [pdf, ps, other

    cs.LG

    BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning

    Authors: Xuechen Zhang, Zijian Huang, Yingcong Li, Chenshun Ni, Jiasi Chen, Samet Oymak

    Abstract: Small language models (SLMs) struggle to learn complex reasoning behaviors, especially when high-quality traces are scarce or difficult to learn from. The standard training approach combines a supervised fine-tuning (SFT) stage, often to distill capabilities of a larger model, followed by a reinforcement learning (RL) stage such as Group Relative Policy Optimization (GRPO). In this paper, we invest…

    Submitted 20 June, 2025; originally announced June 2025.

  41. arXiv:2506.03006  [pdf, ps, other

    cs.SE

    A Preference-Driven Methodology for High-Quality Solidity Code Generation

    Authors: Zhiyuan Peng, Xin Yin, Chenhao Ying, Chao Ni, Yuan Luo

    Abstract: While Large Language Models (LLMs) have demonstrated remarkable progress in generating functionally correct Solidity code, they continue to face critical challenges in producing gas-efficient and secure code, which are critical requirements for real-world smart contract deployment. Although recent advances leverage Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) for code pref…

    Submitted 30 September, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  42. arXiv:2506.00641  [pdf, ps, other

    cs.AI

    AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents

    Authors: Hanjun Luo, Shenyu Dai, Chiming Ni, Xinfeng Li, Guibin Zhang, Kun Wang, Tongliang Liu, Hanan Salam

    Abstract: Despite the rapid advancement of LLM-based agents, the reliable evaluation of their safety and security remains a significant challenge. Existing rule-based or LLM-based evaluators often miss dangers in agents' step-by-step actions, overlook subtle meanings, fail to see how small issues compound, and get confused by unclear safety or security rules. To overcome this evaluation crisis, we introduce…

    Submitted 30 January, 2026; v1 submitted 31 May, 2025; originally announced June 2025.

    Comments: This paper is accepted by 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

  43. arXiv:2505.17589  [pdf, ps, other

    cs.SD cs.AI eess.AS

    CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

    Authors: Zhihao Du, Changfeng Gao, Yuxuan Wang, Fan Yu, Tianyu Zhao, Hao Wang, Xiang Lv, Hui Wang, Chongjia Ni, Xian Shi, Keyu An, Guanrou Yang, Yabin Li, Yanni Chen, Zhifu Gao, Qian Chen, Yue Gu, Mengzhe Chen, Yafeng Chen, Shiliang Zhang, Wen Wang, Jieping Ye

    Abstract: In our prior works, we introduced a scalable streaming speech synthesis model, CosyVoice 2, which integrates a large language model (LLM) and a chunk-aware flow matching (FM) model, and achieves low-latency bi-streaming speech synthesis and human-parity quality. Despite these advancements, CosyVoice 2 exhibits limitations in language coverage, domain diversity, data volume, text formats, and post-…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: Preprint, work in progress

  44. arXiv:2505.07961  [pdf, ps, other

    cs.LG

    Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement

    Authors: Xuechen Zhang, Zijian Huang, Chenshun Ni, Ziyang Xiong, Jiasi Chen, Samet Oymak

    Abstract: Recent research enhances language model reasoning by scaling test-time compute via longer chain-of-thought traces. This often improves accuracy but also introduces redundancy and high computational cost, especially for small language models distilled with supervised fine-tuning (SFT). In this work, we propose new algorithms to improve token-efficient reasoning with small-scale models by effectivel…

    Submitted 23 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  45. arXiv:2504.14603  [pdf, other

    cs.AI cs.HC cs.OS

    UFO2: The Desktop AgentOS

    Authors: Chaoyun Zhang, He Huang, Chiming Ni, Jian Mu, Si Qin, Shilin He, Lu Wang, Fangkai Yang, Pu Zhao, Chao Du, Liqun Li, Yu Kang, Zhao Jiang, Suzhen Zheng, Rujia Wang, Jiaxu Qian, Minghua Ma, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-based interaction, and disruptive execution. We present UFO2, a multiagent AgentOS for Windows deskto…

    Submitted 25 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: The source code of UFO2 is publicly available at https://github.com/microsoft/UFO/, with comprehensive documentation provided at https://microsoft.github.io/UFO/

  46. arXiv:2504.03536  [pdf, ps, other

    cs.CV

    HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration

    Authors: Boyuan Wang, Runqi Ouyang, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Chaojun Ni, Xiaopei Zhang, Guan Huang, Yijie Ren, Lihong Liu, Xingang Wang

    Abstract: Single-image human reconstruction is vital for digital human modeling applications but remains an extremely challenging task. Current approaches rely on generative models to synthesize multi-view images for subsequent 3D reconstruction and animation. However, directly generating multiple views from a single human image suffers from geometric inconsistencies, resulting in issues like fragmented or…

    Submitted 12 November, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: Project Page: https://humandreamer-x.github.io/

  47. arXiv:2504.02261  [pdf, other

    cs.CV

    WonderTurbo: Generating Interactive 3D World in 0.72 Seconds

    Authors: Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Weijie Wang, Haoyun Li, Guosheng Zhao, Jie Li, Wenkang Qin, Guan Huang, Wenjun Mei

    Abstract: Interactive 3D generation is gaining momentum and capturing extensive attention for its potential to create immersive virtual experiences. However, a critical challenge in current 3D generation technologies lies in achieving real-time interactivity. To address this issue, we introduce WonderTurbo, the first real-time interactive 3D scene generation framework capable of generating novel perspective…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Project Page: https://wonderturbo.github.io

  48. arXiv:2503.24026  [pdf, other

    cs.CV

    HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

    Authors: Boyuan Wang, Xiaofeng Wang, Chaojun Ni, Guosheng Zhao, Zhiqin Yang, Zheng Zhu, Muyang Zhang, Yukun Zhou, Xinze Chen, Guan Huang, Lihong Liu, Xingang Wang

    Abstract: Human-motion video generation has been a challenging task, primarily due to the difficulty inherent in learning human body movements. While some approaches have attempted to drive human-centric video generation explicitly through pose control, these methods typically rely on poses derived from existing videos, thereby lacking flexibility. To address this, we propose HumanDreamer, a decoupled human…

    Submitted 31 March, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

    Comments: Project Page: https://humandreamer.github.io

  49. arXiv:2503.21912  [pdf

    cs.CY cs.DL

    Interdisciplinary PhDs face barriers to top university placement within their disciplines

    Authors: Xiang Zheng, Anli Peng, Xi Hong, Cassidy R. Sugimoto, Chaoqun Ni

    Abstract: Interdisciplinary research has gained prominence as a necessity for addressing complex challenges, yet its impact on early academic careers remains unclear. This study examines how interdisciplinarity during doctoral training influences faculty placement at top universities across diverse fields. Analyzing the career trajectories of over 30,000 tenure-track faculty members who earned their Ph.D. d…

    Submitted 5 November, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  50. arXiv:2503.18438  [pdf, ps, other

    cs.CV

    ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation

    Authors: Guosheng Zhao, Xiaofeng Wang, Chaojun Ni, Zheng Zhu, Wenkang Qin, Guan Huang, Xingang Wang

    Abstract: Combining reconstruction models with generative models has emerged as a promising paradigm for closed-loop simulation in autonomous driving. For example, ReconDreamer has demonstrated remarkable success in rendering large-scale maneuvers. However, a significant gap remains between the generated data and real-world sensor observations, particularly in terms of fidelity for structured elements, such…

    Submitted 10 July, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: Project Page: https://recondreamer-plus.github.io/