
Showing 1–50 of 80 results for author: Liang, A

  1. arXiv:2604.05529  [pdf, ps, other]

    cs.AI

    ActivityEditor: Learning to Synthesize Physically Valid Human Mobility

    Authors: Chenjie Yang, Yutian Jiang, Anqi Liang, Wei Qi, Chenyu Wu, Junbo Zhang

    Abstract: Human mobility modeling is indispensable for diverse urban applications. However, existing data-driven methods often suffer from data scarcity, limiting their applicability in regions where historical trajectories are unavailable or restricted. To bridge this gap, we propose ActivityEditor, a novel dual-LLM-agent framework designed for zero-shot cross-regional trajectory generation. Our f…

    Submitted 10 April, 2026; v1 submitted 7 April, 2026; originally announced April 2026.

  2. arXiv:2604.01086  [pdf, ps, other]

    cs.DS cs.IT math.ST

    Asymptotically Optimal Sequential Testing with Heterogeneous LLMs

    Authors: Guokai Li, Alys Liang, Mo Liu, Murray Lei, Stefanus Jasin, Fenghua Yang, Preet Baxi

    Abstract: We study a Bayesian binary sequential hypothesis testing problem with multiple large language models (LLMs). Each LLM $j$ has per-query cost $c_j>0$, random waiting time with mean $\mu_j>0$ and sub-Gaussian tails, and asymmetric accuracies: the probability of returning the correct label depends on the true hypothesis $\theta\in\{A,B\}$ and need not be the same under $A$ and $B$. This asymmetry in…

    Submitted 1 April, 2026; v1 submitted 1 April, 2026; originally announced April 2026.

  3. arXiv:2603.22642  [pdf]

    cs.CL

    Multi-Method Validation of Large Language Model Medical Translation Across High- and Low-Resource Languages

    Authors: Chukwuebuka Anyaegbuna, Eduardo Juan Perez Guerrero, Jerry Liu, Timothy Keyes, April Liang, Natasha Steele, Stephen Ma, Jonathan Chen, Kevin Schulman

    Abstract: Language barriers affect 27.3 million U.S. residents with non-English language preference, yet professional medical translation remains costly and often unavailable. We evaluated four frontier large language models (GPT-5.1, Claude Opus 4.5, Gemini 3 Pro, Kimi K2) translating 22 medical documents into 8 languages spanning high-resource (Spanish, Chinese, Russian, Vietnamese), medium-resource (Kore…

    Submitted 23 March, 2026; originally announced March 2026.

    Comments: 32 references, 5 tables, 2 figures

    ACM Class: J.3; I.2.7

  4. arXiv:2603.17234  [pdf]

    cs.CY cs.AI

    Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients

    Authors: Jane Wang, Timothy Keyes, April S Liang, Stephen P Ma, Jason Shen, Jerry Liu, Nerissa Ambers, Abby Pandya, Rita Pandya, Jason Hom, Natasha Steele, Jonathan H Chen, Kevin Schulman

    Abstract: Surgical co-management (SCM) is an evidence-based model in which hospitalists jointly manage medically complex perioperative patients alongside surgical teams. Despite its clinical and financial value, SCM is limited by the need to manually identify eligible patients. To determine whether SCM triage can be automated, we conducted a prospective, unblinded study at Stanford Health Care in which an L…

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: 35 pages, 4 figures, 5 tables

  5. arXiv:2603.15359  [pdf, ps, other]

    cs.RO

    NavThinker: Action-Conditioned World Models for Coupled Prediction and Planning in Social Navigation

    Authors: Tianshuai Hu, Zeying Gong, Lingdong Kong, XiaoDong Mei, Yiyi Ding, Qi Zeng, Ao Liang, Rong Li, Yangyi Zhong, Junwei Liang

    Abstract: Social navigation requires robots to act safely in dynamic human environments. Effective behavior demands thinking ahead: reasoning about how the scene and pedestrians evolve under different robot actions rather than reacting to current observations alone. This creates a coupled prediction-planning challenge, where robot actions and human motion mutually influence each other. To address this chall…

    Submitted 18 March, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

  6. arXiv:2603.14158  [pdf]

    cs.HC cs.LG

    Clinician input steers frontier AI models toward both accurate and harmful decisions

    Authors: Ivan Lopez, Selin S. Everett, Bryan J. Bunning, April S. Liang, Dong Han Yao, Shivam C. Vedak, Kameron C. Black, Sophie Ostmeier, Stephen P. Ma, Emily Alsentzer, Jonathan H. Chen, Akshay S. Chaudhari, Eric Horvitz

    Abstract: Large language models (LLMs) are entering clinician workflows, yet evaluations rarely measure how clinician reasoning shapes model behavior during clinical interactions. We combined 61 New England Journal of Medicine Case Records with 92 real-world clinician-AI interactions to evaluate 21 reasoning LLM variants across 8 frontier models on differential diagnosis generation and next step recommendat…

    Submitted 14 March, 2026; originally announced March 2026.

  7. arXiv:2603.02115  [pdf, ps, other]

    cs.RO cs.AI cs.LG

    Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons

    Authors: Anthony Liang, Yigit Korkmaz, Jiahui Zhang, Minyoung Hwang, Abrar Anwar, Sidhant Kaushik, Aditya Shah, Alex S. Huang, Luke Zettlemoyer, Dieter Fox, Yu Xiang, Anqi Li, Andreea Bobu, Abhishek Gupta, Stephen Tu, Erdem Biyik, Jesse Zhang

    Abstract: General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations, providing only local, frame-level supervision. While effective for expert demonstrations, this paradigm scales poorly to large-scale robotics datasets where failed and suboptimal trajectories are abundant and assigning dense progress labels is ambiguous. We introduce Robometer, a…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: 33 pages, 17 figures

  8. arXiv:2602.12270  [pdf, ps, other]

    econ.TH cs.AI cs.GT

    Creative Ownership in the Age of AI

    Authors: Annie Liang, Jay Lu

    Abstract: Copyright law focuses on whether a new work is "substantially similar" to an existing one, but generative AI can closely imitate style without copying content, a capability now central to ongoing litigation. We argue that existing definitions of infringement are ill-suited to this setting and propose a new criterion: a generative AI output infringes on an existing work if it could not have been ge…

    Submitted 12 February, 2026; originally announced February 2026.

  9. arXiv:2602.00074  [pdf]

    cs.CY cs.AI

    Adoption and Use of LLMs at an Academic Medical Center

    Authors: Nigam H. Shah, Nerissa Ambers, Abby Pandya, Timothy Keyes, Juan M. Banda, Srikar Nallan, Carlene Lugtu, Artem A. Trotsyuk, Suhana Bedi, Alyssa Unell, Miguel Fuentes, Francois Grolleau, Sneha S. Jain, Jonathan Chen, Devdutta Dash, Danton Char, Aditya Sharma, Duncan McElfresh, Patrick Scully, Vishanthan Kumar, Connor OBrien, Satchi Mouniswamy, Elvis Jones, Krishna Jasti, Gunavathi Mannika Lakshmanan, et al. (32 additional authors not shown)

    Abstract: While large language models (LLMs) can support clinical documentation needs, standalone tools struggle with "workflow friction" from manual data entry. We developed ChatEHR, a system that enables the use of LLMs with the entire patient timeline spanning several years. ChatEHR enables automations - which are static combinations of prompts and data that perform a fixed task - and interactive use in…

    Submitted 20 January, 2026; originally announced February 2026.

  10. arXiv:2601.12343  [pdf, ps, other]

    econ.EM cs.AI stat.ML

    How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge

    Authors: Wayne Gao, Sukjin Han, Annie Liang

    Abstract: Large language models (LLMs) are increasingly used to predict human behavior. We propose a measure for evaluating how much knowledge a pretrained LLM brings to such a prediction: its equivalent sample size, defined as the amount of task-specific data needed to match the predictive accuracy of the LLM. We estimate this measure by comparing the prediction error of a fixed LLM in a given domain to th…

    Submitted 18 January, 2026; originally announced January 2026.

  11. arXiv:2601.05014  [pdf, ps, other]

    cs.RO

    The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms

    Authors: Lingdong Kong, Shaoyuan Xie, Zeying Gong, Ye Li, Meng Chu, Ao Liang, Yuhao Dong, Tianshuai Hu, Ronghe Qiu, Rong Li, Hanjiang Hu, Dongyue Lu, Wei Yin, Wenhao Ding, Linfeng Li, Hang Song, Wenwei Zhang, Yuexin Ma, Junwei Liang, Zhedong Zheng, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi, Ziwei Liu, Zhanpeng Zhang, et al. (114 additional authors not shown)

    Abstract: Autonomous systems are increasingly deployed in open and dynamic environments -- from city streets to aerial and indoor spaces -- where perception models must remain reliable under sensor noise, environmental variation, and platform shifts. However, even state-of-the-art methods often degrade under unseen conditions, highlighting the need for robust and generalizable robot sensing. The RoboSense 2…

    Submitted 8 January, 2026; originally announced January 2026.

    Comments: Official IROS 2025 RoboSense Challenge Report; 51 pages, 37 figures, 5 tables; Competition Website at https://robosense2025.github.io/

  12. arXiv:2512.16760  [pdf, ps, other]

    cs.RO

    Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

    Authors: Tianshuai Hu, Xiaolu Liu, Song Wang, Yiyao Zhu, Ao Liang, Lingdong Kong, Guoyang Zhao, Zeying Gong, Jun Cen, Zhiyu Huang, Xiaoshuai Hao, Linfeng Li, Hang Song, Xiangtai Li, Jun Ma, Shaojie Shen, Jianke Zhu, Dacheng Tao, Ziwei Liu, Junwei Liang

    Abstract: Autonomous driving has long relied on modular "Perception-Decision-Action" pipelines, where hand-crafted interfaces and rule-based components often break down in complex or long-tailed scenarios. Their cascaded design further propagates perception errors, degrading downstream planning and control. Vision-Action (VA) models address some limitations by learning direct mappings from visual inputs to…

    Submitted 4 January, 2026; v1 submitted 18 December, 2025; originally announced December 2025.

    Comments: Survey; 47 pages, 7 figures, 9 tables; GitHub Repo at https://github.com/worldbench/awesome-vla-for-ad

  13. arXiv:2512.10958  [pdf, ps, other]

    cs.CV

    WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

    Authors: Ao Liang, Lingdong Kong, Tianyi Yan, Hongsi Liu, Wesley Yang, Ziqi Huang, Wei Yin, Jialong Zuo, Yixuan Hu, Dekai Zhu, Dongyue Lu, Youquan Liu, Guangfeng Jiang, Linfeng Li, Xiangtai Li, Long Zhuo, Lai Xing Ng, Benoit R. Cottereau, Changxin Gao, Liang Pan, Wei Tsang Ooi, Ziwei Liu

    Abstract: Generative world models are reshaping embodied AI, enabling agents to synthesize realistic 4D driving environments that look convincing but often fail physically or behaviorally. Despite rapid progress, the field still lacks a unified way to assess whether generated worlds preserve geometry, obey physics, or support reliable control. We introduce WorldLens, a full-spectrum benchmark evaluating how…

    Submitted 11 December, 2025; originally announced December 2025.

    Comments: Preprint; 80 pages, 37 figures, 29 tables; Project Page at https://worldbench.github.io/worldlens

  14. arXiv:2512.09016  [pdf, ps, other]

    cs.CV

    Learning to Remove Lens Flare in Event Camera

    Authors: Haiqian Han, Lingdong Kong, Jianing Li, Ao Liang, Chengtao Zhu, Jiacheng Lyu, Lai Xing Ng, Xiangyang Ji, Wei Tsang Ooi, Benoit R. Cottereau

    Abstract: Event cameras have the potential to revolutionize vision systems with their high temporal resolution and dynamic range, yet they remain susceptible to lens flare, a fundamental optical artifact that causes severe degradation. In event streams, this optical artifact forms a complex, spatio-temporal distortion that has been largely overlooked. We present E-Deflare, the first systematic framework for…

    Submitted 9 December, 2025; originally announced December 2025.

    Comments: Preprint; 29 pages, 14 figures, 4 tables; Project Page at https://e-flare.github.io/

  15. arXiv:2512.04354  [pdf]

    cs.LG cs.HC

    SmartAlert: Implementing Machine Learning-Driven Clinical Decision Support for Inpatient Lab Utilization Reduction

    Authors: April S. Liang, Fatemeh Amrollahi, Yixing Jiang, Conor K. Corbin, Grace Y. E. Kim, David Mui, Trevor Crowell, Aakash Acharya, Sreedevi Mony, Soumya Punnathanam, Jack McKeown, Margaret Smith, Steven Lin, Arnold Milstein, Kevin Schulman, Jason Hom, Michael A. Pfeffer, Tho D. Pham, David Svec, Weihan Chu, Lisa Shieh, Christopher Sharp, Stephen P. Ma, Jonathan H. Chen

    Abstract: Repetitive laboratory testing unlikely to yield clinically useful information is a common practice that burdens patients and increases healthcare costs. Education and feedback interventions have limited success, while general test ordering restrictions and electronic alerts impede appropriate clinical care. We introduce and evaluate SmartAlert, a machine learning (ML)-driven clinical decision supp…

    Submitted 3 December, 2025; originally announced December 2025.

    Comments: 22 pages, 5 figures

  16. arXiv:2512.03176  [pdf, ps, other]

    cs.LG cs.AI

    Plantain: Plan-Answer Interleaved Reasoning

    Authors: Anthony Liang, Jonathan Berant, Adam Fisch, Abhimanyu Goyal, Kalpesh Krishna, Jacob Eisenstein

    Abstract: Reasoning models often spend a significant amount of time thinking before they generate a visible response. In the meantime, they do not give the user any hints as to whether their reasoning is on the right track, and do not give the user any recourse to stop and correct them if their reasoning is flawed. This creates a frustrating, but unfortunately common, experience: the user's time is wasted w…

    Submitted 2 December, 2025; originally announced December 2025.

  17. arXiv:2512.02982  [pdf, ps, other]

    cs.CV cs.RO

    U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

    Authors: Xiang Xu, Alan Liang, Youquan Liu, Linfeng Li, Lingdong Kong, Ziwei Liu, Qingshan Liu

    Abstract: Modeling dynamic 3D environments from LiDAR sequences is central to building reliable 4D worlds for autonomous driving and embodied AI. Existing generative frameworks, however, often treat all spatial regions uniformly, overlooking the varying uncertainty across real-world scenes. This uniform generation leads to artifacts in complex or ambiguous regions, limiting realism and temporal stability. I…

    Submitted 24 March, 2026; v1 submitted 2 December, 2025; originally announced December 2025.

    Comments: CVPR 2026; 20 pages, 7 figures, 11 tables; Code at https://github.com/worldbench/U4D

  18. arXiv:2512.01241  [pdf]

    cs.CY cs.AI

    First, do NOHARM: towards clinically safe large language models

    Authors: David Wu, Fateme Nateghi Haredasht, Saloni Kumar Maharaj, Priyank Jain, Jessica Tran, Matthew Gwiazdon, Arjun Rustagi, Jenelle Jindal, Jacob M. Koshy, Vinay Kadiyala, Anup Agarwal, Bassman Tappuni, Brianna French, Sirus Jesudasen, Christopher V. Cosgriff, Rebanta Chakraborty, Jillian Caldwell, Susan Ziolkowski, David J. Iberri, Robert Diep, Rahul S. Dalal, Kira L. Newman, Kristin Galetta, J. Carl Pallais, Nancy Wei, et al. (26 additional authors not shown)

    Abstract: Large language models (LLMs) are routinely used by physicians and patients for medical advice, yet their clinical safety profiles remain poorly characterized. We present NOHARM (Numerous Options Harm Assessment for Risk in Medicine), a benchmark using 100 real primary care-to-specialist consultation cases to measure frequency and severity of harm from LLM-generated medical recommendations. NOHARM…

    Submitted 17 December, 2025; v1 submitted 30 November, 2025; originally announced December 2025.

  19. arXiv:2511.13240  [pdf, ps, other]

    cs.LG

    Knowing What You Know Is Not Enough: Large Language Model Confidences Don't Align With Their Actions

    Authors: Arka Pal, Teo Kitanovski, Arthur Liang, Akilesh Potti, Micah Goldblum

    Abstract: Large language models (LLMs) are increasingly deployed in agentic and multi-turn workflows where they are tasked to perform actions of significant consequence. In order to deploy them reliably and manage risky outcomes in these settings, it is helpful to access model uncertainty estimates. However, confidence elicitation methods for LLMs are typically not evaluated directly in agentic settings; in…

    Submitted 8 February, 2026; v1 submitted 17 November, 2025; originally announced November 2025.

  20. arXiv:2511.01755  [pdf, ps, other]

    cs.CV cs.RO

    3EED: Ground Everything Everywhere in 3D

    Authors: Rong Li, Yuhao Dong, Tianshuai Hu, Ao Liang, Youquan Liu, Dongyue Lu, Liang Pan, Lingdong Kong, Junwei Liang, Ziwei Liu

    Abstract: Visual grounding in 3D is the key for embodied agents to localize language-referred objects in open-world environments. However, existing benchmarks are limited to indoor focus, single-platform constraints, and small scale. We introduce 3EED, a multi-platform, multi-modal 3D grounding benchmark featuring RGB and LiDAR data from vehicle, drone, and quadruped platforms. We provide over 128,000 objec…

    Submitted 1 December, 2025; v1 submitted 3 November, 2025; originally announced November 2025.

    Comments: NeurIPS 2025 DB Track; 38 pages, 17 figures, 10 tables; Project Page at https://project-3eed.github.io/

  21. arXiv:2510.26796  [pdf, ps, other]

    cs.CV cs.GR

    See4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting

    Authors: Dongyue Lu, Ao Liang, Tianxin Huang, Xiao Fu, Yuyang Zhao, Baorui Ma, Liang Pan, Wei Yin, Lingdong Kong, Wei Tsang Ooi, Ziwei Liu

    Abstract: Immersive applications call for synthesizing spatiotemporal 4D content from casual videos without costly 3D supervision. Existing video-to-4D methods typically rely on manually annotated camera poses, which are labor-intensive and brittle for in-the-wild footage. Recent warp-then-inpaint approaches mitigate the need for pose labels by warping input frames along a novel camera trajectory and using…

    Submitted 12 March, 2026; v1 submitted 30 October, 2025; originally announced October 2025.

    Comments: Eurographics2026; 26 pages; 21 figures; 3 tables; project page: https://see-4d.github.io/

  22. arXiv:2510.16179  [pdf, ps, other]

    cs.CV

    Cost Savings from Automatic Quality Assessment of Generated Images

    Authors: Xavier Giro-i-Nieto, Nefeli Andreou, Anqi Liang, Manel Baradad, Francesc Moreno-Noguer, Aleix Martinez

    Abstract: Deep generative models have shown impressive progress in recent years, making it possible to produce high quality images with a simple text prompt or a reference image. However, state of the art technology does not yet meet the quality standards offered by traditional photographic methods. For this reason, production pipelines that use generated images often include a manual stage of image quality…

    Submitted 17 October, 2025; originally announced October 2025.

    ACM Class: I.4.9

  23. arXiv:2509.24859  [pdf, ps, other]

    cs.DC

    HAPT: Heterogeneity-Aware Automated Parallel Training on Heterogeneous Clusters

    Authors: Antian Liang, Zhigang Zhao, Kai Zhang, Xuri Shi, Chuantao Li, Chunxiao Wang, Zhenying He, Yinan Jing, X. Sean Wang

    Abstract: With the rapid evolution of GPU architectures, the heterogeneity of model training infrastructures is steadily increasing. In such environments, effectively utilizing all available heterogeneous accelerators becomes critical for distributed model training. However, existing frameworks, which are primarily designed for homogeneous clusters, often exhibit significant resource underutilization when d…

    Submitted 29 September, 2025; originally announced September 2025.

  24. arXiv:2509.14396  [pdf, ps, other]

    econ.TH cs.GT

    Friend or Foe: Delegating to an AI Whose Alignment is Unknown

    Authors: Drew Fudenberg, Annie Liang

    Abstract: AI systems have the potential to improve decision-making, but decision makers face the risk that the AI may be misaligned with their objectives. We study this problem in the context of a treatment decision, where a designer decides which patient attributes to reveal to an AI before receiving a prediction of the patient's need for treatment. Providing the AI with more information increases the bene…

    Submitted 17 September, 2025; originally announced September 2025.

  25. arXiv:2509.11959  [pdf, ps, other]

    cs.CV cs.RO

    Learning to Generate 4D LiDAR Sequences

    Authors: Ao Liang, Youquan Liu, Yu Yang, Dongyue Lu, Linfeng Li, Lingdong Kong, Huaici Zhao, Wei Tsang Ooi

    Abstract: While generative world models have advanced video and occupancy-based data synthesis, LiDAR generation remains underexplored despite its importance for accurate 3D perception. Extending generation to 4D LiDAR data introduces challenges in controllability, temporal stability, and evaluation. We present LiDARCrafter, a unified framework that converts free-form language into editable LiDAR sequences.…

    Submitted 15 September, 2025; originally announced September 2025.

    Comments: Abstract Paper (Non-Archival) @ ICCV 2025 Wild3D Workshop; GitHub Repo at https://lidarcrafter.github.io/

  26. arXiv:2509.09584  [pdf, ps, other]

    cs.CV cs.RO

    Visual Grounding from Event Cameras

    Authors: Lingdong Kong, Dongyue Lu, Ao Liang, Rong Li, Yuhao Dong, Tianshuai Hu, Lai Xing Ng, Wei Tsang Ooi, Benoit R. Cottereau

    Abstract: Event cameras capture changes in brightness with microsecond precision and remain reliable under motion blur and challenging illumination, offering clear advantages for modeling highly dynamic scenes. Yet, their integration with natural language understanding has received little attention, leaving a gap in multimodal perception. To address this, we introduce Talk2Event, the first large-scale bench…

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: Abstract Paper (Non-Archival) @ ICCV 2025 NeVi Workshop

  27. arXiv:2509.07996  [pdf, ps, other]

    cs.CV cs.RO

    3D and 4D World Modeling: A Survey

    Authors: Lingdong Kong, Wesley Yang, Jianbiao Mei, Youquan Liu, Ao Liang, Dekai Zhu, Dongyue Lu, Wei Yin, Xiaotao Hu, Mingkai Jia, Junyuan Deng, Kaiwen Zhang, Yang Wu, Tianyi Yan, Shenyuan Gao, Song Wang, Linfeng Li, Liang Pan, Yong Liu, Jianke Zhu, Wei Tsang Ooi, Steven C. H. Hoi, Ziwei Liu

    Abstract: World modeling has become a cornerstone in AI research, enabling agents to understand, represent, and predict the dynamic environments they inhabit. While prior work largely emphasizes generative methods for 2D image and video data, they overlook the rapidly growing body of work that leverages native 3D and 4D representations such as RGB-D imagery, occupancy grids, and LiDAR point clouds for large…

    Submitted 3 December, 2025; v1 submitted 4 September, 2025; originally announced September 2025.

    Comments: Survey; 50 pages, 10 figures, 14 tables; GitHub Repo at https://github.com/worldbench/awesome-3d-4d-world-models

  28. arXiv:2509.05878  [pdf, ps, other]

    cs.CL

    MedFactEval and MedAgentBrief: A Framework and Workflow for Generating and Evaluating Factual Clinical Summaries

    Authors: François Grolleau, Emily Alsentzer, Timothy Keyes, Philip Chung, Akshay Swaminathan, Asad Aali, Jason Hom, Tridu Huynh, Thomas Lew, April S. Liang, Weihan Chu, Natasha Z. Steele, Christina F. Lin, Jingkun Yang, Kameron C. Black, Stephen P. Ma, Fateme N. Haredasht, Nigam H. Shah, Kevin Schulman, Jonathan H. Chen

    Abstract: Evaluating factual accuracy in Large Language Model (LLM)-generated clinical text is a critical barrier to adoption, as expert review is unscalable for the continuous quality assurance these systems require. We address this challenge with two complementary contributions. First, we introduce MedFactEval, a framework for scalable, fact-grounded evaluation where clinicians define high-salience key fa…

    Submitted 6 September, 2025; originally announced September 2025.

  29. arXiv:2508.17886  [pdf, ps, other]

    cs.DB

    PGTuner: An Efficient Framework for Automatic and Transferable Configuration Tuning of Proximity Graphs

    Authors: Hao Duan, Yitong Song, Bin Yao, Anqi Liang

    Abstract: Approximate Nearest Neighbor Search (ANNS) plays a crucial role in many key areas. Proximity graphs (PGs) are the leading method for ANNS, offering the best balance between query efficiency and accuracy. However, their performance heavily depends on various construction and query parameters, which are difficult to optimize due to their complex inter-dependencies. Given that users often prioritize…

    Submitted 25 August, 2025; originally announced August 2025.

  30. arXiv:2508.05977  [pdf, ps, other]

    cs.LG physics.flu-dyn

    LinguaFluid: Language Guided Fluid Control via Semantic Rewards in Reinforcement Learning

    Authors: Aoming Liang, Chi Cheng, Dashuai Chen, Boai Sun, Dixia Fan

    Abstract: In the domain of scientific machine learning, designing effective reward functions remains a challenge in reinforcement learning (RL), particularly in environments where task goals are difficult to specify numerically. Reward functions in existing work are predominantly based on heuristics, manual engineering, or task-specific tuning. In this work, we introduce a semantically aligned reinforcement…

    Submitted 14 August, 2025; v1 submitted 7 August, 2025; originally announced August 2025.

  31. arXiv:2508.03692  [pdf, ps, other]

    cs.CV cs.RO

    LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences

    Authors: Ao Liang, Youquan Liu, Yu Yang, Dongyue Lu, Linfeng Li, Lingdong Kong, Huaici Zhao, Wei Tsang Ooi

    Abstract: Generative world models have become essential data engines for autonomous driving, yet most existing efforts focus on videos or occupancy grids, overlooking the unique LiDAR properties. Extending LiDAR generation to dynamic 4D world modeling presents challenges in controllability, temporal coherence, and evaluation standardization. To this end, we present LiDARCrafter, a unified framework for 4D L…

    Submitted 1 December, 2025; v1 submitted 5 August, 2025; originally announced August 2025.

    Comments: AAAI 2026 Oral Presentation; 38 pages, 18 figures, 12 tables; Project Page at https://lidarcrafter.github.io

  32. arXiv:2508.03691  [pdf, ps, other]

    cs.CV cs.RO

    La La LiDAR: Large-Scale Layout Generation from LiDAR Data

    Authors: Youquan Liu, Lingdong Kong, Weidong Yang, Xin Li, Ao Liang, Runnan Chen, Ben Fei, Tongliang Liu

    Abstract: Controllable generation of realistic LiDAR scenes is crucial for applications such as autonomous driving and robotics. While recent diffusion-based models achieve high-fidelity LiDAR generation, they lack explicit control over foreground objects and spatial relationships, limiting their usefulness for scenario simulation and safety validation. To address these limitations, we propose Large-scale L…

    Submitted 5 August, 2025; originally announced August 2025.

    Comments: Preprint; 10 pages, 6 figures, 7 tables

  33. arXiv:2508.03690  [pdf, ps, other]

    cs.CV cs.RO

    Veila: Panoramic LiDAR Generation from a Monocular RGB Image

    Authors: Youquan Liu, Lingdong Kong, Weidong Yang, Ao Liang, Jianxiong Gao, Yang Wu, Xiang Xu, Xin Li, Linfeng Li, Runnan Chen, Ben Fei

    Abstract: Realistic and controllable panoramic LiDAR data generation is critical for scalable 3D perception in autonomous driving and robotics. Existing methods either perform unconditional generation with poor controllability or adopt text-guided synthesis, which lacks fine-grained spatial control. Leveraging a monocular RGB image as a spatial control signal offers a scalable and low-cost alternative, whic…

    Submitted 5 August, 2025; originally announced August 2025.

    Comments: Preprint; 10 pages, 6 figures, 7 tables

  34. arXiv:2507.23608  [pdf, ps, other]

    cs.CV cs.CR

    Medical Image De-Identification Benchmark Challenge

    Authors: Linmin Pei, Granger Sutton, Michael Rutherford, Ulrike Wagner, Tracy Nolan, Kirk Smith, Phillip Farmer, Peter Gu, Ambar Rana, Kailing Chen, Thomas Ferleman, Brian Park, Ye Wu, Jordan Kojouharov, Gargi Singh, Jon Lemon, Tyler Willis, Milos Vukadinovic, Grant Duffy, Bryan He, David Ouyang, Marco Pereanez, Daniel Samber, Derek A. Smith, Christopher Cannistraci, et al. (45 additional authors not shown)

    Abstract: The de-identification (deID) of protected health information (PHI) and personally identifiable information (PII) is a fundamental requirement for sharing medical images, particularly through public repositories, to ensure compliance with patient privacy laws. In addition, preservation of non-PHI metadata to inform and enable downstream development of imaging artificial intelligence (AI) is an impo…

    Submitted 31 July, 2025; originally announced July 2025.

    Comments: 19 pages

  35. arXiv:2507.17665  [pdf, ps, other]

    cs.CV cs.RO

    Perspective-Invariant 3D Object Detection

    Authors: Ao Liang, Lingdong Kong, Dongyue Lu, Youquan Liu, Jian Fang, Huaici Zhao, Wei Tsang Ooi

    Abstract: With the rise of robotics, LiDAR-based 3D object detection has garnered significant attention in both academia and industry. However, existing datasets and methods predominantly focus on vehicle-mounted platforms, leaving other autonomous platforms underexplored. To bridge this gap, we introduce Pi3DET, the first benchmark featuring LiDAR data and 3D bounding box annotations collected from multipl…

    Submitted 5 December, 2025; v1 submitted 23 July, 2025; originally announced July 2025.

    Comments: ICCV 2025; 54 pages, 18 figures, 22 tables; Project Page at https://pi3det.github.io

  36. arXiv:2507.17664  [pdf, ps, other]

    cs.CV cs.RO

    Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

    Authors: Lingdong Kong, Dongyue Lu, Ao Liang, Rong Li, Yuhao Dong, Tianshuai Hu, Lai Xing Ng, Wei Tsang Ooi, Benoit R. Cottereau

    Abstract: Event cameras offer microsecond-level latency and robustness to motion blur, making them ideal for understanding dynamic environments. Yet, connecting these asynchronous streams to human language remains an open challenge. We introduce Talk2Event, the first large-scale benchmark for language-driven object grounding in event-based perception. Built from real-world driving data, we provide over 30,0…

    Submitted 3 November, 2025; v1 submitted 23 July, 2025; originally announced July 2025.

    Comments: NeurIPS 2025 Spotlight; 43 pages, 17 figures, 16 tables; Project Page at https://talk2event.github.io

  37. arXiv:2506.13558  [pdf, ps, other]

    cs.CV

    X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability

    Authors: Yu Yang, Alan Liang, Jianbiao Mei, Yukai Ma, Yong Liu, Gim Hee Lee

    Abstract: Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consistent generation. However, large-scale 3D scene generation requiring spatial coherence remains underexplored. In this paper, we present X-Scene, a novel framework for large-scale driving scene generation that ach…

    Submitted 6 December, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted by NeurIPS 2025, Project page at https://x-scene.github.io/

  38. arXiv:2505.20455  [pdf, ps, other]

    cs.RO

    HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval

    Authors: Matthew Hong, Anthony Liang, Kevin Kim, Harshitha Rajaprakash, Jesse Thomason, Erdem Bıyık, Jesse Zhang

    Abstract: We hand the community HAND, a simple and time-efficient method for teaching robots new manipulation tasks through human hand demonstrations. Instead of relying on task-specific robot demonstrations collected via teleoperation, HAND uses easy-to-provide hand demonstrations to retrieve relevant behaviors from task-agnostic robot play data. Using a visual tracking pipeline, HAND extracts the motion o…

    Submitted 26 October, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  39. arXiv:2505.15431  [pdf, ps, other]

    cs.CL

    Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

    Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, et al. (230 additional authors not shown)

    Abstract: As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid response…

    Submitted 4 July, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  40. arXiv:2505.15151  [pdf, ps, other]

    cs.LG

    Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines

    Authors: Aobo Liang, Yan Sun, Xiaohou Shi, Ke Li

    Abstract: In the past few years, time series foundation models have achieved superior predicting accuracy. However, real-world time series often exhibit significant diversity in their temporal patterns across different time spans and domains, making it challenging for a single model architecture to fit all complex scenarios. In addition, time series data may have multiple variables exhibiting complex correl…

    Submitted 18 March, 2026; v1 submitted 21 May, 2025; originally announced May 2025.

  41. arXiv:2505.04999  [pdf, other]

    cs.RO cs.AI cs.LG

    CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations

    Authors: Anthony Liang, Pavel Czempin, Matthew Hong, Yutai Zhou, Erdem Biyik, Stephen Tu

    Abstract: Learning robot policies using imitation learning requires collecting large amounts of costly action-labeled expert demonstrations, which fundamentally limits the scale of training data. A promising approach to address this bottleneck is to harness the abundance of unlabeled observations (e.g., from video demonstrations) to learn latent action labels in an unsupervised way. However, we find that exis…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Latent Action Models, Self-supervised Pretraining, Learning from Videos

  42. arXiv:2503.24182  [pdf, other]

    cs.CV

    CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization

    Authors: Yingrui Ji, Xi Xiao, Gaofei Chen, Hao Xu, Chenrui Ma, Lijing Zhu, Aokun Liang, Jiansheng Chen

    Abstract: Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success in cross-modal tasks such as zero-shot image classification and text-image retrieval by effectively aligning visual and textual representations. However, the theoretical foundations underlying CLIP's strong generalization remain unclear. In this work, we address this gap by proposing the Cross-modal Information Bottlenec…

    Submitted 31 March, 2025; originally announced March 2025.

  43. arXiv:2502.07822  [pdf, other]

    cs.CV cs.AI

    PDM-SSD: Single-Stage Three-Dimensional Object Detector With Point Dilation

    Authors: Ao Liang, Haiyang Hua, Jian Fang, Wenyu Chen, Huaici Zhao

    Abstract: Current Point-based detectors can only learn from the provided points, with limited receptive fields and insufficient global learning capabilities for such targets. In this paper, we present a novel Point Dilation Mechanism for single-stage 3D detection (PDM-SSD) that takes advantage of these two representations. Specifically, we first use a PointNet-style 3D backbone for efficient feature encodin…

    Submitted 10 February, 2025; originally announced February 2025.

  44. arXiv:2501.16996  [pdf, ps, other]

    econ.TH cs.GT

    Artificial Intelligence Clones

    Authors: Annie Liang

    Abstract: Large language models, trained on personal data, are increasingly able to mimic individual personalities. These ``AI clones'' or ``AI agents'' have the potential to transform how people search for matches in contexts ranging from marriage to employment. This paper presents a theoretical framework to study the tradeoff between the substantially expanded search capacity of AI representations and the…

    Submitted 8 January, 2026; v1 submitted 28 January, 2025; originally announced January 2025.

  45. arXiv:2501.02749  [pdf]

    cs.RO cs.AI

    Intelligent logistics management robot path planning algorithm integrating transformer and GCN network

    Authors: Hao Luo, Jianjun Wei, Shuchen Zhao, Ankai Liang, Zhongjin Xu, Ruxue Jiang

    Abstract: This research delves into advanced route optimization for robots in smart logistics, leveraging a fusion of Transformer architectures, Graph Neural Networks (GNNs), and Generative Adversarial Networks (GANs). The approach utilizes a graph-based representation encompassing geographical data, cargo allocation, and robot dynamics, addressing both spatial and resource limitations to refine route effic…

    Submitted 11 March, 2025; v1 submitted 5 January, 2025; originally announced January 2025.

    Comments: 21 pages

  46. arXiv:2412.10459  [pdf, other]

    cs.LG cs.AI

    Conformal Prediction on Quantifying Uncertainty of Dynamic Systems

    Authors: Aoming Liang, Qi Liu, Lei Xu, Fahad Sohrab, Weicheng Cui, Changhui Song, Moncef Gabbouj

    Abstract: Numerous studies have focused on learning and understanding the dynamics of physical systems from video data, such as spatial intelligence. Artificial intelligence requires quantitative assessments of the uncertainty of the model to ensure reliability. However, there is still a relative lack of systematic assessment of the uncertainties, particularly the uncertainties of the physical data. Our mot…

    Submitted 17 December, 2024; v1 submitted 12 December, 2024; originally announced December 2024.

  47. arXiv:2412.02448  [pdf, ps, other]

    cs.DS cs.DB

    UNIFY: Unified Index for Range Filtered Approximate Nearest Neighbors Search

    Authors: Anqi Liang, Pengcheng Zhang, Bin Yao, Zhongpu Chen, Yitong Song, Guangxu Cheng

    Abstract: This paper presents an efficient and scalable framework for Range Filtered Approximate Nearest Neighbors Search (RF-ANNS) over high-dimensional vectors associated with attribute values. Given a query vector $q$ and a range $[l, h]$, RF-ANNS aims to find the approximate $k$ nearest neighbors of $q$ among data whose attribute values fall within $[l, h]$. Existing methods including pre-, post-, and h…

    Submitted 17 June, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

  48. arXiv:2411.06374  [pdf]

    cs.IR cs.LG

    Metric Learning for Tag Recommendation: Tackling Data Sparsity and Cold Start Issues

    Authors: Yuanshuai Luo, Rui Wang, Yaxin Liang, Ankai Liang, Wenyi Liu

    Abstract: With the rapid growth of digital information, personalized recommendation systems have become an indispensable part of Internet services, especially in the fields of e-commerce, social media, and online entertainment. However, traditional collaborative filtering and content-based recommendation methods have limitations in dealing with data sparsity and cold start problems, especially in the face o…

    Submitted 10 November, 2024; originally announced November 2024.

  49. arXiv:2411.01897  [pdf, other]

    cs.LG cs.AI

    LE-PDE++: Mamba for accelerating PDEs Simulations

    Authors: Aoming Liang, Zhaoyang Mu, Qi Liu, Ruipeng Li, Mingming Ge, Dixia Fan

    Abstract: Partial Differential Equations are foundational in modeling science and natural systems such as fluid dynamics and weather forecasting. The Latent Evolution of PDEs method is designed to address the computational intensity of classical and deep learning-based PDE solvers by proposing a scalable and efficient alternative. To enhance the efficiency and accuracy of LE-PDE, we incorporate the Mamba mo…

    Submitted 12 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

  50. arXiv:2410.22649  [pdf, other]

    cs.LG

    WaveRoRA: Wavelet Rotary Route Attention for Multivariate Time Series Forecasting

    Authors: Aobo Liang, Yan Sun, Nadra Guizani

    Abstract: In recent years, Transformer-based models (Transformers) have achieved significant success in multivariate time series forecasting (MTSF). However, previous works focus on extracting features either from the time domain or the frequency domain, which inadequately captures the trends and periodic characteristics. To address this issue, we propose a wavelet learning framework to model complex tempor…

    Submitted 20 November, 2024; v1 submitted 29 October, 2024; originally announced October 2024.

    Comments: Model architecture changed