
Showing 1–50 of 224 results for author: Wu, A

Searching in archive cs.
  1. arXiv:2512.20495  [pdf, ps, other]

    cs.AR

    Nebula: Enable City-Scale 3D Gaussian Splatting in Virtual Reality via Collaborative Rendering and Accelerated Stereo Rasterization

    Authors: He Zhu, Zheng Liu, Xingyang Li, Anbang Wu, Jieru Zhao, Fangxin Liu, Yiming Gan, Jingwen Leng, Yu Feng

    Abstract: 3D Gaussian splatting (3DGS) has drawn significant attention in the architectural community recently. However, current architectural designs often overlook the 3DGS scalability, making them fragile for extremely large-scale 3DGS. Meanwhile, the VR bandwidth requirement makes it impossible to deliver high-fidelity and smooth VR content from the cloud. We present Nebula, a coherent acceleration fr…

    Submitted 23 December, 2025; originally announced December 2025.

  2. arXiv:2512.20176  [pdf, ps, other]

    cs.CR

    Optimistic TEE-Rollups: A Hybrid Architecture for Scalable and Verifiable Generative AI Inference on Blockchain

    Authors: Aaron Chan, Alex Ding, Frank Chen, Alan Wu, Bruce Zhang, Arther Tian

    Abstract: The rapid integration of Large Language Models (LLMs) into decentralized physical infrastructure networks (DePIN) is currently bottlenecked by the Verifiability Trilemma, which posits that a decentralized inference system cannot simultaneously achieve high computational integrity, low latency, and low cost. Existing cryptographic solutions, such as Zero-Knowledge Machine Learning (ZKML), suffer fr…

    Submitted 23 December, 2025; originally announced December 2025.

  3. arXiv:2512.19443  [pdf, ps, other]

    cs.CV

    D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning

    Authors: Evelyn Zhang, Fufu Yu, Aoqi Wu, Zichen Wen, Ke Yan, Shouhong Ding, Biqing Qi, Linfeng Zhang

    Abstract: Processing long visual token sequences poses a significant computational burden on Multimodal Large Language Models (MLLMs). While token pruning offers a path to acceleration, we find that current methods, while adequate for general understanding, catastrophically fail on fine-grained localization tasks. We attribute this failure to the inherent flaws of the two prevailing strategies: importance-b…

    Submitted 22 December, 2025; originally announced December 2025.

  4. arXiv:2512.16317  [pdf, ps, other]

    cs.AI

    Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference

    Authors: Arther Tian, Alex Ding, Frank Chen, Alan Wu, Aaron Chan, Bruce Zhang

    Abstract: Decentralized large language model (LLM) inference promises transparent and censorship resistant access to advanced AI, yet existing verification approaches struggle to scale to modern models. Proof of Quality (PoQ) replaces cryptographic verification of computation with consensus over output quality, but the original formulation ignores heterogeneous computational costs across inference and evalu…

    Submitted 18 December, 2025; originally announced December 2025.

  5. arXiv:2512.12949  [pdf, ps, other]

    cs.DC

    FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection

    Authors: Ziyu Huang, Yangjie Zhou, Zihan Liu, Xinhao Luo, Yijia Diao, Minyi Guo, Jidong Zhai, Yu Feng, Chen Zhang, Anbang Wu, Jingwen Leng

    Abstract: The scaling of computation throughput continues to outpace improvements in memory bandwidth, making many deep learning workloads memory-bound. Kernel fusion is a key technique to alleviate this problem, but the fusion strategies of existing compilers and frameworks are limited to using local scratchpad memory. When the intermediate results exceed the limited capacity (such as FFN), the fusion fail…

    Submitted 14 December, 2025; originally announced December 2025.

  6. arXiv:2512.09882  [pdf, ps, other]

    cs.AI cs.CR cs.CY

    Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

    Authors: Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper, Ethan Jun-shen Ho, Anna Wu, Arnold Tianyi Yang, Neil Perry, Andy Zou, Matt Fredrikson, J. Zico Kolter, Percy Liang, Dan Boneh, Daniel E. Ho

    Abstract: We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000 hosts across 12 subnets. ARTEMIS is a multi-agent framework featuring dynamic prompt generation, arb…

    Submitted 10 December, 2025; originally announced December 2025.

  7. arXiv:2511.22780  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    Distracted Robot: How Visual Clutter Undermine Robotic Manipulation

    Authors: Amir Rasouli, Montgomery Alban, Sajjad Pakdamansavoji, Zhiyuan Li, Zhanguang Zhang, Aaron Wu, Xuan Zhao

    Abstract: In this work, we propose an evaluation protocol for examining the performance of robotic manipulation policies in cluttered scenes. Contrary to prior works, we approach evaluation from a psychophysical perspective, therefore we use a unified measure of clutter that accounts for environmental factors as well as the distractors quantity, characteristics, and arrangement. Using this measure, we syste…

    Submitted 27 November, 2025; originally announced November 2025.

    Comments: 12 figures, 2 tables

  8. arXiv:2511.17744  [pdf]

    eess.IV cs.CV

    Robust Detection of Retinal Neovascularization in Widefield Optical Coherence Tomography

    Authors: Jinyi Hao, Jie Wang, Kotaro Tsuboi, Liqin Gao, Tristan T. Hormel, Yukun Guo, An-Lun Wu, Min Gao, Christina J. Flaxel, Steven T. Bailey, Thomas S. Hwang, Yali Jia

    Abstract: Retinal neovascularization (RNV) is a vision threatening development in diabetic retinopathy (DR). Vision loss associated with RNV is preventable with timely intervention, making RNV clinical screening and monitoring a priority. Optical coherence tomography (OCT) angiography (OCTA) provides high-resolution imaging and high-sensitivity detection of RNV lesions. With recent commercial devices introd…

    Submitted 21 November, 2025; originally announced November 2025.

    Comments: 17 pages, 11 figures. Submitted to Optica. Corresponding author: Yali Jia. Affiliations: ((1) Casey Eye Institute, Oregon Health & Science University, USA (2) Department of Ophthalmology, Aichi Medical University, Japan (3) Department of Biomedical Engineering, Oregon Health & Science University, USA (4) Department of Ophthalmology, Mackay Memorial Hospital, Taiwan)

  9. arXiv:2511.17475  [pdf, ps, other]

    physics.flu-dyn cs.LG

    Addressing A Posteriori Performance Degradation in Neural Network Subgrid Stress Models

    Authors: Andy Wu, Sanjiva K. Lele

    Abstract: Neural network subgrid stress models often have a priori performance that is far better than the a posteriori performance, leading to neural network models that look very promising a priori completely failing in a posteriori Large Eddy Simulations (LES). This performance gap can be decreased by combining two different methods, training data augmentation and reducing input complexity to the neural…

    Submitted 21 November, 2025; originally announced November 2025.

  10. arXiv:2511.13899  [pdf, ps, other]

    q-bio.NC cs.CE cs.LG

    A Disentangled Low-Rank RNN Framework for Uncovering Neural Connectivity and Dynamics

    Authors: Chengrui Li, Yunmiao Wang, Yule Wang, Weihan Li, Dieter Jaeger, Anqi Wu

    Abstract: Low-rank recurrent neural networks (lrRNNs) are a class of models that uncover low-dimensional latent dynamics underlying neural population activity. Although their functional connectivity is low-rank, it lacks disentanglement interpretations, making it difficult to assign distinct computational roles to different latent dimensions. To address this, we propose the Disentangled Recurrent Neural Net…

    Submitted 17 November, 2025; originally announced November 2025.

  11. arXiv:2511.09909  [pdf, ps, other]

    cs.CV

    Simulating Distribution Dynamics: Liquid Temporal Feature Evolution for Single-Domain Generalized Object Detection

    Authors: Zihao Zhang, Yang Li, Aming Wu, Yahong Han

    Abstract: In this paper, we focus on Single-Domain Generalized Object Detection (Single-DGOD), aiming to transfer a detector trained on one source domain to multiple unknown domains. Existing methods for Single-DGOD typically rely on discrete data augmentation or static perturbation methods to expand data diversity, thereby mitigating the lack of access to target domain data. However, in real-world scenario…

    Submitted 12 November, 2025; originally announced November 2025.

  12. arXiv:2511.09564  [pdf, ps, other]

    physics.chem-ph cs.AI

    Mamba-driven multi-perspective structural understanding for molecular ground-state conformation prediction

    Authors: Yuxin Gou, Aming Wu, Richang Hong, Meng Wang

    Abstract: A comprehensive understanding of molecular structures is important for the prediction of molecular ground-state conformation involving property information. Meanwhile, state space model (e.g., Mamba) has recently emerged as a promising mechanism for long sequence modeling and has achieved remarkable results in various language and vision tasks. However, towards molecular ground-state conformation…

    Submitted 10 November, 2025; originally announced November 2025.

  13. arXiv:2511.06148  [pdf, ps, other]

    cs.CY cs.AI cs.CL

    Large Language Models Develop Novel Social Biases Through Adaptive Exploration

    Authors: Addison J. Wu, Ryan Liu, Xuechunzi Bai, Thomas L. Griffiths

    Abstract: As large language models (LLMs) are adopted into frameworks that grant them the capacity to make real decisions, it is increasingly important to ensure that they are unbiased. In this paper, we argue that the predominant approach of simply removing existing biases from models is not enough. Using a paradigm from the psychology literature, we demonstrate that LLMs can spontaneously develop novel so…

    Submitted 22 December, 2025; v1 submitted 8 November, 2025; originally announced November 2025.

  14. arXiv:2510.26996  [pdf, ps, other]

    cs.CV

    MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation

    Authors: Arghavan Rezvani, Xiangyi Yan, Anthony T. Wu, Kun Han, Pooya Khosravi, Xiaohui Xie

    Abstract: In this study, we propose MoME, a Mixture of Visual Language Medical Experts, for Medical Image Segmentation. MoME adapts the successful Mixture of Experts (MoE) paradigm, widely used in Large Language Models (LLMs), for medical vision-language tasks. The architecture enables dynamic expert selection by effectively utilizing multi-scale visual features tailored to the intricacies of medical imager…

    Submitted 30 October, 2025; originally announced October 2025.

  15. arXiv:2510.26231  [pdf]

    cs.IR

    DiSE: A diffusion probabilistic model for automatic structure elucidation of organic compounds

    Authors: Haochen Chen, Qi Huang, Anan Wu, Wenhao Zhang, Jianliang Ye, Jianming Wu, Kai Tan, Xin Lu, Xin Xu

    Abstract: Automatic structure elucidation is essential for self-driving laboratories as it enables the system to achieve truly autonomous. This capability closes the experimental feedback loop, ensuring that machine learning models receive reliable structure information for real-time decision-making and optimization. Herein, we present DiSE, an end-to-end diffusion-based generative model that integrates mul…

    Submitted 30 October, 2025; originally announced October 2025.

  16. arXiv:2510.26114  [pdf, ps, other]

    cs.CV

    OracleAgent: A Multimodal Reasoning Agent for Oracle Bone Script Research

    Authors: Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, Xu Peng, Taisong Jin, Yongge Liu, Shengwei Han, Jing Yang, Xiaoping He, Feng Gao, AndyPian Wu, SevenShu, Chaoyang Wang, Chengjie Wang

    Abstract: As one of the earliest writing systems, Oracle Bone Script (OBS) preserves the cultural and intellectual heritage of ancient civilizations. However, current OBS research faces two major challenges: (1) the interpretation of OBS involves a complex workflow comprising multiple serial and parallel sub-tasks, and (2) the efficiency of OBS information organization and retrieval remains a critical bottl…

    Submitted 29 October, 2025; originally announced October 2025.

  17. arXiv:2510.24152  [pdf, ps, other]

    cs.CV cs.AI

    Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning

    Authors: Aodi Wu, Xubo Luo

    Abstract: This technical report presents our solution for the RoboSense Challenge at IROS 2025, which evaluates Vision-Language Models (VLMs) on autonomous driving scene understanding across perception, prediction, planning, and corruption detection tasks. We propose a systematic framework built on four core components. First, a Mixture-of-Prompts router classifies questions and dispatches them to task-spec…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: RoboSense Challenge with IROS 2025

  18. arXiv:2510.19687  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Are Large Language Models Sensitive to the Motives Behind Communication?

    Authors: Addison J. Wu, Ryan Liu, Kerem Oktar, Theodore R. Sumers, Thomas L. Griffiths

    Abstract: Human communication is motivated: people speak, write, and create content with a particular communicative intent in mind. As a result, information that large language models (LLMs) and AI agents process is inherently framed by humans' intentions and incentives. People are adept at navigating such nuanced information: we routinely identify benevolent or self-serving motives in order to decide what…

    Submitted 22 October, 2025; originally announced October 2025.

    Comments: NeurIPS 2025

  19. arXiv:2510.13307  [pdf, ps, other]

    cs.CV

    Novel Class Discovery for Point Cloud Segmentation via Joint Learning of Causal Representation and Reasoning

    Authors: Yang Li, Aming Wu, Zihao Zhang, Yahong Han

    Abstract: In this paper, we focus on Novel Class Discovery for Point Cloud Segmentation (3D-NCD), aiming to learn a model that can segment unlabeled (novel) 3D classes using only the supervision from labeled (base) 3D classes. The key to this task is to setup the exact correlations between the point representations and their base class labels, as well as the representation correlations between the points fr…

    Submitted 22 October, 2025; v1 submitted 15 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025

  20. arXiv:2510.03330  [pdf, ps, other]

    cs.LG

    Constant in an Ever-Changing World

    Authors: Andy Wu, Chun-Cheng Lin, Yuehua Huang, Rung-Tzuo Liaw

    Abstract: The training process of reinforcement learning often suffers from severe oscillations, leading to instability and degraded performance. In this paper, we propose a Constant in an Ever-Changing World (CIC) framework that enhances algorithmic stability to improve performance. CIC maintains both a representative policy and a current policy. Instead of updating the representative policy blindly, CIC s…

    Submitted 1 October, 2025; originally announced October 2025.

    Comments: in Chinese language

  21. arXiv:2510.03013  [pdf, ps, other]

    cs.LG

    Distributional Inverse Reinforcement Learning

    Authors: Feiyang Wu, Ye Zhao, Anqi Wu

    Abstract: We propose a distributional framework for offline Inverse Reinforcement Learning (IRL) that jointly models uncertainty over reward functions and full distributions of returns. Unlike conventional IRL approaches that recover a deterministic reward estimate or match only expected returns, our method captures richer structure in expert behavior, particularly in learning the reward distribution, by mi…

    Submitted 6 October, 2025; v1 submitted 3 October, 2025; originally announced October 2025.

  22. arXiv:2510.02182  [pdf, ps, other]

    q-bio.NC cs.CV cs.LG

    Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion

    Authors: Yule Wang, Joseph Yu, Chengrui Li, Weihan Li, Anqi Wu

    Abstract: Understanding how neural populations in higher visual areas encode object-centered visual information remains a central challenge in computational neuroscience. Prior works have investigated representational alignment between artificial neural networks and the visual cortex. Nevertheless, these findings are indirect and offer limited insights to the structure of neural populations themselves. Simi…

    Submitted 2 October, 2025; originally announced October 2025.

  23. arXiv:2510.01083  [pdf, ps, other]

    cs.LG

    Multi-Actor Multi-Critic Deep Deterministic Reinforcement Learning with a Novel Q-Ensemble Method

    Authors: Andy Wu, Chun-Cheng Lin, Rung-Tzuo Liaw, Yuehua Huang, Chihjung Kuo, Chia Tong Weng

    Abstract: Reinforcement learning has gathered much attention in recent years due to its rapid development and rich applications, especially on control systems and robotics. When tackling real-world applications with reinforcement learning method, the corresponded Markov decision process may have huge discrete or even continuous state/action space. Deep reinforcement learning has been studied for handling th…

    Submitted 1 October, 2025; originally announced October 2025.

  24. arXiv:2509.25685  [pdf, ps, other]

    cs.RO

    Hierarchical Diffusion Motion Planning with Task-Conditioned Uncertainty-Aware Priors

    Authors: Amelie Minji Kim, Anqi Wu, Ye Zhao

    Abstract: We propose a novel hierarchical diffusion planner that embeds task and motion structure directly in the noise model. Unlike standard diffusion-based planners that use zero-mean, isotropic Gaussian noise, we employ a family of task-conditioned structured Gaussians whose means and covariances are derived from Gaussian Process Motion Planning (GPMP): sparse, task-centric key states or their associate…

    Submitted 29 September, 2025; originally announced September 2025.

  25. arXiv:2509.08149  [pdf, ps, other]

    physics.med-ph cs.SE physics.app-ph

    The-Bodega: A Matlab Toolbox for Biologically Dynamic Microbubble Simulations on Realistic Hemodynamic Microvascular Graphs

    Authors: Stephen Alexander Lee, Alexis Leconte, Alice Wu, Jonathan Poree, Maxence Laplante-Berthier, Simon Desrocher, Pierre-Olivier Bouchard, Joshua Kinugasa, Samuel Mihelic, Andreas Linninger, Jean Provost

    Abstract: The-Bodega is a Matlab-based toolbox for simulating ground-truth datasets for Ultrasound Localization Microscopy (ULM)-a super resolution imaging technique that resolves microvessels by systematically tracking microbubbles flowing through the microvasculature. The-Bodega enables open-source simulation of stochastic microbubble dynamics through anatomically complex vascular graphs and features a qu…

    Submitted 9 September, 2025; originally announced September 2025.

    Comments: 36 Pages, 12 Figures

  26. arXiv:2509.07455  [pdf, ps, other]

    cs.CV

    XOCT: Enhancing OCT to OCTA Translation via Cross-Dimensional Supervised Multi-Scale Feature Learning

    Authors: Pooya Khosravi, Kun Han, Anthony T. Wu, Arghavan Rezvani, Zexin Feng, Xiaohui Xie

    Abstract: Optical Coherence Tomography Angiography (OCTA) and its derived en-face projections provide high-resolution visualization of the retinal and choroidal vasculature, which is critical for the rapid and accurate diagnosis of retinal diseases. However, acquiring high-quality OCTA images is challenging due to motion sensitivity and the high costs associated with software modifications for conventional…

    Submitted 9 September, 2025; originally announced September 2025.

    Comments: 11 pages, 3 figures, Accepted to MICCAI 2025

    ACM Class: J.3

  27. arXiv:2508.07917  [pdf, ps, other]

    cs.RO

    MolmoAct: Action Reasoning Models that can Reason in Space

    Authors: Jason Lee, Jiafei Duan, Haoquan Fang, Yuquan Deng, Shuo Liu, Boyang Li, Bohan Fang, Jieyu Zhang, Yi Ru Wang, Sangho Lee, Winson Han, Wilbert Pumacay, Angelica Wu, Rose Hendrix, Karen Farley, Eli VanderBilt, Ali Farhadi, Dieter Fox, Ranjay Krishna

    Abstract: Reasoning is central to purposeful action, yet most robotic foundation models map perception and instructions directly to control, which limits adaptability, generalization, and semantic grounding. We introduce Action Reasoning Models (ARMs), a class of robotic foundation models that integrate perception, planning, and control through a structured three-stage pipeline. Our model, MolmoAct, encodes…

    Submitted 18 September, 2025; v1 submitted 11 August, 2025; originally announced August 2025.

    Comments: Updated GR00T result to N1.5

  28. arXiv:2508.03077  [pdf, ps, other]

    cs.CV

    RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions

    Authors: Anran Wu, Long Peng, Xin Di, Xueyuan Dai, Chen Wu, Yang Wang, Xueyang Fu, Yang Cao, Zheng-Jun Zha

    Abstract: Feedforward 3D Gaussian Splatting (3DGS) overcomes the limitations of optimization-based 3DGS by enabling fast and high-quality reconstruction without the need for per-scene optimization. However, existing feedforward approaches typically assume that input multi-view images are clean and high-quality. In real-world scenarios, images are often captured under challenging conditions such as noise, lo…

    Submitted 5 August, 2025; originally announced August 2025.

  29. arXiv:2507.17001  [pdf, ps, other]

    cs.LG

    Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation

    Authors: Yan Li, Guangyi Chen, Yunlong Deng, Zijian Li, Zeyu Tang, Anpeng Wu, Kun Zhang

    Abstract: Most existing methods for adapting models to out-of-distribution (OOD) domains rely on invariant representation learning to eliminate the influence of biased features. However, should bias always be eliminated -- and if not, when should it be retained, and how can it be leveraged? To address these questions, we first present a theoretical analysis that explores the conditions under which biased fe…

    Submitted 22 July, 2025; originally announced July 2025.

  30. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3410 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 19 December, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  31. arXiv:2507.03310  [pdf, ps, other]

    cs.LG cs.AI

    ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series

    Authors: Weihong Li, Anpeng Wu, Kun Kuang, Keting Yin

    Abstract: This paper studies causal discovery in irregularly sampled time series-a pivotal challenge in high-stakes domains like finance, healthcare, and climate science, where missing data and inconsistent sampling frequencies distort causal mechanisms. Traditional methods (e.g., Granger causality, PCMCI) fail to reconcile multi-scale interactions (e.g., hourly storms vs. decadal climate shifts), while neu…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: 12 pages, 2 figures

  32. arXiv:2507.02309  [pdf, ps, other]

    cs.CR

    Rethinking Broken Object Level Authorization Attacks Under Zero Trust Principle

    Authors: Anbin Wu, Zhiyong Feng, Ruitao Feng, Zhenchang Xing, Yang Liu

    Abstract: RESTful APIs facilitate data exchange between applications, but they also expose sensitive resources to potential exploitation. Broken Object Level Authorization (BOLA) is the top vulnerability in the OWASP API Security Top 10, exemplifies a critical access control flaw where attackers manipulate API parameters to gain unauthorized access. To address this, we propose BOLAZ, a defense framework gro…

    Submitted 14 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

  33. arXiv:2506.24063  [pdf, ps, other]

    cs.CV

    Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios

    Authors: Deng Li, Aming Wu, Yang Li, Yaowei Wang, Yahong Han

    Abstract: In practice, environments constantly change over time and space, posing significant challenges for object detectors trained based on a closed-set assumption, i.e., training and test data share the same distribution. To this end, continual test-time adaptation has attracted much attention, aiming to improve detectors' generalization by fine-tuning a few specific parameters, e.g., BatchNorm layers.…

    Submitted 30 June, 2025; originally announced June 2025.

  34. arXiv:2506.21463  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Aligning Spoken Dialogue Models from User Interactions

    Authors: Anne Wu, Laurent Mazaré, Neil Zeghidour, Alexandre Défossez

    Abstract: We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models, and are not directly suited to the complexities of real-time speech interactions, with richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speak…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted at ICML 2025

  35. arXiv:2506.21101  [pdf, ps, other]

    cs.CV

    OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography

    Authors: Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, AndyPian Wu, Chaoyang Wang, Chengjie Wang, Taisong Jin, SevenShu, Yunsheng Wu, Yongge Liu, Rongrong Ji

    Abstract: As one of the earliest ancient languages, Oracle Bone Script (OBS) encapsulates the cultural records and intellectual expressions of ancient civilizations. Despite the discovery of approximately 4,500 OBS characters, only about 1,600 have been deciphered. The remaining undeciphered ones, with their complex structure and abstract imagery, pose significant challenges for interpretation. To address t…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025

  36. arXiv:2506.15314  [pdf]

    cs.HC

    Case Study for Developing a UXR Point of View for FinOps Product Innovation

    Authors: Jason Dong, Anna Wu

    Abstract: In the dynamic landscape of Cloud financial management, we are sharing a case study exploring the development of a User Experience Research (UXR) Point of View (PoV) to drive FinOps product innovation. We demonstrate how qualitative and quantitative research methods working together to navigate the challenges of understanding customer needs, aligning cross-functional teams, and prioritizing limite…

    Submitted 18 June, 2025; originally announced June 2025.

  37. arXiv:2506.15190  [pdf, ps, other]

    cs.LG q-bio.NC

    Learning Task-Agnostic Motifs to Capture the Continuous Nature of Animal Behavior

    Authors: Jiyi Wang, Jingyang Ke, Bo Dai, Anqi Wu

    Abstract: Animals flexibly recombine a finite set of core motor motifs to meet diverse task demands, but existing behavior segmentation methods oversimplify this process by imposing discrete syllables under restrictive generative assumptions. To better capture the continuous structure of behavior generation, we introduce motif-based continuous dynamics (MCD) discovery, a framework that (1) uncovers interpre…

    Submitted 2 October, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

    Comments: 9 pages and 4 figures for the main text

  38. arXiv:2506.14146  [pdf, ps, other]

    cs.AI

    Collaborative Editable Model

    Authors: Kaiwen Tang, Aitong Wu, Yao Lu, Guangda Sun

    Abstract: Vertical-domain large language models (LLMs) play a crucial role in specialized scenarios such as finance, healthcare, and law; however, their training often relies on large-scale annotated data and substantial computational resources, impeding rapid development and continuous iteration. To address these challenges, we introduce the Collaborative Editable Model (CoEM), which constructs a candidate…

    Submitted 16 June, 2025; originally announced June 2025.

  39. arXiv:2505.22861  [pdf, ps, other]

    cs.LG

    Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel

    Authors: Carlota Parés-Morlans, Michelle Yi, Claire Chen, Sarah A. Wu, Rika Antonova, Tobias Gerstenberg, Jeannette Bohg

    Abstract: Tasks that involve complex interactions between objects with unknown dynamics make planning before execution difficult. These tasks require agents to iteratively improve their actions after actively exploring causes and effects in the environment. For these type of tasks, we propose Causal-PIK, a method that leverages Bayesian optimization to reason about causal interactions via a Physics-Informed…

    Submitted 30 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML 2025

  40. arXiv:2505.19699  [pdf, ps, other]

    cs.LG cs.AI cs.DC

    Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

    Authors: Junming Liu, Yanting Gao, Siyuan Meng, Yifei Sun, Aoqi Wu, Yufei Jin, Yirong Chen, Ding Wang, Guosun Zeng

    Abstract: Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy. However, the coexistence of model and data heterogeneity gives rise to inconsistent representations and divergent optimization dynamics across clients, ultimately hindering robust global performance. To transcend these challenges, we propose Mosai…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 43 pages, 23 figures, 15 tables; the last dance

  41. arXiv:2505.10855  [pdf]

    eess.IV cs.CV

    Generalizable cardiac substructures segmentation from contrast and non-contrast CTs using pretrained transformers

    Authors: Aneesh Rangnekar, Nikhil Mankuzhy, Jonas Willmann, Chloe Choi, Abraham Wu, Maria Thor, Andreas Rimner, Harini Veeraraghavan

    Abstract: Automated AI segmentations for radiation treatment planning deteriorate when applied to cases with different characteristics than the training dataset. We developed a hybrid transformer convolutional network to segment cardiac substructures in lung and breast cancer patients with varying imaging contrasts and scan positions. Cohort I (56 contrast-enhanced CT [CECT], 124 non-contrast CT [NCCT] scan…

    Submitted 26 November, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  42. arXiv:2505.09113

    cs.LG stat.ME

    Sequential Treatment Effect Estimation with Unmeasured Confounders

    Authors: Yingrong Wang, Anpeng Wu, Baohong Li, Ziyang Xiao, Ruoxuan Xiong, Qing Han, Kun Kuang

    Abstract: This paper studies the cumulative causal effects of sequential treatments in the presence of unmeasured confounders, a critical issue in sequential decision-making scenarios where treatment decisions and outcomes evolve dynamically over time. Advanced causal methods use a transformer as a backbone to model such time sequences, which shows superiority in capturing long-term dependence and per…

    Submitted 13 May, 2025; originally announced May 2025.

  43. arXiv:2504.08937

    cs.GR cs.CV cs.LG eess.IV stat.ML

    Rethinking Few-Shot Image Fusion: Granular Ball Priors Enable General-Purpose Deep Fusion

    Authors: Minjie Deng, Yan Wei, An Wu, Yuncan Ouyang, Hao Zhai, Qianyao Peng

    Abstract: In image fusion tasks, the absence of real fused images as priors forces most deep learning approaches to rely on large-scale paired datasets to extract global weighting features or to generate pseudo-supervised images through algorithmic constructions. Unlike previous methods, this work re-examines prior-guided learning under few-shot conditions by introducing rough set theory. We regard the trad…

    Submitted 9 December, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

  44. arXiv:2504.03964

    cs.CL cs.AI cs.LG

    Clinical ModernBERT: An efficient and long context encoder for biomedical text

    Authors: Simon A. Lee, Anthony Wu, Jeffrey N. Chiang

    Abstract: We introduce Clinical ModernBERT, a transformer-based encoder pretrained on large-scale biomedical literature, clinical notes, and medical ontologies, incorporating PubMed abstracts, MIMIC IV clinical data, and medical codes with their textual descriptions. Building on ModernBERT, the current state-of-the-art natural language text encoder featuring architectural upgrades such as rotary positional e…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Manuscript writeup corresponding to the Clinical ModernBERT pre-trained encoder (https://huggingface.co/Simonlee711/Clinical_ModernBERT)

  45. arXiv:2504.02196

    physics.ins-det astro-ph.IM cs.LG

    Orbit Determination through Cosmic Microwave Background Radiation

    Authors: Pedro K de Albuquerque, Andre R Kuroswiski, Annie S. Wu, Willer G. dos Santos, Paulo Costa

    Abstract: This research explores the use of Cosmic Microwave Background (CMB) radiation as a reference signal for Initial Orbit Determination (IOD). By leveraging the unique properties of CMB, this study introduces a novel method for estimating spacecraft velocity and position with minimal reliance on pre-existing environmental data, offering significant advantages for space missions independent of Earth-sp…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: This paper was presented at the 2024 AAS/AIAA Astrodynamics Specialist Conference, August 11-15, 2024, Broomfield, Colorado, USA

  46. arXiv:2503.22745   

    cs.LG stat.ML

    Graph-Based Uncertainty-Aware Self-Training with Stochastic Node Labeling

    Authors: Tom Liu, Anna Wu, Chao Li

    Abstract: Self-training has become a popular semi-supervised learning technique for leveraging unlabeled data. However, the over-confidence of pseudo-labels remains a key challenge. In this paper, we propose a novel graph-based uncertainty-aware self-training (GUST) framework to combat over-confidence in node classification. Drawing inspiration from the uncertainty integration idea introduced by Wang…

    Submitted 29 July, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

    Comments: arXiv admin note: This paper has been withdrawn by arXiv due to disputed and unverifiable authorship and affiliation

  47. Progressive Human Motion Generation Based on Text and Few Motion Frames

    Authors: Ling-An Zeng, Gaojie Wu, Ancong Wu, Jian-Fang Hu, Wei-Shi Zheng

    Abstract: Although existing text-to-motion (T2M) methods can produce realistic human motion from text descriptions, it is still difficult to align the generated motion with the desired postures, since text alone is insufficient to describe diverse postures precisely. To achieve more controllable generation, an intuitive approach is to allow the user to input a few motion frames describing precise desired…

    Submitted 30 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025

  48. arXiv:2503.12538

    cs.RO cs.LG

    EmoBipedNav: Emotion-aware Social Navigation for Bipedal Robots with Deep Reinforcement Learning

    Authors: Wei Zhu, Abirath Raju, Abdulaziz Shamsah, Anqi Wu, Seth Hutchinson, Ye Zhao

    Abstract: This study presents EmoBipedNav, an emotion-aware navigation framework using deep reinforcement learning (DRL) for bipedal robots walking in socially interactive environments. The inherent locomotion constraints of bipedal robots challenge their ability to maneuver safely in dynamic environments. When combined with the intricacies of social environments, including pedestrian interactions a…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: 13 pages

  49. arXiv:2503.09968

    cs.CV

    Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection

    Authors: Zihao Zhang, Aming Wu, Yahong Han

    Abstract: Recently, the task of Single-Domain Generalized Object Detection (Single-DGOD) was proposed, aiming to generalize a detector to multiple unknown domains never seen during training. Because target-domain data are unavailable, some methods leverage the multimodal capabilities of vision-language models, using textual prompts to estimate cross-domain information and enhance the model's general…

    Submitted 12 March, 2025; originally announced March 2025.

  50. arXiv:2503.06617

    cs.CV

    Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling

    Authors: Long Peng, Anran Wu, Wenbo Li, Peizhe Xia, Xueyuan Dai, Xinjie Zhang, Xin Di, Haoze Sun, Renjing Pei, Yang Wang, Yang Cao, Zheng-Jun Zha

    Abstract: Arbitrary-scale super-resolution (ASSR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs with arbitrary upsampling factors using a single model, addressing the limitations of traditional SR methods constrained to fixed-scale factors (e.g., ×2). Recent advances leveraging implicit neural representation (INR) have achieved great progress by modeling co…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: Tech Report