Showing 1–50 of 480 results for author: Chu, W

  1. arXiv:2604.10095  [pdf, ps, other]

    cs.CV

    Mining Attribute Subspaces for Efficient Fine-tuning of 3D Foundation Models

    Authors: Yu Jiang, Hanwen Jiang, Ahmed Abdelkader, Wen-Sheng Chu, Brandon Y. Feng, Zhangyang Wang, Qixing Huang

    Abstract: With the emergence of 3D foundation models, there is growing interest in fine-tuning them for downstream tasks, where LoRA is the dominant fine-tuning paradigm. As 3D datasets exhibit distinct variations in texture, geometry, camera motion, and lighting, there are interesting fundamental questions: 1) Are there LoRA subspaces associated with each type of variation? 2) Are these subspaces disentang…

    Submitted 11 April, 2026; originally announced April 2026.

    Comments: 10 pages, 8 figures

  2. arXiv:2603.28117  [pdf, ps, other]

    cs.LG

    Neural Federated Learning for Livestock Growth Prediction

    Authors: Shoujin Wang, Mingze Ni, Wei Liu, Victor W. Chu, Bryan Zheng, Ayush Kanwal, Roy Jing Yang, Kenneth Sabir, Fang Chen

    Abstract: Livestock growth prediction is essential for optimising farm management and improving the efficiency and sustainability of livestock production, yet it remains underexplored due to limited large-scale datasets and privacy concerns surrounding farm-level data. Existing biophysical models rely on fixed formulations, while most machine learning approaches are trained on small, isolated datasets, limi…

    Submitted 1 April, 2026; v1 submitted 30 March, 2026; originally announced March 2026.

    Comments: Accepted by WCCI 2026 (IJCNN 2026)

  3. arXiv:2603.25913  [pdf, ps, other]

    math.CO

    Summation Formulae for Binomial Moments

    Authors: Marta Na Chen, Wenchang Chu

    Abstract: By combining the telescoping method with an algebraic relation, four classes of binomial moments are examined. Several explicit summation formulae are established.

    Submitted 26 March, 2026; originally announced March 2026.

    MSC Class: 05A10; 11B65

  4. arXiv:2603.21111  [pdf, ps, other]

    cs.CV cs.LG

    Frequency Switching Mechanism for Parameter-Efficient Multi-Task Learning

    Authors: Shih-Wen Liu, Yen-Chang Chen, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Multi-task learning (MTL) aims to enable a single model to solve multiple tasks efficiently; however, current parameter-efficient fine-tuning (PEFT) methods remain largely limited to single-task adaptation. We introduce Free Sinewich, a parameter-efficient multi-task learning framework that enables near-zero-cost weight modulation via frequency switching (Free). Specifically, a \…

    Submitted 22 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026

  5. Multiscale Switch for Semi-Supervised and Contrastive Learning in Medical Ultrasound Image Segmentation

    Authors: Jingguo Qu, Xinyang Han, Yao Pu, Man-Lik Chui, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying

    Abstract: Medical ultrasound image segmentation faces significant challenges due to limited labeled data and characteristic imaging artifacts including speckle noise and low-contrast boundaries. While semi-supervised learning (SSL) approaches have emerged to address data scarcity, existing methods suffer from suboptimal unlabeled data utilization and lack robust feature representation mechanisms. In this pa…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: This is the author-submitted LaTeX version with original typesetting. The final published version (with IEEE production formatting and layout changes) is available at http://doi.org/10.1109/TNNLS.2026.3669814 under CC BY 4.0 license

  6. arXiv:2603.17846  [pdf]

    cond-mat.supr-con

    Pressure-induced Superconductivity in AgSbTe2

    Authors: Sudaice Kazibwe, Bishnu Karki, Wencheng Lu, Zhongxin Liang, Minghong Sui, Melissa Gooch, Zhifeng Ren, Pavan Hosur, Timothy A. Strobel, Ching-Wu Chu, Liangzi Deng

    Abstract: AgSbTe2 is a well-known thermoelectric material with a high Seebeck coefficient and intrinsically low thermal conductivity, but its behavior under pressure remains largely unexplored. Here we report a systematic investigation of the structural, electronic, and transport properties of non-stoichiometric AgSbTe2 under high pressure. At ambient pressure, the material can be described as having a cubi…

    Submitted 18 March, 2026; originally announced March 2026.

    Comments: 28 pages, 5 figures, 8 Supplementary information figures

  7. arXiv:2603.16943  [pdf]

    cs.CV cs.AI

    KGS-GCN: Enhancing Sparse Skeleton Sensing via Kinematics-Driven Gaussian Splatting and Probabilistic Topology for Action Recognition

    Authors: Yuhan Chen, Yicui Shi, Guofa Li, Liping Zhang, Jie Li, Jiaxin Gao, Wenbo Chu

    Abstract: Skeleton-based action recognition is widely utilized in sensor systems including human-computer interaction and intelligent surveillance. Nevertheless, current sensor devices typically generate sparse skeleton data as discrete coordinates, which inevitably discards fine-grained spatiotemporal details during highly dynamic movements. Moreover, the rigid constraints of predefined physical sensor top…

    Submitted 16 March, 2026; originally announced March 2026.

  8. arXiv:2603.16429  [pdf, ps, other]

    astro-ph.IM cs.AI cs.CV

    LenghuSky-8: An 8-Year All-Sky Cloud Dataset with Star-Aware Masks and Alt-Az Calibration for Segmentation and Nowcasting

    Authors: Yicheng Rui, Xiao-Wei Duan, Licai Deng, Fan Yang, Zhengming Dang, Zhengjun Du, Junhao Peng, Wenhao Chu, Umut Mahmut, Kexin Li, Yiyun Wu, Fabo Feng

    Abstract: Ground-based time-domain observatories require minute-by-minute, site-scale awareness of cloud cover, yet existing all-sky datasets are short, daylight-biased, or lack astrometric calibration. We present LenghuSky-8, an eight-year (2018-2025) all-sky imaging dataset from a premier astronomical site, comprising 429,620 $512 \times 512$ frames with 81.2% night-time coverage, star-aware cloud masks,…

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: CVPR Findings accepted. 20 pages, 8 figures

    ACM Class: I.4.6; I.2.10

  9. arXiv:2603.16157  [pdf, ps, other]

    cs.LG cs.AI

    DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay

    Authors: Long Li, Zhijian Zhou, Tianyi Wang, Weidi Xu, Zuming Huang, Wei Chu, Zhe Wang, Shirui Pan, Chao Qu, Yuan Qi

    Abstract: While Reinforcement Learning (RL) enhances Large Language Model reasoning, on-policy algorithms like GRPO are sample-inefficient as they discard past rollouts. Existing experience replay methods address this by reusing accurate samples for direct policy updates, but this often incurs high computational costs and causes mode collapse via overfitting. We argue that historical data should prioritize…

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: 14 pages, 3 figures

  10. arXiv:2603.12437  [pdf]

    cond-mat.supr-con

    Ambient-pressure 151-K superconductivity in HgBa2Ca2Cu3O8+δ via pressure quench

    Authors: Liangzi Deng, Thacien Habamahoro, Artin Safezoddeh, Bishnu Karki, Sudaice Kazibwe, Daniel J. Schulze, Zheng Wu, Matthew Julian, Rohit P. Prasankumar, Hua Zhou, Jesse S. Smith, Pavan R. Hosur, Ching-Wu Chu

    Abstract: Superconductivity has been a vigorously researched topic since its discovery in 1911. Raising the superconducting transition temperature (Tc) has been the main driving force behind such long-sustained efforts due to its potential for impacting humanity and the fundamental knowledge gained from understanding this macroscopic coherent quantum state at high temperatures. The successful development of…

    Submitted 12 March, 2026; originally announced March 2026.

    Comments: 27 pages, 4 manuscript figures, 8 supporting information figures

    Journal ref: Proceedings of the National Academy of Sciences USA 123, e2536178123 (2026)

  11. arXiv:2603.10833  [pdf, ps, other]

    cs.CV

    Evaluating Few-Shot Pill Recognition Under Visual Domain Shift

    Authors: W. I. Chu, G. Tarroni, L. Li

    Abstract: Adverse drug events are a significant source of preventable harm, which has led to the development of automated pill recognition systems to enhance medication safety. Real-world deployment of these systems is hindered by visually complex conditions, including cluttered scenes, overlapping pills, reflections, and diverse acquisition environments. This study investigates few-shot pill recognition fr…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: 8 pages, 4 figures. Submitted to IEEE Engineering in Medicine and Biology Conference (EMBC) 2026

  12. arXiv:2603.10825  [pdf, ps, other]

    cs.CV

    A dataset of medication images with instance segmentation masks for preventing adverse drug events

    Authors: W. I. Chu, S. Hirani, G. Tarroni, L. Li

    Abstract: Medication errors and adverse drug events (ADEs) pose significant risks to patient safety, often arising from difficulties in reliably identifying pharmaceuticals in real-world settings. AI-based pill recognition models offer a promising solution, but the lack of comprehensive datasets hinders their development. Existing pill image datasets rarely capture real-world complexities such as overlappin…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: 25 pages, 19 figures. Submitted to Scientific Data (Nature Portfolio)

  13. arXiv:2602.18906  [pdf, ps, other]

    cs.CV

    Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates

    Authors: Shengjie Zhu, Ahmed Abdelkader, Mark J. Matthews, Xiaoming Liu, Wen-Sheng Chu

    Abstract: Structure-from-Motion (SfM) is a fundamental 3D vision task for recovering camera parameters and scene geometry from multi-view images. While recent deep learning advances enable accurate Monocular Depth Estimation (MDE) from single images without depending on camera motion, integrating MDE into SfM remains a challenge. Unlike conventional triangulated sparse point clouds, MDE produces dense depth…

    Submitted 21 February, 2026; originally announced February 2026.

  14. arXiv:2602.03654  [pdf, ps, other]

    nlin.AO math.NA physics.soc-ph

    Noisy nonlocal aggregation model with gradient flow structures

    Authors: Su Yang, Weiqi Chu, Panayotis G. Kevrekidis

    Abstract: Interacting particle systems provide a fundamental framework for modeling collective behavior in biological, social, and physical systems. In many applications, stochastic perturbations are essential for capturing environmental variability and individual uncertainty, yet their impact on long-term dynamics and equilibrium structure remains incompletely understood, particularly in the presence of no…

    Submitted 3 February, 2026; originally announced February 2026.

    Comments: 15 pages; 4 figures

  15. arXiv:2601.19240  [pdf, ps, other]

    cond-mat.mtrl-sci physics.comp-ph

    Symmetry Adapted Analysis of Screw Dislocation: Electronic Structure and Carrier Recombination Mechanisms in GaN

    Authors: Yuncheng Xie, Haozhe Shi, Menglin Huang, Weibin Chu, Shiyou Chen, Xin-Gao Gong

    Abstract: As fundamental one-dimensional defects, screw dislocations profoundly reshape the energy landscape and carrier dynamics of crystalline materials. By restoring the exact algebra of the screw dislocation group, we unveil the latent symmetry constraints that govern the electronic structure, providing a more rigorous physical picture than the conventional treatments. When applied to GaN, the method yi…

    Submitted 27 January, 2026; originally announced January 2026.

    Comments: 9 pages, 8 figures

  16. arXiv:2601.15772  [pdf]

    cs.CV

    LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting

    Authors: Yuhan Chen, Wenxuan Yu, Guofa Li, Yijun Xu, Ying Fang, Yicui Shi, Long Cao, Wenbo Chu, Keqiang Li

    Abstract: 2D Gaussian Splatting (2DGS) is an emerging explicit scene representation method with significant potential for image compression due to high fidelity and high compression ratios. However, existing low-light enhancement algorithms operate predominantly within the pixel domain. Processing 2DGS-compressed images necessitates a cumbersome decompression-enhancement-recompression pipeline, which compro…

    Submitted 22 January, 2026; originally announced January 2026.

  17. arXiv:2601.15766  [pdf]

    cs.CV

    LL-GaussianMap: Zero-shot Low-Light Image Enhancement via 2D Gaussian Splatting Guided Gain Maps

    Authors: Yuhan Chen, Ying Fang, Guofa Li, Wenxuan Yu, Yicui Shi, Jingrui Zhang, Kefei Qian, Wenbo Chu, Keqiang Li

    Abstract: Significant progress has been made in low-light image enhancement with respect to visual quality. However, most existing methods primarily operate in the pixel domain or rely on implicit feature representations. As a result, the intrinsic geometric structural priors of images are often neglected. 2D Gaussian Splatting (2DGS) has emerged as a prominent explicit scene representation technique charac…

    Submitted 27 January, 2026; v1 submitted 22 January, 2026; originally announced January 2026.

  18. arXiv:2601.01726  [pdf, ps, other]

    cs.RO eess.SY

    Simulations and Advancements in MRI-Guided Power-Driven Ferric Tools for Wireless Therapeutic Interventions

    Authors: Wenhui Chu, Aobo Jin, Hardik A. Gohel

    Abstract: Designing a robotic system that functions effectively within the specific environment of a Magnetic Resonance Imaging (MRI) scanner requires solving numerous technical issues, such as maintaining the robot's precision and stability under strong magnetic fields. This research focuses on enhancing MRI's role in medical imaging, especially in its application to guide intravascular interventions using…

    Submitted 4 January, 2026; originally announced January 2026.

    Comments: 10 pages, 7 figures

    ACM Class: J.3; I.2.9

  19. arXiv:2601.01512  [pdf, ps, other]

    cs.CV cs.LG

    A Novel Deep Learning Method for Segmenting the Left Ventricle in Cardiac Cine MRI

    Authors: Wenhui Chu, Aobo Jin, Hardik A. Gohel

    Abstract: This research aims to develop a novel deep learning network, GBU-Net, utilizing a group-batch-normalized U-Net framework, specifically designed for the precise semantic segmentation of the left ventricle in short-axis cine MRI scans. The methodology includes a down-sampling pathway for feature extraction and an up-sampling pathway for detail restoration, enhanced for medical imaging. Key modificat…

    Submitted 4 January, 2026; originally announced January 2026.

    Comments: 9 pages, 5 figures

    ACM Class: I.4.6; I.2.10

  20. arXiv:2601.01087  [pdf, ps, other]

    cond-mat.mes-hall cond-mat.mtrl-sci cond-mat.str-el

    Family of High-Chern-Number Orbital Magnets in Twisted Rhombohedral Graphene

    Authors: Xirui Wang, L. Antonio Benítez, Vo Tien Phong, Wai In Chu, Kenji Watanabe, Takashi Taniguchi, Cyprian Lewandowski, Pablo Jarillo-Herrero

    Abstract: Realizing Chern insulators with Chern numbers greater than one remains a major goal in quantum materials research. Such platforms promise multichannel dissipationless chiral transport and access to correlated phases beyond the conventional C = 1 paradigm. Here, we discover a family of high-Chern-number orbital magnets in twisted monolayer-multilayer rhombohedral graphene, denoted (1+n) with n = 3,…

    Submitted 3 January, 2026; originally announced January 2026.

  21. arXiv:2601.00981  [pdf, ps, other]

    cs.RO cs.CV eess.SY

    Simulations of MRI Guided and Powered Ferric Applicators for Tetherless Delivery of Therapeutic Interventions

    Authors: Wenhui Chu, Khang Tran, Nikolaos V. Tsekos

    Abstract: Magnetic Resonance Imaging (MRI) is a well-established modality for pre-operative planning and is also explored for intra-operative guidance of procedures such as intravascular interventions. Among the experimental robot-assisted technologies, the magnetic field gradients of the MRI scanner are used to power and maneuver ferromagnetic applicators for accessing sites in the patient's body via the v…

    Submitted 2 January, 2026; originally announced January 2026.

    Comments: 9 pages, 8 figures, published in ICBBB 2022

    Journal ref: 2022 12th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB '22), January 7-10, 2022, Tokyo, Japan

  22. Two Deep Learning Approaches for Automated Segmentation of Left Ventricle in Cine Cardiac MRI

    Authors: Wenhui Chu, Nikolaos V. Tsekos

    Abstract: Left ventricle (LV) segmentation is critical for clinical quantification and diagnosis of cardiac images. In this work, we propose two novel deep learning architectures called LNU-Net and IBU-Net for left ventricle segmentation from short-axis cine MRI images. LNU-Net is derived from layer normalization (LN) U-Net architecture, while IBU-Net is derived from the instance-batch normalized (IB) U-Net…

    Submitted 2 January, 2026; originally announced January 2026.

    Comments: 7 pages, 5 figures, published in ICBBB 2022

    Journal ref: 2022 12th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB '22), January 7-10, 2022, Tokyo, Japan

  23. arXiv:2512.16921  [pdf, ps, other]

    cs.CV cs.AI

    Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification

    Authors: Qihao Liu, Chengzhi Mao, Yaojie Liu, Alan Yuille, Wen-Sheng Chu

    Abstract: Conventional evaluation methods for multimodal LLMs (MLLMs) lack interpretability and are often insufficient to fully disclose significant capability gaps across models. To address this, we introduce AuditDM, an automated framework that actively discovers and rectifies MLLM failure modes by auditing their divergence. AuditDM fine-tunes an MLLM as an auditor via reinforcement learning to generate c…

    Submitted 18 December, 2025; originally announced December 2025.

    Comments: project page: https://auditdm.github.io/

  24. arXiv:2512.15550  [pdf, ps, other]

    cs.CL

    CTkvr: KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing

    Authors: Kuan Lu, Shuhang Lin, Sai Wu, Yichen Yao, Junhan Yang, Huan Li, Wei Chu, Xu Yinghui, Yuan Qi, Gang Chen

    Abstract: Large language models (LLMs) are increasingly applied in long-context scenarios such as multi-turn conversations. However, long contexts pose significant challenges for inference efficiency, including high memory overhead from Key-Value (KV) cache and increased latency due to excessive memory accesses. Recent methods for dynamic KV selection struggle with trade-offs: block-level indexing degrades…

    Submitted 17 December, 2025; originally announced December 2025.

  25. arXiv:2512.10941  [pdf, ps, other]

    cs.CV cs.AI

    Mull-Tokens: Modality-Agnostic Latent Thinking

    Authors: Arijit Ray, Ahmed Abdelkader, Chengzhi Mao, Bryan A. Plummer, Kate Saenko, Ranjay Krishna, Leonidas Guibas, Wen-Sheng Chu

    Abstract: Reasoning goes beyond language; the real world requires reasoning about space, time, affordances, and much more that words alone cannot convey. Existing multimodal models exploring the potential of reasoning with images are brittle and do not scale. They rely on calling specialist tools, costly generation of images, or handcrafted reasoning data to switch between text and image thoughts. Instead,…

    Submitted 11 December, 2025; originally announced December 2025.

    Comments: Project webpage: https://arijitray.com/multimodal_thinking/

  26. arXiv:2512.04354  [pdf]

    cs.LG cs.HC

    SmartAlert: Implementing Machine Learning-Driven Clinical Decision Support for Inpatient Lab Utilization Reduction

    Authors: April S. Liang, Fatemeh Amrollahi, Yixing Jiang, Conor K. Corbin, Grace Y. E. Kim, David Mui, Trevor Crowell, Aakash Acharya, Sreedevi Mony, Soumya Punnathanam, Jack McKeown, Margaret Smith, Steven Lin, Arnold Milstein, Kevin Schulman, Jason Hom, Michael A. Pfeffer, Tho D. Pham, David Svec, Weihan Chu, Lisa Shieh, Christopher Sharp, Stephen P. Ma, Jonathan H. Chen

    Abstract: Repetitive laboratory testing unlikely to yield clinically useful information is a common practice that burdens patients and increases healthcare costs. Education and feedback interventions have limited success, while general test ordering restrictions and electronic alerts impede appropriate clinical care. We introduce and evaluate SmartAlert, a machine learning (ML)-driven clinical decision supp…

    Submitted 3 December, 2025; originally announced December 2025.

    Comments: 22 pages, 5 figures

  27. arXiv:2512.02965  [pdf]

    cs.CV

    A Lightweight Real-Time Low-Light Enhancement Network for Embedded Automotive Vision Systems

    Authors: Yuhan Chen, Yicui Shi, Guofa Li, Guangrui Bai, Jinyuan Shao, Xiangfei Huang, Wenbo Chu, Keqiang Li

    Abstract: In low-light environments like nighttime driving, image degradation severely challenges in-vehicle camera safety. Since existing enhancement algorithms are often too computationally intensive for vehicular applications, we propose UltraFast-LieNET, a lightweight multi-scale shifted convolutional network for real-time low-light image enhancement. We introduce a Dynamic Shifted Convolution (DSConv)…

    Submitted 2 December, 2025; originally announced December 2025.

  28. arXiv:2512.00008  [pdf, ps, other]

    cs.CV cs.AI cs.HC

    MOTION: ML-Assisted On-Device Low-Latency Motion Recognition

    Authors: Veeramani Pugazhenthi, Wei-Hsiang Chu, Junwei Lu, Jadyn N. Miyahira, Mahdi Eslamimehr, Pratik Satam, Rozhin Yasaei, Soheil Salehi

    Abstract: The use of tiny devices capable of low-latency gesture recognition is gaining momentum in everyday human-computer interaction and especially in medical monitoring fields. Embedded solutions such as fall detection, rehabilitation tracking, and patient supervision require fast and efficient tracking of movements while avoiding unwanted false alarms. This study presents an efficient solution on how t…

    Submitted 9 February, 2026; v1 submitted 13 October, 2025; originally announced December 2025.

  29. arXiv:2511.20809  [pdf, ps, other]

    cs.CV

    Layer-Aware Video Composition via Split-then-Merge

    Authors: Ozgur Kara, Yujia Chen, Ming-Hsuan Yang, James M. Rehg, Wen-Sheng Chu, Du Tran

    Abstract: We present Split-then-Merge (StM), a novel framework designed to enhance control in generative video composition and address its data scarcity problem. Unlike conventional methods relying on annotated datasets or handcrafted rules, StM splits a large corpus of unlabeled videos into dynamic foreground and background layers, then self-composes them to learn how dynamic subjects interact with diverse…

    Submitted 25 November, 2025; originally announced November 2025.

    Comments: Project Webpage: https://split-then-merge.github.io

  30. arXiv:2510.15349   

    cs.CL

    Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing

    Authors: Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, Yuan Qi

    Abstract: Document parsing from scanned images into structured formats remains a significant challenge due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables. Existing supervised fine-tuning methods often struggle to generalize across diverse document types, leading to poor performance, particularly on out-of-distribution data. This issue is further exacerbated by t…

    Submitted 20 October, 2025; v1 submitted 17 October, 2025; originally announced October 2025.

    Comments: This submission (arXiv:2510.15349) was mistakenly uploaded as a new article. It was intended to replace our previous work arXiv:2506.03197. All subsequent updates will be made to arXiv:2506.03197

    ACM Class: F.2.2; I.2.7

  31. arXiv:2509.20681  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    Efficient Construction of Implicit Surface Models From a Single Image for Motion Generation

    Authors: Wei-Teng Chu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi

    Abstract: Implicit representations have been widely applied in robotics for obstacle avoidance and path planning. In this paper, we explore the problem of constructing an implicit distance representation from a single image. Past methods for implicit surface reconstruction, such as NeuS and its variants generally require a large set of multi-view images as input, and require long training times. In this wor…

    Submitted 11 March, 2026; v1 submitted 24 September, 2025; originally announced September 2025.

    Comments: 9 pages, 6 figures, 2026 IEEE International Conference on Robotics and Automation (ICRA)

  32. arXiv:2509.20016  [pdf]

    physics.optics

    Nonreciprocal optical circuit switching

    Authors: Zhifeng Tu, Yucong Yang, Yiran Wei, Shuyuan Liu, Fangchen Hu, Peng Zou, Chengkun Yang, Tianchi Zhang, Di Wu, Ruoyu Shen, Bingzhou Hong, Haiwen Cai, Lei Bi, Wei Chu

    Abstract: Directly switching optical signals outperforms conventional optoelectronic hardware in terms of cost, latency, and energy efficiency, and is expected to address the growing demand for data node capacity driven by the development of machine learning and artificial intelligence (AI) technologies. Therefore, optical circuit switching (OCS) technology has piqued widespread research interest in various…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: 24 pages, 5 figures

  33. arXiv:2509.07430  [pdf, ps, other]

    cs.LG cs.AI

    The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward

    Authors: Long Li, Zhijian Zhou, Jiaran Hao, Jason Klein Liu, Yanting Miao, Wei Pang, Xiaoyu Tan, Wei Chu, Zhe Wang, Shirui Pan, Chao Qu, Yuan Qi

    Abstract: A central paradox in fine-tuning Large Language Models (LLMs) with Reinforcement Learning with Verifiable Reward (RLVR) is the frequent degradation of multi-attempt performance (Pass@k) despite improvements in single-attempt accuracy (Pass@1). This is often accompanied by catastrophic forgetting, where models lose previously acquired skills. While various methods have been proposed, the choice and…

    Submitted 3 March, 2026; v1 submitted 9 September, 2025; originally announced September 2025.

    Comments: 27 pages, 6 figures

  34. arXiv:2509.05878  [pdf, ps, other]

    cs.CL

    MedFactEval and MedAgentBrief: A Framework and Workflow for Generating and Evaluating Factual Clinical Summaries

    Authors: François Grolleau, Emily Alsentzer, Timothy Keyes, Philip Chung, Akshay Swaminathan, Asad Aali, Jason Hom, Tridu Huynh, Thomas Lew, April S. Liang, Weihan Chu, Natasha Z. Steele, Christina F. Lin, Jingkun Yang, Kameron C. Black, Stephen P. Ma, Fateme N. Haredasht, Nigam H. Shah, Kevin Schulman, Jonathan H. Chen

    Abstract: Evaluating factual accuracy in Large Language Model (LLM)-generated clinical text is a critical barrier to adoption, as expert review is unscalable for the continuous quality assurance these systems require. We address this challenge with two complementary contributions. First, we introduce MedFactEval, a framework for scalable, fact-grounded evaluation where clinicians define high-salience key fa…

    Submitted 6 September, 2025; originally announced September 2025.

  35. arXiv:2509.04702  [pdf, ps, other]

    cs.CL

    OleSpeech-IV: A Large-Scale Multispeaker and Multilingual Conversational Speech Dataset with Diverse Topics

    Authors: Wei Chu, Yuanzhe Dong, Ke Tan, Dong Han, Xavier Menendez-Pidal, Ruchao Fan, Chenfeng Miao, Chanwoo Kim, Bhiksha Raj, Rita Singh

    Abstract: OleSpeech-IV dataset is a large-scale multispeaker and multilingual conversational speech dataset with diverse topics. The audio content comes from publicly-available English podcasts, talk shows, teleconferences, and other conversations. Speaker names, turns, and transcripts are human-sourced and refined by a proprietary pipeline, while additional information such as timestamps and confidence sco…

    Submitted 4 September, 2025; originally announced September 2025.

  36. arXiv:2509.01555  [pdf]

    physics.optics physics.app-ph

    400-Gbps/$λ$ Ultrafast Silicon Microring Modulator for Scalable Optical Compute Interconnects

    Authors: Fangchen Hu, Fengxin Yu, Xingyu Liu, Aoxue Wang, Xiao Hu, Haiwen Cai, Wei Chu

    Abstract: The exponential growth of artificial intelligence (AI) workloads is driving an urgent demand for optical interconnects with ultrahigh bandwidth, energy efficiency, and scalability. Silicon photonics, with its CMOS compatibility and wafer-scale manufacturability, has emerged as a promising platform for optical interconnect architectures. Silicon microring modulators (MRMs), with their compact footp…

    Submitted 1 September, 2025; originally announced September 2025.

  37. arXiv:2508.18519  [pdf, ps, other]

    math.DS

    What Do Bouncing Balls Tell Us About the Universe? A Journey into Billiard Systems

    Authors: Weiqi Chu, Matthew Dobson

    Abstract: Have you ever played or watched a game of pool? If so, you have already seen a billiard system in action. In mathematics and physics, a billiard system describes a ball that moves in straight lines and bounces off walls. Despite these simple rules, billiard systems can produce remarkably rich behaviors: some table shapes generate regular, periodic patterns, while others give rise to complete chaos…

    Submitted 25 August, 2025; originally announced August 2025.

    Comments: This article is under review in Frontiers for Young Minds, a journal that shares the latest research with kids ages 8--15. We would greatly appreciate feedback from you and from young readers

  38. arXiv:2508.02130  [pdf, ps, other]

    cs.LG cs.AI

    The Complexity of Extreme Climate Events on the New Zealand's Kiwifruit Industry

    Authors: Boyuan Zheng, Victor W. Chu, Zhidong Li, Evan Webster, Ashley Rootsey

    Abstract: Climate change has intensified the frequency and severity of extreme weather events, presenting unprecedented challenges to the agricultural industry worldwide. In this investigation, we focus on kiwifruit farming in New Zealand. We propose to examine the impacts of climate-induced extreme events, specifically frost, drought, extreme rainfall, and heatwave, on kiwifruit harvest yields. These four…

    Submitted 4 August, 2025; originally announced August 2025.

    Comments: Pre-print v0.8 2025-08-04

  39. arXiv:2507.22962  [pdf, ps, other]

    cs.LG

    Multi-Hazard Early Warning Systems for Agriculture with Featural-Temporal Explanations

    Authors: Boyuan Zheng, Victor W. Chu

    Abstract: Climate extremes present escalating risks to agriculture, intensifying the need for reliable multi-hazard early warning systems (EWS). The situation is evolving due to climate change and hence such systems should have the intelligence to continue to learn from recent climate behaviours. However, traditional single-hazard forecasting methods fall short in capturing complex interactions among concurre…

    Submitted 30 July, 2025; originally announced July 2025.

    Comments: Pre-print v0.8 2025-07-30

  40. arXiv:2507.12832  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results

    Authors: Yuki Kondo, Norimichi Ukita, Riku Kanayama, Yuki Yoshida, Takayuki Yamaguchi, Xiang Yu, Guang Liang, Xinyao Liu, Guan-Zhang Wang, Wei-Ta Chu, Bing-Cheng Chuang, Jia-Hua Lee, Pin-Tseng Kuo, I-Hsuan Chu, Yi-Shein Hsiao, Cheng-Han Wu, Po-Yi Wu, Jui-Chien Tsou, Hsuan-Chi Liu, Chun-Yi Lee, Yuan-Fu Yang, Kosuke Shigematsu, Asuka Shin, Ba Tran

    Abstract: Small Multi-Object Tracking (SMOT) is particularly challenging when targets occupy only a few dozen pixels, rendering detection and appearance-based association unreliable. Building on the success of the MVA2023 SOD4SB challenge, this paper introduces the SMOT4SB challenge, which leverages temporal information to address limitations of single-frame detection. Our three main contributions are: (1)…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: This paper is the official challenge report for SMOT4SB and is published in the proceedings of MVA 2025 (19th International Conference on Machine Vision and Applications). Official challenge page: https://www.mva-org.jp/mva2025/challenge

  41. arXiv:2507.04832  [pdf, ps, other

    cs.LG

    Discrete Diffusion Trajectory Alignment via Stepwise Decomposition

    Authors: Jiaqi Han, Austin Wang, Minkai Xu, Wenda Chu, Meihua Dang, Haotian Ye, Huayu Chen, Yisong Yue, Stefano Ermon

    Abstract: Discrete diffusion models have demonstrated great promise in modeling various sequence data, ranging from human language to biological sequences. Inspired by the success of RL in language models, there is growing interest in further improving the models by alignment with a certain reward. In this work, we propose an offline preference optimization method to approach trajectory alignment for discre…

    Submitted 31 January, 2026; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: ICLR 2026

  42. arXiv:2506.17645  [pdf, ps, other

    cs.CV

    Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning

    Authors: Shih-Wen Liu, Hsuan-Yu Fan, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynami…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: Accepted to MIDL 2025

  43. arXiv:2506.12716  [pdf, ps, other

    cs.CV

    Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors

    Authors: Wen-Hsuan Chu, Lei Ke, Jianmeng Liu, Mingxiao Huo, Pavel Tokmakov, Katerina Fragkiadaki

    Abstract: We tackle the challenge of generating dynamic 4D scenes from monocular, multi-object videos with heavy occlusions, and introduce GenMOJO, a novel approach that integrates rendering-based deformable 3D Gaussian optimization with generative priors for view synthesis. While existing models perform well on novel view synthesis for isolated objects, they struggle to generalize to complex, cluttered sce…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: This is an updated and extended version of our CVPR paper "Robust Multi-Object 4D Generation in Complex Video Scenarios"

  44. arXiv:2506.08849  [pdf, ps, other

    cs.CV

    Adapting Vision-Language Foundation Model for Next Generation Medical Ultrasound Image Analysis

    Authors: Jingguo Qu, Xinyang Han, Jia Ai, Juan Wu, Tong Zhao, Tonghuan Xiao, Sheng Ning, Yuqi Yang, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying

    Abstract: Vision-Language Models (VLMs) have demonstrated remarkable generalization capabilities, yet their application to medical ultrasound remains constrained by the significant domain shift between natural images and sonographic data. The unique physics of ultrasound, manifesting as speckle noise, shadowing, and variable artifacts, often leads to suboptimal performance when applying off-the-shelf founda…

    Submitted 7 January, 2026; v1 submitted 10 June, 2025; originally announced June 2025.

  45. arXiv:2506.07310  [pdf, ps, other

    cs.CV

    AllTracker: Efficient Dense Point Tracking at High Resolution

    Authors: Adam W. Harley, Yang You, Xinglong Sun, Yang Zheng, Nikhil Raghuraman, Yunqi Gu, Sheldon Liang, Wen-Hsuan Chu, Achal Dave, Pavel Tokmakov, Suya You, Rares Ambrus, Katerina Fragkiadaki, Leonidas J. Guibas

    Abstract: We introduce AllTracker: a model that estimates long-range point tracks by way of estimating the flow field between a query frame and every other frame of a video. Unlike existing point tracking methods, our approach delivers high-resolution and dense (all-pixel) correspondence fields, which can be visualized as flow maps. Unlike existing optical flow methods, our approach corresponds one frame to…

    Submitted 1 August, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

  46. arXiv:2506.03197  [pdf, ps, other

    cs.CV cs.AI cs.CL cs.LG

    Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing

    Authors: Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, Yuan Qi

    Abstract: Automated parsing of scanned documents into richly structured, machine-readable formats remains a critical bottleneck in Document AI, as traditional multi-stage pipelines suffer from error propagation and limited adaptability to diverse layouts. We introduce layoutRL, an end-to-end reinforcement learning framework that trains models to be explicitly layout-aware by optimizing a composite reward of…

    Submitted 20 October, 2025; v1 submitted 1 June, 2025; originally announced June 2025.

    Comments: 16 pages, 12 figures

    Report number: INF-CS-TR-2025-02

  47. arXiv:2506.02588  [pdf

    cond-mat.soft cond-mat.dis-nn cond-mat.mtrl-sci cond-mat.stat-mech physics.chem-ph

    Emergent rigidity percolation of five-fold aggregates enables controllable glass properties

    Authors: Wei Chu, Zheng Wang, Christopher Ness, Konrad Samwer, Alessio Zaccone, Lina Hu

    Abstract: Metallic glasses possess outstanding mechanical and physical properties, making them promising candidates for advanced structural and functional applications; however, the lack of understanding and control over their glass transition and solidification processes remains a significant barrier to practical design. The glass transition from liquid to amorphous solid has remained an open problem in ph…

    Submitted 3 June, 2025; originally announced June 2025.

  48. arXiv:2505.24850  [pdf, ps, other

    cs.LG cs.AI cs.CL

    Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning

    Authors: Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu, Yuan Qi

    Abstract: Recent advances in model distillation show that data from advanced reasoning models can effectively train smaller student models. However, standard practices discard incorrect reasoning traces -- valuable, yet underutilized data. This paper addresses the critical question: How can both positive and negative distilled reasoning traces be effectively leveraged to maximize LLM reasoning performance i…

    Submitted 14 December, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 22 pages, 10 figures. Code available at https://github.com/Tim-Siu/reinforcement-distillation

    ACM Class: I.2.6; I.2.7

  49. arXiv:2505.21853  [pdf, ps, other

    physics.med-ph physics.app-ph

    Quantitative Macromolecular Proton Fraction Imaging using Pulsed Spin-Lock

    Authors: Qianxue Shan, Ziqiang Yu, Baiyan Jiang, Jian Hou, Qiuyi Shen, Winnie CW Chu, Vincent WS Wong, Weitian Chen

    Abstract: Purpose: Recent studies have shown that spin-lock MRI can simplify quantitative magnetization transfer (MT) by eliminating its dependency on water pool parameters, removing the need for a T1 map in macromolecular proton fraction (MPF) quantification. However, its application is often limited by the requirement for long radiofrequency (RF) pulse durations, which are constrained by RF hardware capab…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 15 pages, 10 figures; Qianxue Shan and Ziqiang Yu contributed equally to this work

  50. arXiv:2505.15093  [pdf, ps, other

    q-bio.BM cs.LG

    Steering Generative Models with Experimental Data for Protein Fitness Optimization

    Authors: Jason Yang, Wenda Chu, Daniel Khalil, Raul Astudillo, Bruce J. Wittmann, Frances H. Arnold, Yisong Yue

    Abstract: Protein fitness optimization involves finding a protein sequence that maximizes desired quantitative properties in a combinatorially large design space of possible sequences. Recent advances in steering protein generative models (e.g., diffusion models and language models) with labeled data offer a promising approach. However, most previous studies have optimized surrogate rewards and/or utilized…

    Submitted 20 October, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: NeurIPS 2025