Showing 1–50 of 220 results for author: Tan, R

Searching in archive cs.
  1. arXiv:2604.10634  [pdf, ps, other]

    cs.CV

    NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Suhang Yao, Beibei Lin, Zhaoxin Fan, Wending Yan, Xin Jin, Zongwei Wu, Bingchen Li, Peishu Shi, Yufei Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Runzhe Li, Kui Jiang, Zhaocheng Yu, Yiang Chen, Junjun Jiang, Xianming Liu, Hongde Gu, Zeliang Li, Mache You, et al. (73 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2026 Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images. Building upon the success of the first edition, this challenge attracted a wide range of impressive solutions, all developed and evaluated on our real-world Raindrop Clarity dataset. For this edition, we adjust the dataset with 14,139 images for train…

    Submitted 12 April, 2026; originally announced April 2026.

    Comments: Accepted by CVPR2026 Workshop; NTIRE 2026 Challenge Report

  2. arXiv:2604.10485  [pdf, ps, other]

    cs.CV cs.AI

    UDAPose: Unsupervised Domain Adaptation for Low-Light Human Pose Estimation

    Authors: Haopeng Chen, Yihao Ai, Kabeen Kim, Robby T. Tan, Yixin Chen, Bo Wang

    Abstract: Low-visibility scenarios, such as low-light conditions, pose significant challenges to human pose estimation due to the scarcity of annotated low-light datasets and the loss of visual information under poor illumination. Recent domain adaptation techniques attempt to utilize well-lit labels by augmenting well-lit images to mimic low-light conditions. But handcrafted augmentations oversimplify nois…

    Submitted 12 April, 2026; originally announced April 2026.

    Comments: Accepted at CVPR 2026

  3. arXiv:2604.08762  [pdf, ps, other]

    cs.CV cs.AI

    InstrAct: Towards Action-Centric Understanding in Instructional Videos

    Authors: Zhuoyi Yang, Jiapeng Yu, Reuben Tan, Boyang Li, Huijuan Xu

    Abstract: Understanding instructional videos requires recognizing fine-grained actions and modeling their temporal relations, which remains challenging for current Video Foundation Models (VFMs). This difficulty stems from noisy web supervision and a pervasive "static bias", where models rely on objects rather than motion cues. To address this, we propose InstrAction, a pretraining framework for instruction…

    Submitted 9 April, 2026; originally announced April 2026.

  4. arXiv:2604.05965  [pdf, ps, other]

    cs.AI

    Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment

    Authors: Renxuan Tan, Rongpeng Li, Zhifeng Zhao, Honggang Zhang

    Abstract: Transcending the single-preference paradigm, aligning LLMs with diverse human values is pivotal for robust deployment. Contemporary Multi-Objective Preference Alignment (MPA) approaches predominantly rely on static linear scalarization or rigid gradient projection to navigate these trade-offs. However, by enforcing strict conflict avoidance or simultaneous descent, these paradigms often prematurel…

    Submitted 7 April, 2026; originally announced April 2026.

  5. arXiv:2604.01586  [pdf, ps, other]

    cs.CV cs.AI

    SHOE: Semantic HOI Open-Vocabulary Evaluation Metric

    Authors: Maja Noack, Qinqian Lei, Taipeng Tian, Bihan Dong, Robby T. Tan, Yixin Chen, John Young, Saijun Zhang, Bo Wang

    Abstract: Open-vocabulary human-object interaction (HOI) detection is a step towards building scalable systems that generalize to unseen interactions in real-world scenarios and support grounded multimodal systems that reason about human-object relationships. However, standard evaluation metrics, such as mean Average Precision (mAP), treat HOI classes as discrete categorical labels and fail to credit semant…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: Accepted to GRAIL-V Workshop at CVPR 2026

  6. arXiv:2603.23705  [pdf, ps, other]

    cs.DS

    Distributionally Robust $k$-of-$n$ Sequential Testing

    Authors: Rayen Tan, Viswanath Nagarajan

    Abstract: The $k$-of-$n$ testing problem involves performing $n$ independent tests sequentially, in order to determine whether/not at least $k$ tests pass. The objective is to minimize the expected cost of testing. This is a fundamental and well-studied stochastic optimization problem. However, a key limitation of this model is that the success/failure probability of each test is assumed to be known precise…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: 28 pages, 3 figures

  7. arXiv:2603.21697  [pdf, ps, other]

    cs.CR cs.AI cs.MM

    Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

    Authors: Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee

    Abstract: Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template jailbreaks that embed harmful goals inside simple three-panel visual narratives and prompt the model to role-play and "complete the comic." Building on JailbreakBench and JailbreakV, we introduce ComicJailbre…

    Submitted 23 March, 2026; originally announced March 2026.

    Comments: 31 pages

  8. arXiv:2603.15888  [pdf, ps, other]

    cs.AI cs.CV cs.RO

    AsgardBench -- Evaluating Visually Grounded Interactive Planning Under Minimal Feedback

    Authors: Andrea Tupini, Lars Liden, Reuben Tan, Yu Wang, Jianfeng Gao

    Abstract: With AsgardBench we aim to evaluate visually grounded, high-level action sequence generation and interactive planning, focusing specifically on plan adaptation during execution based on visual observations rather than navigation or low-level manipulation. In the landscape of embodied AI benchmarks, AsgardBench targets the capability category of interactive planning, which is more sophisticated tha…

    Submitted 18 March, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

    Comments: 19 figures, 6 tables, including appendix

    ACM Class: I.2.8; I.2.10

  9. arXiv:2603.13433  [pdf, ps, other]

    cs.RO cs.AI

    Spatially Grounded Long-Horizon Task Planning in the Wild

    Authors: Sehun Jung, HyunJee Song, Dong-Hee Kim, Reuben Tan, Jianfeng Gao, Yong Jae Lee, Donghyun Kim

    Abstract: Recent advances in robot manipulation increasingly leverage Vision-Language Models (VLMs) for high-level reasoning, such as decomposing task instructions into sequential action plans expressed in natural language that guide downstream low-level motor execution. However, current benchmarks do not assess whether these plans are spatially executable, particularly in specifying the exact spatial locat…

    Submitted 13 March, 2026; originally announced March 2026.

    Comments: 9 pages, 7 figures

  10. arXiv:2603.04000  [pdf, ps, other]

    cs.LG

    On the Learnability of Offline Model-Based Optimization: A Ranking Perspective

    Authors: Shen-Huan Lyu, Rong-Xi Tan, Ke Xue, Yi-Xiao He, Yu Huang, Qingfu Zhang, Chao Qian

    Abstract: Offline model-based optimization (MBO) seeks to discover high-performing designs using only a fixed dataset of past evaluations. Most existing methods rely on learning a surrogate model via regression and implicitly assume that good predictive accuracy leads to good optimization performance. In this work, we challenge this assumption and study offline MBO from a learnability perspective. We argue…

    Submitted 4 March, 2026; originally announced March 2026.

  11. arXiv:2602.15383  [pdf, ps, other]

    cs.CV

    Bridging Day and Night: Target-Class Hallucination Suppression in Unpaired Image Translation

    Authors: Shuwei Li, Lei Tan, Robby T. Tan

    Abstract: Day-to-night unpaired image translation is important to downstream tasks but remains challenging due to large appearance shifts and the lack of direct pixel-level supervision. Existing methods often introduce semantic hallucinations, where objects from target classes such as traffic signs and vehicles, as well as man-made light effects, are incorrectly synthesized. These hallucinations significant…

    Submitted 17 February, 2026; originally announced February 2026.

    Comments: Accepted at AAAI 2026 (Oral)

  12. arXiv:2602.15011  [pdf, ps, other]

    cs.HC

    TouchFusion: Multimodal Wristband Sensing for Ubiquitous Touch Interactions

    Authors: Eric Whitmire, Evan Strasnick, Roger Boldu, Raj Sodhi, Nathan Godwin, Shiu Ng, Andre Levi, Amy Karlson, Ran Tan, Josef Faller, Emrah Adamey, Hanchuan Li, Wolf Kienzle, Hrvoje Benko

    Abstract: TouchFusion is a wristband that enables touch interactions on nearby surfaces without any additional instrumentation or computer vision. TouchFusion combines surface electromyography (sEMG), bioimpedance, inertial, and optical sensing to capture multiple facets of hand activity during touch interactions. Through a combination of early and late fusion, TouchFusion enables stateful touch detection o…

    Submitted 16 February, 2026; originally announced February 2026.

    Comments: 23 pages, 22 figures, accompanying video available at https://youtu.be/0fdCwHu7uaA

  13. arXiv:2602.13458  [pdf, ps, other]

    cs.SI cs.AI

    MoltNet: Understanding Social Behavior of AI Agents in the Agent-Native MoltBook

    Authors: Yi Feng, Chen Huang, Zhibo Man, Ryner Tan, Long P. Hoang, Shaoyang Xu, Wenxuan Zhang

    Abstract: Large-scale communities of AI agents are becoming increasingly prevalent, creating new environments for agent-agent social interaction. Prior work has examined multi-agent behavior primarily in controlled or small-scale settings, limiting our understanding of emergent social dynamics at scale. The recent emergence of MoltBook, a social networking platform designed explicitly for AI agents, present…

    Submitted 6 April, 2026; v1 submitted 13 February, 2026; originally announced February 2026.

  14. arXiv:2602.09609  [pdf, ps, other]

    cs.CV

    Tele-Omni: a Unified Multimodal Framework for Video Generation and Editing

    Authors: Jialun Liu, Tian Li, Xiao Cao, Yukuo Ma, Gonghu Shang, Haibin Huang, Chi Zhang, Xiangzhen Chang, Zhiyong Huang, Jiakui Hu, Zuoxin Li, Yuanzhi Liang, Cong Liu, Junqi Liu, Robby T. Tan, Haitong Tang, Qizhen Weng, Yifan Xu, Liying Yang, Xiaoyan Yang, Peng Yu, Shiwen Zhang, Xuelong Li

    Abstract: Recent advances in diffusion-based video generation have substantially improved visual fidelity and temporal coherence. However, most existing approaches remain task-specific and rely primarily on textual instructions, limiting their ability to handle multimodal inputs, contextual references, and diverse video generation and editing scenarios within a unified framework. Moreover, many video editin…

    Submitted 23 February, 2026; v1 submitted 10 February, 2026; originally announced February 2026.

  15. arXiv:2602.02137  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations

    Authors: Minghao Li, Ruihang Wang, Rui Tan, Yonggang Wen

    Abstract: Modern data centers (DCs) hosting artificial intelligence (AI)-dedicated devices operate at high power densities with rapidly varying workloads, making minute-level adaptation essential for safe and energy-efficient operation. However, manually designing piecewise deep reinforcement learning (DRL) agents cannot keep pace with frequent dynamics shifts and service-level agreement (SLA) changes of an…

    Submitted 25 February, 2026; v1 submitted 2 February, 2026; originally announced February 2026.

    Comments: Accepted as a full paper at HSCC/ICCPS 2026

  16. arXiv:2602.01905  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Learning Sparse Visual Representations via Spatial-Semantic Factorization

    Authors: Theodore Zhengde Zhao, Sid Kiblawi, Jianwei Yang, Naoto Usuyama, Reuben Tan, Noel C Codella, Tristan Naumann, Hoifung Poon, Mu Wei

    Abstract: Self-supervised learning (SSL) faces a fundamental conflict between semantic understanding and image reconstruction. High-level semantic SSL (e.g., DINO) relies on global tokens that are forced to be location-invariant for augmentation alignment, a process that inherently discards the spatial coordinates required for reconstruction. Conversely, generative SSL (e.g., MAE) preserves dense feature gr…

    Submitted 2 February, 2026; originally announced February 2026.

  17. arXiv:2601.10143  [pdf, ps, other]

    cs.AI q-fin.TR

    History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis

    Authors: Haochong Xia, Yao Long Teng, Regan Tan, Molei Qin, Xinrun Wang, Bo An

    Abstract: In quantitative finance, the gap between training and real-world performance, driven by concept drift and distributional non-stationarity, remains a critical obstacle for building reliable data-driven systems. Models trained on static historical data often overfit, resulting in poor generalization in dynamic markets. The mantra "History Is Not Enough" underscores the need for adaptive data generatio…

    Submitted 15 January, 2026; originally announced January 2026.

  18. arXiv:2601.09351  [pdf, ps, other]

    cs.CY cs.AI

    Navigating Ethical AI Challenges in the Industrial Sector: Balancing Innovation and Responsibility

    Authors: Ruomu Tan, Martin W Hoffmann

    Abstract: The integration of artificial intelligence (AI) into the industrial sector has not only driven innovation but also expanded the ethical landscape, necessitating a reevaluation of principles governing technology and its applications and awareness in research and development of industrial AI solutions. This chapter explores how AI-empowered industrial innovation inherently intersects with ethics, as…

    Submitted 14 January, 2026; originally announced January 2026.

    MSC Class: 68T99 ACM Class: K.4.0; I.2.1; I.2.9

  19. arXiv:2601.08790  [pdf, ps, other]

    cs.CV

    Aggregating Diverse Cue Experts for AI-Generated Image Detection

    Authors: Lei Tan, Shuwei Li, Mohan Kankanhalli, Robby T. Tan

    Abstract: The rapid emergence of image synthesis models poses challenges to the generalization of AI-generated image detectors. However, existing methods often rely on model-specific features, leading to overfitting and poor generalization. In this paper, we introduce the Multi-Cue Aggregation Network (MCAN), a novel framework that integrates different yet complementary cues in a unified network. MCAN emplo…

    Submitted 13 January, 2026; originally announced January 2026.

    Comments: Accepted by AAAI 2026

  20. arXiv:2601.06309  [pdf, ps, other]

    cs.CV cs.AI

    VideoWeave: A Data-Centric Approach for Efficient Video Understanding

    Authors: Zane Durante, Silky Singh, Arpandeep Khatua, Shobhit Agarwal, Reuben Tan, Yong Jae Lee, Jianfeng Gao, Ehsan Adeli, Li Fei-Fei

    Abstract: Training video-language models is often prohibitively expensive due to the high cost of processing long frame sequences and the limited availability of annotated long videos. We present VideoWeave, a simple yet effective approach to improve data efficiency by constructing synthetic long-context training samples that splice together short, captioned videos from existing datasets. Rather than modify…

    Submitted 9 January, 2026; originally announced January 2026.

  21. arXiv:2512.22150  [pdf, ps, other]

    cs.LG stat.ML

    Towards Unsupervised Causal Representation Learning via Latent Additive Noise Model Causal Autoencoders

    Authors: Hans Jarett J. Ong, Brian Godwin S. Lim, Dominic Dayta, Renzo Roel P. Tan, Kazushi Ikeda

    Abstract: Unsupervised representation learning seeks to recover latent generative factors, yet standard methods relying on statistical independence often fail to capture causal dependencies. A central challenge is identifiability: as established in disentangled representation learning and nonlinear ICA literature, disentangling causal variables from observational data is impossible without supervision, auxi…

    Submitted 15 December, 2025; originally announced December 2025.

  22. arXiv:2512.21794  [pdf, ps, other]

    cs.GT cs.AI cs.LG cs.MA econ.TH

    Multi-agent Adaptive Mechanism Design

    Authors: Qiushi Han, David Simchi-Levi, Renfei Tan, Zishuo Zhao

    Abstract: We study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. We introduce Distributionally Robust Adaptive Mechanism (DRAM), a general framework combining insights from both mechanism design and online learning to jointly address truthfulness and cost-optimality. Through…

    Submitted 10 April, 2026; v1 submitted 25 December, 2025; originally announced December 2025.

  23. arXiv:2512.17985  [pdf, ps, other]

    cs.LG

    MoE-TransMov: A Transformer-based Model for Next POI Prediction in Familiar & Unfamiliar Movements

    Authors: Ruichen Tan, Jiawei Xue, Kota Tsubouchi, Takahiro Yabe, Satish V. Ukkusuri

    Abstract: Accurate prediction of the next point of interest (POI) within human mobility trajectories is essential for location-based services, as it enables more timely and personalized recommendations. In particular, with the rise of these approaches, studies have shown that users exhibit different POI choices in their familiar and unfamiliar areas, highlighting the importance of incorporating user familia…

    Submitted 19 December, 2025; originally announced December 2025.

    Comments: 30 pages, 4 figures, 5 tables

  24. arXiv:2512.16334  [pdf]

    cs.LG cs.AI

    Pretrained battery transformer (PBT): A foundation model for universal battery life prediction

    Authors: Ruifeng Tan, Weixiang Hong, Jia Li, Jiaqiang Huang, Tong-Yi Zhang

    Abstract: Early prediction of battery cycle life is essential for improving battery design, manufacturing, and deployment. However, despite encouraging results with machine learning, progress remains constrained by scarce data and data heterogeneity across battery chemistries, specifications, formation protocols, and operating conditions. Although transfer learning has been widely explored to alleviate thes…

    Submitted 11 March, 2026; v1 submitted 18 December, 2025; originally announced December 2025.

    Comments: 5 figures in the main content

  25. arXiv:2512.12551  [pdf, ps, other]

    cs.SE

    Assessing the Capability of Android Dynamic Analysis Tools to Combat Anti-Runtime Analysis Techniques

    Authors: Dewen Suo, Lei Xue, Weihao Huang, Runze Tan, Guozi Sun

    Abstract: As the dominant mobile operating system, Android continues to attract a substantial influx of new applications each year. However, this growth is accompanied by increased attention from malicious actors, resulting in a significant rise in security threats to the Android ecosystem. Among these threats, the adoption of Anti-Runtime Analysis (ARA) techniques by malicious applications poses a serious…

    Submitted 13 December, 2025; originally announced December 2025.

  26. arXiv:2512.10611  [pdf, ps, other]

    cs.AI cs.NE

    Phythesis: Physics-Guided Evolutionary Scene Synthesis for Energy-Efficient Data Center Design via LLMs

    Authors: Minghao Li, Ruihang Wang, Rui Tan, Yonggang Wen

    Abstract: Data center (DC) infrastructure serves as the backbone to support the escalating demand for computing capacity. Traditional design methodologies that blend human expertise with specialized simulation tools scale poorly with the increasing system complexity. Recent studies adopt generative artificial intelligence to design plausible human-centric indoor layouts. However, they do not consider the un…

    Submitted 15 December, 2025; v1 submitted 11 December, 2025; originally announced December 2025.

  27. arXiv:2512.06533  [pdf, ps, other]

    cs.LG cs.AI

    Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

    Authors: Ming Chen, Sheng Tang, Rong-Xi Tan, Ziniu Li, Jiacheng Chen, Ke Xue, Chao Qian

    Abstract: Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm of applying large language models for numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches relying on token-level constraints often fail t…

    Submitted 6 December, 2025; originally announced December 2025.

  28. arXiv:2512.03438  [pdf, ps, other]

    cs.AI

    Multimodal Reinforcement Learning with Agentic Verifier for AI Agents

    Authors: Reuben Tan, Baolin Peng, Zhengyuan Yang, Hao Cheng, Oier Mees, Theodore Zhao, Andrea Tupini, Isar Meijier, Qianhui Wu, Yuncong Yang, Lars Liden, Yu Gu, Sheng Zhang, Xiaodong Liu, Lijuan Wang, Marc Pollefeys, Yong Jae Lee, Jianfeng Gao

    Abstract: Agentic reasoning models trained with multimodal reinforcement learning (MMRL) have become increasingly capable, yet they are almost universally optimized using sparse, outcome-based rewards computed based on the final answers. Richer rewards computed from the reasoning tokens can improve learning significantly by providing more fine-grained guidance. However, it is challenging to compute more inf…

    Submitted 2 December, 2025; originally announced December 2025.

  29. arXiv:2511.23269  [pdf, ps, other]

    cs.AI

    OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning

    Authors: Timothy Ossowski, Sheng Zhang, Qianchu Liu, Guanghui Qin, Reuben Tan, Tristan Naumann, Junjie Hu, Hoifung Poon

    Abstract: High-quality and carefully curated data is a cornerstone of training medical large language models, as it directly impacts both generalization and robustness to unseen clinical tasks. We investigate strategies for training and data curation to develop a robust multimodal reasoning model in the medical domain. Our work focuses on supervised fine-tuning (SFT) and explores data recipes that leverage…

    Submitted 28 November, 2025; originally announced November 2025.

  30. arXiv:2511.11046  [pdf, ps, other]

    cs.LG cs.AI

    Enhancing Graph Representations with Neighborhood-Contextualized Message-Passing

    Authors: Brian Godwin Lim, Galvin Brice Lim, Renzo Roel Tan, Irwin King, Kazushi Ikeda

    Abstract: Graph neural networks (GNNs) have become an indispensable tool for analyzing relational data. Classical GNNs are broadly classified into three variants: convolutional, attentional, and message-passing. While the standard message-passing variant is expressive, its typical pair-wise messages only consider the features of the center node and each neighboring node individually. This design fails to in…

    Submitted 7 January, 2026; v1 submitted 14 November, 2025; originally announced November 2025.

  31. arXiv:2510.25384  [pdf, ps, other]

    cs.CL

    Roleplaying with Structure: Synthetic Therapist-Client Conversation Generation from Questionnaires

    Authors: Doan Nam Long Vu, Rui Tan, Lena Moench, Svenja Jule Francke, Daniel Woiwod, Florian Thomas-Odenthal, Sanna Stroth, Tilo Kircher, Christiane Hermann, Udo Dannlowski, Hamidreza Jamalabadi, Shaoxiong Ji

    Abstract: The development of AI for mental health is hindered by a lack of authentic therapy dialogues, due to strict privacy regulations and the fact that clinical sessions were historically rarely recorded. We present an LLM-driven pipeline that generates synthetic counseling dialogues based on structured client profiles and psychological questionnaires. Grounded on the principles of Cognitive Behavioral…

    Submitted 29 October, 2025; originally announced October 2025.

  32. arXiv:2510.23472  [pdf, ps, other]

    cs.LG cs.AI cs.AR cs.NE

    BBOPlace-Bench: Benchmarking Black-Box Optimization for Chip Placement

    Authors: Ke Xue, Ruo-Tong Chen, Rong-Xi Tan, Xi Lin, Yunqi Shi, Siyuan Xu, Mingxuan Yuan, Chao Qian

    Abstract: Chip placement is a vital stage in modern chip design as it has a substantial impact on the subsequent processes and the overall quality of the final chip. The use of black-box optimization (BBO) for chip placement has a history of several decades. However, early efforts were limited by immature problem formulations and inefficient algorithm designs. Recent progress has shown the effectiveness and…

    Submitted 27 October, 2025; originally announced October 2025.

  33. arXiv:2510.15938  [pdf, ps, other]

    q-fin.ST cs.LG stat.ML

    Dynamic Factor Analysis of Price Movements in the Philippine Stock Exchange

    Authors: Brian Godwin Lim, Dominic Dayta, Benedict Ryan Tiu, Renzo Roel Tan, Len Patrick Dominic Garces, Kazushi Ikeda

    Abstract: The intricate dynamics of stock markets have led to extensive research on models that are able to effectively explain their inherent complexities. This study leverages the econometrics literature to explore the dynamic factor model as an interpretable model with sufficient predictive capabilities for capturing essential market phenomena. Although the model has been extensively applied for predicti…

    Submitted 8 October, 2025; originally announced October 2025.

    Journal ref: Financial Innovation. 12(2026)

  34. arXiv:2510.10895  [pdf, ps, other]

    cs.AI

    LLM-Empowered Agentic MAC Protocols: A Dynamic Stackelberg Game Approach

    Authors: Renxuan Tan, Rongpeng Li, Fei Wang, Chenghui Peng, Shaoyun Wu, Zhifeng Zhao, Honggang Zhang

    Abstract: Medium Access Control (MAC) protocols, essential for wireless networks, are typically manually configured. While deep reinforcement learning (DRL)-based protocols enhance task-specified network performance, they suffer from poor generalizability and resilience, demanding costly retraining to adapt to dynamic environments. To overcome this limitation, we introduce a game-theoretic LLM-empowered mul…

    Submitted 12 October, 2025; originally announced October 2025.

    Comments: This work has been submitted to IEEE for possible publication

  35. arXiv:2510.05553  [pdf, ps, other]

    cs.RO eess.SY

    GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps

    Authors: Yan Rui Tan, Wenqi Liu, Wai Lun Leong, John Guan Zhong Tan, Wayne Wen Huei Yong, Fan Shi, Rodney Swee Huat Teo

    Abstract: Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to address these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free e…

    Submitted 6 October, 2025; originally announced October 2025.

  36. arXiv:2510.03110  [pdf, ps, other]

    cs.CV

    GeoComplete: Geometry-Aware Diffusion for Reference-Driven Image Completion

    Authors: Beibei Lin, Tingting Chen, Robby T. Tan

    Abstract: Reference-driven image completion, which restores missing regions in a target view using additional images, is particularly challenging when the target view differs significantly from the references. Existing generative methods rely solely on diffusion priors and, without geometric cues such as camera pose or depth, often produce misaligned or implausible content. We propose GeoComplete, a novel f…

    Submitted 3 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025. Project page: https://bb12346.github.io/GeoComplete/

  37. arXiv:2509.23234  [pdf, ps, other]

    cs.AI cs.CL

    p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding

    Authors: Runyan Tan, Shuang Wu, Phillip Howard

    Abstract: Obtaining high-quality outputs from Large Language Models (LLMs) often depends upon the choice of a sampling-based decoding strategy to probabilistically choose the next token at each generation step. While a variety of such sampling methods have been proposed, their performance can be sensitive to the selection of hyperparameters which may require different settings depending upon the generation…

    Submitted 27 February, 2026; v1 submitted 27 September, 2025; originally announced September 2025.

  38. arXiv:2509.21823  [pdf, ps, other]

    cs.AI

    ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration

    Authors: Gaole Dai, Shiqi Jiang, Ting Cao, Yuqing Yang, Yuanchun Li, Rui Tan, Mo Li, Lili Qiu

    Abstract: Reward is critical to the evaluation and training of large language models (LLMs). However, existing rule-based or model-based reward methods struggle to generalize to GUI agents, where access to ground-truth trajectories or application databases is often unavailable, and static trajectory-based LLM-as-a-Judge approaches suffer from limited accuracy. To address these challenges, we propose ProRe,…

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: 10 pages, 7 figures

  39. arXiv:2509.18234  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    The Illusion of Readiness in Health AI

    Authors: Yu Gu, Jingjing Fu, Xiaodong Liu, Jeya Maria Jose Valanarasu, Noel CF Codella, Reuben Tan, Qianchu Liu, Ying Jin, Sheng Zhang, Jinyu Wang, Rui Wang, Lei Song, Guanghui Qin, Naoto Usuyama, Cliff Wong, Hao Cheng, HoHin Lee, Praneeth Sanapathi, Sarah Hilado, Tristan Naumann, Javier Alvarez-Valle, Jiang Bian, Mu Wei, Khalil Malik, Lidong Zhou, et al. (7 additional authors not shown)

    Abstract: Large language models have demonstrated remarkable performance in a wide range of medical benchmarks. Yet underneath the seemingly promising results lie salient growth areas, especially in cutting-edge frontiers such as multimodal reasoning. In this paper, we introduce a series of adversarial stress tests to systematically assess the robustness of flagship models and medical benchmarks. Our study…

    Submitted 11 December, 2025; v1 submitted 22 September, 2025; originally announced September 2025.

  40. arXiv:2509.17435  [pdf, ps, other]

    cs.RO eess.SY

    GPS Denied IBVS-Based Navigation and Collision Avoidance of UAV Using a Low-Cost RGB Camera

    Authors: Xiaoyu Wang, Yan Rui Tan, William Leong, Sunan Huang, Rodney Teo, Cheng Xiang

    Abstract: This paper proposes an image-based visual servoing (IBVS) framework for UAV navigation and collision avoidance using only an RGB camera. While UAV navigation has been extensively studied, it remains challenging to apply IBVS in missions involving multiple visual targets and collision avoidance. The proposed method achieves navigation without explicit path planning, and collision avoidance is reali…

    Submitted 22 September, 2025; originally announced September 2025.

  41. arXiv:2509.12248  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics

    Authors: Yuriel Ryan, Rui Yang Tan, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

    Abstract: Understanding humor is a core aspect of social intelligence, yet it remains a significant challenge for Large Multimodal Models (LMMs). We introduce PixelHumor, a benchmark dataset of 2,800 annotated multi-panel comics designed to evaluate LMMs' ability to interpret multimodal humor and recognize narrative sequences. Experiments with state-of-the-art LMMs reveal substantial gaps: for instance, top…

    Submitted 17 September, 2025; v1 submitted 11 September, 2025; originally announced September 2025.

    Comments: 27 pages, 8 figures, EMNLP 2025 Findings

  42. Asymmetry Vulnerability and Physical Attacks on Online Map Construction for Autonomous Driving

    Authors: Yang Lou, Haibo Hu, Qun Song, Qian Xu, Yi Zhu, Rui Tan, Wei-Bin Lee, Jianping Wang

    Abstract: High-definition maps provide precise environmental information essential for prediction and planning in autonomous driving systems. Due to the high cost of labeling and maintenance, recent research has turned to online HD map construction using onboard sensor data, offering wider coverage and more timely updates for autonomous vehicles. However, the robustness of online map construction under adve…

    Submitted 7 September, 2025; originally announced September 2025.

    Comments: CCS'25 (a shorter version of this paper will appear in the conference proceeding)

  43. arXiv:2509.03221  [pdf, ps, other]

    cs.CV cs.AI

    LGBP-OrgaNet: Learnable Gaussian Band Pass Fusion of CNN and Transformer Features for Robust Organoid Segmentation and Tracking

    Authors: Jing Zhang, Siying Tao, Jiao Li, Tianhe Wang, Junchen Wu, Ruqian Hao, Xiaohui Du, Ruirong Tan, Rui Li

    Abstract: Organoids replicate organ structure and function, playing a crucial role in fields such as tumor treatment and drug screening. Their shape and size can indicate their developmental status, but traditional fluorescence labeling methods risk compromising their structure. Therefore, this paper proposes an automated, non-destructive approach to organoid segmentation and tracking. We introduced the LGB…

    Submitted 3 September, 2025; originally announced September 2025.

  44. arXiv:2509.00665  [pdf, ps, other]

    cs.CV cs.RO

    ER-LoRA: Effective-Rank Guided Adaptation for Weather-Generalized Depth Estimation

    Authors: Weilong Yan, Xin Zhang, Robby T. Tan

    Abstract: Monocular depth estimation under adverse weather conditions (e.g., rain, fog, snow, and nighttime) remains highly challenging due to the lack of reliable ground truth and the difficulty of learning from unlabeled real-world data. Existing methods often rely on synthetic adverse data with pseudo-labels, which suffer from domain gaps, or employ self-supervised learning, which violates photometric as…

    Submitted 6 September, 2025; v1 submitted 30 August, 2025; originally announced September 2025.

  45. arXiv:2508.18753  [pdf, ps, other]

    cs.CV

    CrossHOI-Bench: A Unified Benchmark for HOI Evaluation across Vision-Language Models and HOI-Specific Methods

    Authors: Qinqian Lei, Bo Wang, Robby T. Tan

    Abstract: HOI detection has long been dominated by task-specific models, sometimes with early vision-language backbones such as CLIP. With the rise of large generative VLMs, a key question is whether standalone VLMs can perform HOI detection competitively against specialized HOI methods. Existing benchmarks such as HICO-DET require exact label matching under incomplete annotations, so any unmatched predicti…

    Submitted 19 March, 2026; v1 submitted 26 August, 2025; originally announced August 2025.

    Comments: Accepted by CVPR 2026

  46. arXiv:2508.16037  [pdf, ps, other]

    cs.LG cs.AI

    Pareto Actor-Critic for Communication and Computation Co-Optimization in Non-Cooperative Federated Learning Services

    Authors: Renxuan Tan, Rongpeng Li, Xiaoxue Yu, Xianfu Chen, Xing Xu, Zhifeng Zhao

    Abstract: Federated learning (FL) in multi-service provider (SP) ecosystems is fundamentally hampered by non-cooperative dynamics, where privacy constraints and competing interests preclude the centralized optimization of multi-SP communication and computation resources. In this paper, we introduce PAC-MCoFL, a game-theoretic multi-agent reinforcement learning (MARL) framework where SPs act as agents to joi…

    Submitted 28 August, 2025; v1 submitted 21 August, 2025; originally announced August 2025.

  47. arXiv:2508.14922  [pdf]

    q-bio.QM cs.AI cs.CV eess.IV

    Fusing Structural Phenotypes with Functional Data for Early Prediction of Primary Angle Closure Glaucoma Progression

    Authors: Swati Sharma, Thanadet Chuangsuwanich, Royston K. Y. Tan, Shimna C. Prasad, Tin A. Tun, Shamira A. Perera, Martin L. Buist, Tin Aung, Monisha E. Nongpiur, Michaël J. A. Girard

    Abstract: Purpose: To classify eyes as slow or fast glaucoma progressors in patients with primary angle closure glaucoma (PACG) using an integrated approach combining optic nerve head (ONH) structural features and sector-based visual field (VF) functional parameters. Methods: PACG patients with >5 reliable VF tests over >5 years were included. Progression was assessed in Zeiss Forum, with baseline VF within…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: 23 pages, 5 figures, 3 tables

  48. arXiv:2507.15542  [pdf, ps, other]

    cs.CV

    HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation

    Authors: Qinqian Lei, Bo Wang, Robby T. Tan

    Abstract: Zero-shot human-object interaction (HOI) detection remains a challenging task, particularly in generalizing to unseen actions. Existing methods address this challenge by tapping Vision-Language Models (VLMs) to access knowledge beyond the training data. However, they either struggle to distinguish actions involving the same object or demonstrate limited generalization to unseen classes. In this pa…

    Submitted 3 August, 2025; v1 submitted 21 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  49. arXiv:2507.12508  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

    Authors: Yuncong Yang, Jiageng Liu, Zheyuan Zhang, Siyuan Zhou, Reuben Tan, Jianwei Yang, Yilun Du, Chuang Gan

    Abstract: Spatial reasoning in 3D space is central to human cognition and indispensable for embodied tasks such as navigation and manipulation. However, state-of-the-art vision-language models (VLMs) struggle frequently with tasks as simple as anticipating how a scene will look after an egocentric motion: they perceive 2D images but lack an internal model of 3D dynamics. We therefore propose MindJourney, a…

    Submitted 1 November, 2025; v1 submitted 16 July, 2025; originally announced July 2025.

    Comments: Project Page: https://umass-embodied-agi.github.io/MindJourney

  50. arXiv:2507.10535  [pdf, ps, other]

    cs.CL cs.AI cs.SE

    CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks

    Authors: Hongchao Jiang, Yiming Chen, Yushi Cao, Hung-yi Lee, Robby T. Tan

    Abstract: Large Language Models (LLMs) have significantly advanced the state-of-the-art in various coding tasks. Beyond directly answering user queries, LLMs can also serve as judges, assessing and comparing the quality of responses generated by other models. Such an evaluation capability is crucial both for benchmarking different LLMs and for improving response quality through response ranking. However, de…

    Submitted 14 August, 2025; v1 submitted 14 July, 2025; originally announced July 2025.

    Comments: Dataset is available at https://huggingface.co/datasets/mattymchen/codejudgebench