Showing 1–50 of 94 results for author: See, S

  1. arXiv:2510.12119  [pdf, ps, other]

    cs.CV

    ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation

    Authors: Ziyuan Luo, Yangyi Zhao, Ka Chun Cheung, Simon See, Renjie Wan

    Abstract: The widespread adoption of Retrieval-Augmented Image Generation (RAIG) has raised significant concerns about the unauthorized use of private image datasets. While these systems have shown remarkable capabilities in enhancing generation quality through reference images, protecting visual datasets from unauthorized use in such systems remains a challenging problem. Traditional digital watermarking a…

    Submitted 13 October, 2025; originally announced October 2025.

    Comments: Accepted at NeurIPS 2025

  2. arXiv:2510.07172  [pdf, ps, other]

    cs.AI

    NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents

    Authors: Tianshi Zheng, Kelvin Kiu-Wai Tam, Newt Hue-Nam K. Nguyen, Baixuan Xu, Zhaowei Wang, Jiayang Cheng, Hong Ting Tsang, Weiqi Wang, Jiaxin Bai, Tianqing Fang, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Large language models are emerging as powerful tools for scientific law discovery, a foundational challenge in AI-driven science. However, existing benchmarks for this task suffer from a fundamental methodological trilemma, forcing a trade-off between scientific relevance, scalability, and resistance to memorization. Furthermore, they oversimplify discovery as static function fitting, failing to c…

    Submitted 9 December, 2025; v1 submitted 8 October, 2025; originally announced October 2025.

    Comments: 71 pages, 21 figures, 21 tables

  3. arXiv:2510.01767  [pdf, ps, other]

    cs.CV

    LOBE-GS: Load-Balanced and Efficient 3D Gaussian Splatting for Large-Scale Scene Reconstruction

    Authors: Sheng-Hsiang Hung, Ting-Yu Yen, Wei-Fang Sun, Simon See, Shih-Hsuan Hung, Hung-Kuo Chu

    Abstract: 3D Gaussian Splatting (3DGS) has established itself as an efficient representation for real-time, high-fidelity 3D scene reconstruction. However, scaling 3DGS to large and unbounded scenes such as city blocks remains difficult. Existing divide-and-conquer methods alleviate memory pressure by partitioning the scene into blocks, but introduce new bottlenecks: (i) partitions suffer from severe load i…

    Submitted 2 October, 2025; originally announced October 2025.

  4. arXiv:2509.02618  [pdf, ps, other]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Etching-free dual-lift-off for direct patterning of epitaxial oxide thin films

    Authors: Jiayi Qin, Josephine Si Yu See, Yanran Liu, Xueyan Wang, Wenhai Zhao, Yang He, Jianbo Ding, Yilin Wu, Shanhu Wang, Huiping Han, Afzal Khan, Shuya Liu, Sheng'an Yang, Hui Zhang, Jiangnan Li, Qingming Chen, Jiyang Xie, Ji Ma, Wanbiao Hu, Jianhong Yi, Liang Wu, X. Renshaw Wang

    Abstract: Although monocrystalline oxide films offer broad functional capabilities, their practical use is hampered by challenges in patterning. Traditional patterning relies on etching, which can be costly and prone to issues like film or substrate damage, under-etching, over-etching, and lateral etching. In this study, we introduce a dual-lift-off method for direct patterning of oxide films, circumventing…

    Submitted 31 August, 2025; originally announced September 2025.

    Journal ref: Nano Letters 25, 13184 (2025)

  5. arXiv:2509.00524  [pdf, ps, other]

    cs.LG q-bio.MN

    Biological Pathway Informed Models with Graph Attention Networks (GATs)

    Authors: Gavin Wong, Ping Shu Ho, Ivan Au Yeung, Ka Chun Cheung, Simon See

    Abstract: Biological pathways map gene-gene interactions that govern all human processes. Despite their importance, most ML models treat genes as unstructured tokens, discarding known pathway structure. The latest pathway-informed models capture pathway-pathway interactions, but still treat each pathway as a "bag of genes" via MLPs, discarding its topology and gene-gene interactions. We propose a Graph Atte…

    Submitted 30 August, 2025; originally announced September 2025.

    Comments: 5 pages, 3 figures

  6. arXiv:2508.18907  [pdf, ps, other]

    cs.SD cs.AI

    SegReConcat: A Data Augmentation Method for Voice Anonymization Attack

    Authors: Ridwan Arefeen, Xiaoxiao Miao, Rong Tong, Aik Beng Ng, Simon See

    Abstract: Anonymization of voice seeks to conceal the identity of the speaker while maintaining the utility of speech data. However, residual speaker cues often persist, which pose privacy risks. We propose SegReConcat, a data augmentation method for attacker-side enhancement of automatic speaker verification systems. SegReConcat segments anonymized speech at the word level, rearranges segments using random…

    Submitted 26 August, 2025; originally announced August 2025.

    Comments: The paper has been accepted by APSIPA ASC 2025

  7. arXiv:2508.17179  [pdf, ps, other]

    cs.IT

    Polarization-Aware DoA Detection Relying on a Single Rydberg Atomic Receiver

    Authors: Yuanbin Chen, Chau Yuen, Darmindra Arumugam, Chong Meng Samson See, Mérouane Debbah, Lajos Hanzo

    Abstract: A polarization-aware direction-of-arrival (DoA) detection scheme is conceived that leverages the intrinsic vector sensitivity of a single Rydberg atomic vapor cell to achieve quantum-enhanced angle resolution. Our core idea lies in the fact that the vector nature of an electromagnetic wave is uniquely determined by its orthogonal electric and magnetic field components, both of which can be retriev…

    Submitted 23 August, 2025; originally announced August 2025.

    Comments: This manuscript has been submitted to IEEE journal for publication, 13 pages, 12 figures

  8. Align 3D Representation and Text Embedding for 3D Content Personalization

    Authors: Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan

    Abstract: Recent advances in NeRF and 3DGS have significantly enhanced the efficiency and quality of 3D content synthesis. However, efficient personalization of generated 3D content remains a critical challenge. Current 3D personalization approaches predominantly rely on knowledge distillation-based methods, which require computationally expensive retraining procedures. To address this challenge, we propose…

    Submitted 23 August, 2025; originally announced August 2025.

  9. arXiv:2507.14921  [pdf, ps, other]

    cs.CV

    Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction

    Authors: Xiufeng Huang, Ka Chun Cheung, Runmin Cong, Simon See, Renjie Wan

    Abstract: Generalizable 3D Gaussian Splatting reconstruction showcases advanced Image-to-3D content creation but requires substantial computational resources and large datasets, posing challenges to training models from scratch. Current methods usually entangle the prediction of 3D Gaussian geometry and appearance, which rely heavily on data-driven priors and result in slow regression speeds. To address thi…

    Submitted 20 July, 2025; originally announced July 2025.

    Comments: ACMMM2025. Non-camera-ready version

  10. arXiv:2507.10999  [pdf, ps, other]

    cs.CV cs.AI

    SpaRTAN: Spatial Reinforcement Token-based Aggregation Network for Visual Recognition

    Authors: Quan Bi Pay, Vishnu Monn Baskaran, Junn Yong Loo, KokSheik Wong, Simon See

    Abstract: The resurgence of convolutional neural networks (CNNs) in visual recognition tasks, exemplified by ConvNeXt, has demonstrated their capability to rival transformer-based architectures through advanced training methodologies and ViT-inspired design principles. However, both CNNs and transformers exhibit a simplicity bias, favoring straightforward features over complex structural representations. Fu…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: Accepted at International Joint Conference on Neural Networks (IJCNN 2025)

  11. arXiv:2507.10977  [pdf, ps, other]

    cs.CV cs.AI

    Conceptualizing Multi-scale Wavelet Attention and Ray-based Encoding for Human-Object Interaction Detection

    Authors: Quan Bi Pay, Vishnu Monn Baskaran, Junn Yong Loo, KokSheik Wong, Simon See

    Abstract: Human-object interaction (HOI) detection is essential for accurately localizing and characterizing interactions between humans and objects, providing a comprehensive understanding of complex visual scenes across various domains. However, existing HOI detectors often struggle to deliver reliable predictions efficiently, relying on resource-intensive training methods and inefficient architectures. T…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: Accepted at International Joint Conference on Neural Networks (IJCNN 2025)

  12. arXiv:2506.02461  [pdf, ps, other]

    cs.CL

    XToM: Exploring the Multilingual Theory of Mind for Large Language Models

    Authors: Chunkit Chan, Yauwai Yim, Hongchuan Zeng, Zhiying Zou, Xinyuan Cheng, Zhifan Sun, Zheye Deng, Kawai Chung, Yuzhuo Ao, Yixiang Fan, Cheng Jiayang, Ercong Nie, Ginny Y. Wong, Helmut Schmid, Hinrich Schütze, Simon See, Yangqiu Song

    Abstract: Theory of Mind (ToM), the ability to infer mental states in others, is pivotal for human social cognition. Existing evaluations of ToM in LLMs are largely limited to English, neglecting the linguistic diversity that shapes human cognition. This limitation raises a critical question: can LLMs exhibit Multilingual Theory of Mind, which is the capacity to reason about mental states across diverse lin…

    Submitted 3 June, 2025; originally announced June 2025.

  13. arXiv:2506.01355  [pdf, ps, other]

    eess.SP cs.IT quant-ph

    Rydberg Atomic Quantum MIMO Receivers for The Multi-User Uplink

    Authors: Tierui Gong, Chau Yuen, Chong Meng Samson See, Mérouane Debbah, Lajos Hanzo

    Abstract: Rydberg atomic quantum receivers (RAQRs) have emerged as a promising solution for evolving wireless receivers from the classical to the quantum domain. To further unleash their great potential in wireless communications, we propose a flexible architecture for Rydberg atomic quantum multiple-input multiple-output (RAQ-MIMO) receivers in the multi-user uplink. Then the corresponding signal model of…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 13 pages, 5 figures, 1 table

  14. arXiv:2505.10610  [pdf, ps, other]

    cs.CV cs.CL

    MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly

    Authors: Zhaowei Wang, Wenhao Yu, Xiyu Ren, Jipeng Zhang, Yu Zhao, Rohit Saxena, Liang Cheng, Ginny Wong, Simon See, Pasquale Minervini, Yangqiu Song, Mark Steedman

    Abstract: The rapid extension of context windows in large vision-language models has given rise to long-context vision-language models (LCVLMs), which are capable of handling hundreds of images with interleaved text tokens in a single forward pass. In this work, we introduce MMLongBench, the first benchmark covering a diverse set of long-context vision-language tasks, to evaluate LCVLMs effectively and thor…

    Submitted 6 October, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

    Comments: Accepted as a spotlight at NeurIPS 2025

  15. arXiv:2504.13603  [pdf, other]

    cs.CL

    Continual Pre-Training is (not) What You Need in Domain Adaption

    Authors: Pin-Er Chen, Da-Chen Lian, Shu-Kai Hsieh, Sieh-Chuen Huang, Hsuan-Lei Shao, Jun-Wei Chiu, Yang-Hsien Lin, Zih-Ching Chen, Cheng-Kuang, Eddie TC Huang, Simon See

    Abstract: The recent advances in Legal Large Language Models (LLMs) have transformed the landscape of legal research and practice by automating tasks, enhancing research precision, and supporting complex decision-making processes. However, effectively adapting LLMs to the legal domain remains challenging due to the complexity of legal reasoning, the need for precise interpretation of specialized language, a…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 11 pages, 2 figures

  16. arXiv:2504.05081  [pdf, ps, other]

    cs.CL

    The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning

    Authors: Tianshi Zheng, Yixiang Chen, Chengxi Li, Chunyang Li, Qing Zong, Haochen Shi, Baixuan Xu, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models (LLMs). However, our study reveals a surprising contradiction to this prevailing perspective within the fundamental domain of pattern-based in-context learning (ICL). Through extensive experiments involving 16 state-of-the-art LLMs and nine diverse pattern-based IC…

    Submitted 1 November, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: Accepted by TMLR

  17. arXiv:2504.04982  [pdf, other]

    cs.AI cs.DC

    Transforming Future Data Center Operations and Management via Physical AI

    Authors: Zhiwei Cao, Minghao Li, Feng Lin, Jimin Jia, Yonggang Wen, Jianxiong Yin, Simon See

    Abstract: Data centers (DCs) as mission-critical infrastructures are pivotal in powering the growth of artificial intelligence (AI) and the digital economy. The evolution from Internet DC to AI DC has introduced new challenges in operating and managing data centers for improved business resilience and reduced total cost of ownership. As a result, new paradigms, beyond the traditional approaches based on bes…

    Submitted 15 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: 9 pages, 5 figures

  18. arXiv:2503.18100  [pdf, other]

    cs.CV

    M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving

    Authors: Xuesong Chen, Shaoshuai Shi, Tao Ma, Jingqiu Zhou, Simon See, Ka Chun Cheung, Hongsheng Li

    Abstract: The perception system for autonomous driving generally needs to handle multiple diverse sub-tasks. However, current algorithms typically tackle individual sub-tasks separately, which leads to low efficiency when aiming to obtain full-perception results. Some multi-task learning methods try to unify multiple tasks with one model, but do not solve the conflicts in multi-task learning. In this…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: Accepted by AAAI 2025

  19. arXiv:2503.08997  [pdf, ps, other]

    cs.RO cs.LG

    Unified Locomotion Transformer with Simultaneous Sim-to-Real Transfer for Quadrupeds

    Authors: Dikai Liu, Tianwei Zhang, Jianxiong Yin, Simon See

    Abstract: Quadrupeds have gained rapid advancement in their capability of traversing across complex terrains. The adoption of deep Reinforcement Learning (RL), transformers and various knowledge transfer techniques can greatly reduce the sim-to-real gap. However, the classical teacher-student framework commonly used in existing locomotion policies requires a pre-trained teacher and leverages the privilege i…

    Submitted 3 August, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: Accepted for IROS 2025. Project website for video: https://johnliudk.github.io/ult/

  20. arXiv:2503.08163  [pdf, other]

    cs.LG cs.AI cs.CE

    XAI4Extremes: An interpretable machine learning framework for understanding extreme-weather precursors under climate change

    Authors: Jiawen Wei, Aniruddha Bora, Vivek Oommen, Chenyu Dong, Juntao Yang, Jeff Adie, Chen Chen, Simon See, George Karniadakis, Gianmarco Mengaldo

    Abstract: Extreme weather events are increasing in frequency and intensity due to climate change. This, in turn, is exacting a significant toll on communities worldwide. While prediction skills are increasing with advances in numerical weather prediction and artificial intelligence tools, extreme weather still presents challenges. More specifically, identifying the precursors of such extreme weather events a…

    Submitted 11 March, 2025; originally announced March 2025.

  21. arXiv:2502.13185  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    CondensNet: Enabling stable long-term climate simulations via hybrid deep learning models with adaptive physical constraints

    Authors: Xin Wang, Juntao Yang, Jeff Adie, Simon See, Kalli Furtado, Chen Chen, Troy Arcomano, Romit Maulik, Gianmarco Mengaldo

    Abstract: Accurate and efficient climate simulations are crucial for understanding Earth's evolving climate. However, current general circulation models (GCMs) face challenges in capturing unresolved physical processes, such as cloud and convection. A common solution is to adopt cloud-resolving models, which provide more accurate results than the standard subgrid parametrisation schemes typically used in GCM…

    Submitted 18 February, 2025; originally announced February 2025.

  22. arXiv:2502.11176  [pdf, ps, other]

    cs.CL

    LogiDynamics: Unraveling the Dynamics of Inductive, Abductive and Deductive Logical Inferences in LLM Reasoning

    Authors: Tianshi Zheng, Jiayang Cheng, Chunyang Li, Haochen Shi, Zihao Wang, Jiaxin Bai, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Modern large language models (LLMs) employ diverse logical inference mechanisms for reasoning, making the strategic optimization of these approaches critical for advancing their capabilities. This paper systematically investigates the comparative dynamics of inductive (System 1) versus abductive/deductive (System 2) inference in LLMs. We utilize a controlled analogical reasoning environment, varyin…

    Submitted 17 September, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: EMNLP 2025 Main

  23. arXiv:2502.09143  [pdf, other]

    cs.CV cs.LG

    Feature-based Graph Attention Networks Improve Online Continual Learning

    Authors: Adjovi Sim, Zhengkui Wang, Aik Beng Ng, Shalini De Mello, Simon See, Wonmin Byeon

    Abstract: Online continual learning for image classification is crucial for models to adapt to new data while retaining knowledge of previously learned tasks. This capability is essential to address real-world challenges involving dynamic environments and evolving data distributions. Traditional approaches predominantly employ Convolutional Neural Networks, which are limited to processing images as grids an…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: 16 pages

  24. arXiv:2501.18382  [pdf, other]

    eess.SP cs.IT

    Rydberg Atomic Quantum Receivers for the Multi-User MIMO Uplink

    Authors: Tierui Gong, Chau Yuen, Chong Meng Samson See, Mérouane Debbah, Lajos Hanzo

    Abstract: Rydberg atomic quantum receivers exhibit great potential in assisting classical wireless communications due to their outstanding advantages in detecting radio frequency signals. To realize this potential, we integrate a Rydberg atomic quantum receiver into a classical multi-user multiple-input multiple-output (MIMO) scheme to form a multi-user Rydberg atomic quantum MIMO (RAQ-MIMO) system for the…

    Submitted 28 February, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: 6 pages, 4 figures, accepted by 2025 IEEE International Conference on Communications (ICC 2025)

  25. arXiv:2501.11842  [pdf, ps, other]

    cs.IT eess.SP

    Harnessing Rydberg Atomic Receivers: From Quantum Physics to Wireless Communications

    Authors: Yuanbin Chen, Xufeng Guo, Chau Yuen, Yufei Zhao, Yong Liang Guan, Chong Meng Samson See, Mérouane Debbah, Lajos Hanzo

    Abstract: The intrinsic integration of Rydberg atomic receivers into wireless communication systems is proposed, by harnessing the principles of quantum physics in wireless communications. More particularly, we conceive a pair of Rydberg atomic receivers, one incorporates a local oscillator (LO), referred to as an LO-dressed receiver, while the other operates without an LO and is termed an LO-free receiver.…

    Submitted 30 July, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

    Comments: This revised manuscript has been submitted to IEEE journal, 16 pages, 10 figures

  26. arXiv:2501.02820  [pdf, ps, other]

    eess.SP cs.IT quant-ph

    Rydberg Atomic Quantum Receivers for Multi-Target DOA Estimation

    Authors: Tierui Gong, Chau Yuen, Chong Meng Samson See, Mérouane Debbah, Lajos Hanzo

    Abstract: Quantum sensing technologies have experienced rapid progress since entering the 'second quantum revolution'. Among various candidates, schemes relying on Rydberg atoms exhibit compelling advantages for detecting radio frequency signals. Based on this, Rydberg atomic quantum receivers (RAQRs) have emerged as a promising solution to classical wireless communication and sensing. To harness the adva…

    Submitted 11 December, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

    Comments: 6 pages, 7 figures, accepted by IEEE Transactions on Vehicular Technology

  27. arXiv:2412.15503  [pdf, other]

    cs.CR

    Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers

    Authors: Ruofei Wang, Hongzhan Lin, Ziyuan Luo, Ka Chun Cheung, Simon See, Jing Ma, Renjie Wan

    Abstract: Hateful meme detection aims to prevent the proliferation of hateful memes on various social media platforms. Considering its impact on social environments, this paper introduces a previously ignored but significant threat to hateful meme detection: backdoor attacks. By injecting specific triggers into meme samples, backdoor attackers can manipulate the detector to output their desired outcomes. To…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI25

  28. arXiv:2412.09126  [pdf, other]

    cs.MM cs.AI cs.LG

    Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning

    Authors: Meng Shen, Yake Wei, Jianxiong Yin, Deepu Rajan, Di Hu, Simon See

    Abstract: Training multimodal models requires a large amount of labeled data. Active learning (AL) aims to reduce labeling costs. Most AL methods employ warm-start approaches, which rely on sufficient labeled data to train a well-calibrated model that can assess the uncertainty and diversity of unlabeled data. However, when assembling a dataset, labeled data are often scarce initially, leading to a cold-star…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: 11 pages, ACMMM Asia 2024, Oral Presentation

  29. arXiv:2412.05554  [pdf, other]

    eess.SP cs.IT quant-ph

    Rydberg Atomic Quantum Receivers for Classical Wireless Communications and Sensing: Their Models and Performance

    Authors: Tierui Gong, Jiaming Sun, Chau Yuen, Guangwei Hu, Yufei Zhao, Yong Liang Guan, Chong Meng Samson See, Mérouane Debbah, Lajos Hanzo

    Abstract: The significant progress of quantum sensing technologies offers numerous radical solutions for measuring a multitude of physical quantities at an unprecedented precision. Among them, Rydberg atomic quantum receivers (RAQRs) emerge as an eminent solution for detecting the electric field of radio frequency (RF) signals, exhibiting great potential in assisting classical wireless communications and sen…

    Submitted 13 May, 2025; v1 submitted 7 December, 2024; originally announced December 2024.

    Comments: 16 pages, 8 figures

  30. arXiv:2410.23718  [pdf, other]

    cs.CV

    GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting

    Authors: Xiufeng Huang, Ruiqi Li, Yiu-ming Cheung, Ka Chun Cheung, Simon See, Renjie Wan

    Abstract: 3D Gaussian Splatting (3DGS) has become a crucial method for acquiring 3D assets. To protect the copyright of these assets, digital watermarking techniques can be applied to embed ownership information discreetly within 3DGS models. However, existing watermarking methods for meshes, point clouds, and implicit radiance fields cannot be directly applied to 3DGS models, as 3DGS models use explicit 3D…

    Submitted 31 October, 2024; originally announced October 2024.

  31. arXiv:2410.22705  [pdf, other]

    cs.CV

    Geometry Cloak: Preventing TGS-based 3D Reconstruction from Copyrighted Images

    Authors: Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan

    Abstract: Single-view 3D reconstruction methods like Triplane Gaussian Splatting (TGS) have enabled high-quality 3D model generation from just a single image input within seconds. However, this capability raises concerns about potential misuse, where malicious users could exploit TGS to create unauthorized 3D models from copyrighted images. To prevent such infringement, we propose a novel image protection a…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: Accepted by NeurIPS 2024

  32. arXiv:2410.16070  [pdf, other]

    cs.AI cs.CL

    On-Device LLMs for SMEs: Challenges and Opportunities

    Authors: Jeremy Stephen Gabriel Yee, Pai Chet Ng, Zhengkui Wang, Ian McLoughlin, Aik Beng Ng, Simon See

    Abstract: This paper presents a systematic review of the infrastructure requirements for deploying Large Language Models (LLMs) on-device within the context of small and medium-sized enterprises (SMEs), focusing on both hardware and software perspectives. From the hardware viewpoint, we discuss the utilization of processing units like GPUs and TPUs, efficient memory and storage solutions, and strategies for…

    Submitted 22 October, 2024; v1 submitted 21 October, 2024; originally announced October 2024.

    Comments: 9 pages, 1 figure. The work is supported by the SIT-NVIDIA Joint AI Centre

    MSC Class: 68T07 ACM Class: I.2

  33. arXiv:2410.15038  [pdf, other]

    cs.CV cs.AI

    A Multimodal Vision Foundation Model for Clinical Dermatology

    Authors: Siyuan Yan, Zhen Yu, Clare Primiero, Cristina Vico-Alonso, Zhonghua Wang, Litao Yang, Philipp Tschandl, Ming Hu, Lie Ju, Gin Tan, Vincent Tang, Aik Beng Ng, David Powell, Paul Bonnington, Simon See, Elisabetta Magnaterra, Peter Ferguson, Jennifer Nguyen, Pascale Guitera, Jose Banuls, Monika Janda, Victoria Mar, Harald Kittler, H. Peter Soyer, Zongyuan Ge

    Abstract: Diagnosing and treating skin diseases require advanced visual skills across domains and the ability to synthesize information from multiple imaging modalities. While current deep learning models excel at specific tasks like skin cancer diagnosis from dermoscopic images, they struggle to meet the complex, multimodal requirements of clinical practice. Here, we introduce PanDerm, a multimodal dermato…

    Submitted 13 April, 2025; v1 submitted 19 October, 2024; originally announced October 2024.

    Comments: 74 pages; Preprint; The code can be found at https://github.com/SiyuanYan1/PanDerm

  34. arXiv:2410.04239  [pdf, other]

    cs.CL

    Persona Knowledge-Aligned Prompt Tuning Method for Online Debate

    Authors: Chunkit Chan, Cheng Jiayang, Xin Liu, Yauwai Yim, Yuxin Jiang, Zheye Deng, Haoran Li, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Debate is the process of exchanging viewpoints or convincing others on a particular issue. Recent research has provided empirical evidence that the persuasiveness of an argument is determined not only by language usage but also by communicator characteristics. Researchers have paid much attention to aspects of languages, such as linguistic features and discourse structures, but combining argument…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: Accepted to ECAI 2024

  35. arXiv:2409.14501  [pdf, other]

    eess.SP cs.IT quant-ph

    Rydberg Atomic Quantum Receivers for Classical Wireless Communication and Sensing

    Authors: Tierui Gong, Aveek Chandra, Chau Yuen, Yong Liang Guan, Rainer Dumke, Chong Meng Samson See, Mérouane Debbah, Lajos Hanzo

    Abstract: Rydberg atomic quantum receivers (RAQRs) are emerging quantum precision sensing platforms designed for receiving radio frequency (RF) signals. They rely on the creation of Rydberg atoms from normal atoms by exciting one or more electrons to a very high energy level, thereby making the atom sensitive to RF signals. RAQRs realize RF-to-optical conversions based on light-atom interactions relying on t…

    Submitted 18 January, 2025; v1 submitted 22 September, 2024; originally announced September 2024.

    Comments: 9 pages, 5 figures, 1 table

    Report number: IEEE Wireless Communications, 2025, Vol.32(5), p.90-100

    Journal ref: IEEE Wireless Communications, 2025, Vol.32(5), p.90-100

  36. arXiv:2409.03332  [pdf, other]

    cs.RO

    Masked Sensory-Temporal Attention for Sensor Generalization in Quadruped Locomotion

    Authors: Dikai Liu, Tianwei Zhang, Jianxiong Yin, Simon See

    Abstract: With the rising focus on quadrupeds, a generalized policy capable of handling different robot models and sensor inputs becomes highly beneficial. Although several methods have been proposed to address different morphologies, it remains a challenge for learning-based policies to manage various combinations of proprioceptive information. This paper presents Masked Sensory-Temporal Attention (MSTA),…

    Submitted 11 March, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted for ICRA 2025. Project website for video: https://johnliudk.github.io/msta/

  37. arXiv:2407.13390  [pdf, other]

    cs.CV

    GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields

    Authors: Xiufeng Huang, Ka Chun Cheung, Simon See, Renjie Wan

    Abstract: Remarkable advancements in the recolorization of Neural Radiance Fields (NeRF) have simplified the process of modifying NeRF's color attributes. Yet, with the potential of NeRF to serve as shareable digital assets, there's a concern that malicious users might alter the color of NeRF models and falsely claim the recolorized version as their own. To safeguard against such breaches of ownership, enab…

    Submitted 18 July, 2024; originally announced July 2024.

  38. arXiv:2407.10510  [pdf, other]

    cs.CL cs.AI cs.CE

    TCM-FTP: Fine-Tuning Large Language Models for Herbal Prescription Prediction

    Authors: Xingzhi Zhou, Xin Dong, Chunhao Li, Yuning Bai, Yulong Xu, Ka Chun Cheung, Simon See, Xinpeng Song, Runshun Zhang, Xuezhong Zhou, Nevin L. Zhang

    Abstract: Traditional Chinese medicine (TCM) has relied on specific combinations of herbs in prescriptions to treat various symptoms and signs for thousands of years. Predicting TCM prescriptions poses a fascinating technical challenge with significant practical implications. However, this task faces limitations due to the scarcity of high-quality clinical datasets and the complex relationship between sympt…

    Submitted 12 December, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Camera-ready version to be published in BIBM 2024

  39. arXiv:2407.07735  [pdf, other]

    cs.CV

    Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model

    Authors: Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan

    Abstract: Neural Radiance Fields (NeRFs) have become a key method for 3D scene representation. With the rising prominence and influence of NeRF, safeguarding its intellectual property has become increasingly important. In this paper, we propose NeRFProtector, which adopts a plug-and-play strategy to protect NeRF's copyright during its creation. NeRFProtector utilizes a pre-trained watermarking base…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  40. arXiv:2406.01938  [pdf, other]

    cs.CV cs.MM

    Nutrition Estimation for Dietary Management: A Transformer Approach with Depth Sensing

    Authors: Zhengyi Kwan, Wei Zhang, Zhengkui Wang, Aik Beng Ng, Simon See

    Abstract: Nutrition estimation is crucial for effective dietary management and overall health and well-being. Existing methods often struggle with sub-optimal accuracy and can be time-consuming. In this paper, we propose NuNet, a transformer-based network designed for nutrition estimation that utilizes both RGB and depth information from food images. We have designed and implemented a multi-scale encoder an…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages

  41. arXiv:2405.13629  [pdf, other]

    cs.LG

    Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

    Authors: Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

    Abstract: Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance wit…

    Submitted 26 October, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Published at NeurIPS 2024. Code: https://github.com/ChienFeng-hub/meow

  42. arXiv:2405.02630  [pdf, other]

    quant-ph cs.DC cs.SE

    Validating Large-Scale Quantum Machine Learning: Efficient Simulation of Quantum Support Vector Machines Using Tensor Networks

    Authors: Kuan-Cheng Chen, Tai-Yue Li, Yun-Yuan Wang, Simon See, Chun-Chieh Wang, Robert Wille, Nan-Yow Chen, An-Cheng Yang, Chun-Yu Lin

    Abstract: We present an efficient tensor-network-based approach for simulating large-scale quantum circuits, demonstrated using Quantum Support Vector Machines (QSVMs). Our method effectively reduces exponential runtime growth to near-quadratic scaling with respect to the number of qubits in practical scenarios. Traditional state-vector simulations become computationally infeasible beyond approximately 50 q…

    Submitted 6 January, 2025; v1 submitted 4 May, 2024; originally announced May 2024.

  43. arXiv:2402.10646  [pdf, other]

    cs.CL

    AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation

    Authors: Zhaowei Wang, Wei Fan, Qing Zong, Hongming Zhang, Sehyun Choi, Tianqing Fang, Xin Liu, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Abstraction ability is crucial in human intelligence, which can also benefit various tasks in NLP study. Existing work shows that LLMs are deficient in abstract ability, and how to improve it remains unexplored. In this work, we design the framework AbsInstruct to enhance LLMs' abstraction ability through instruction tuning. The framework builds instructions with in-depth explanations to assist LL…

    Submitted 17 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024

  44. arXiv:2401.15977  [pdf, other]

    cs.CV

    Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

    Authors: Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li

    Abstract: We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling. For the first stage, we propose a diffusion-based motion field predictor, which focuses on deducing the trajectories of the ref…

    Submitted 31 January, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Project page: https://xiaoyushi97.github.io/Motion-I2V/

  45. arXiv:2401.14619  [pdf, other]

    cs.LG

    Resilient Practical Test-Time Adaptation: Soft Batch Normalization Alignment and Entropy-driven Memory Bank

    Authors: Xingzhi Zhou, Zhiliang Tian, Ka Chun Cheung, Simon See, Nevin L. Zhang

    Abstract: Test-time domain adaptation effectively adjusts the source domain model to accommodate unseen domain shifts in a target domain during inference. However, the model performance can be significantly impaired by continuous distribution changes in the target domain and non-independent and identically distributed (non-i.i.d.) test samples often encountered in practical scenarios. While existing memory…

    Submitted 25 January, 2024; originally announced January 2024.

  46. arXiv:2310.05210  [pdf, other]

    cs.AI cs.CL

    TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining

    Authors: Qing Zong, Zhaowei Wang, Baixuan Xu, Tianshi Zheng, Haochen Shi, Weiqi Wang, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: A main goal of Argument Mining (AM) is to analyze an author's stance. Unlike previous AM datasets focusing only on text, the shared task at the 10th Workshop on Argument Mining introduces a dataset including both text and images. Importantly, these images contain both visual elements and optical characters. Our new framework, TILFA (A Unified Framework for Text, Image, and Layout Fusion in Argumen…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to the 10th Workshop on Argument Mining, co-located with EMNLP 2023

  47. arXiv:2309.08303  [pdf, other]

    cs.CL

    Self-Consistent Narrative Prompts on Abductive Natural Language Inference

    Authors: Chunkit Chan, Xin Liu, Tsz Ho Chan, Jiayang Cheng, Yangqiu Song, Ginny Wong, Simon See

    Abstract: Abduction has long been seen as crucial for narrative comprehension and reasoning about everyday situations. The abductive natural language inference ($α$NLI) task has been proposed, and this narrative text-based task aims to infer the most plausible hypothesis from the candidates given two observations. However, the inter-sentential coherence and the model consistency have not been well exploited…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at IJCNLP-AACL 2023 main track

  48. arXiv:2308.05396  [pdf, other]

    cs.CV

    Learning Gabor Texture Features for Fine-Grained Recognition

    Authors: Lanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, Jun Liu

    Abstract: Extracting and using class-discriminative features is critical for fine-grained recognition. Existing works have demonstrated the possibility of applying deep CNNs to exploit features that distinguish similar classes. However, CNNs suffer from problems including frequency bias and loss of detailed local information, which restricts the performance of recognizing fine-grained categories. To address…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV2023

  49. Towards Building AI-CPS with NVIDIA Isaac Sim: An Industrial Benchmark and Case Study for Robotics Manipulation

    Authors: Zhehua Zhou, Jiayang Song, Xuan Xie, Zhan Shu, Lei Ma, Dikai Liu, Jianxiong Yin, Simon See

    Abstract: As a representative cyber-physical system (CPS), robotic manipulator has been widely adopted in various academic research and industrial processes, indicating its potential to act as a universal interface between the cyber and the physical worlds. Recent studies in robotics manipulation have started employing artificial intelligence (AI) approaches as controllers to achieve better adaptability and…

    Submitted 31 July, 2023; originally announced August 2023.

  50. arXiv:2307.11526  [pdf, other]

    cs.CV

    CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields

    Authors: Ziyuan Luo, Qing Guo, Ka Chun Cheung, Simon See, Renjie Wan

    Abstract: Neural Radiance Fields (NeRF) have the potential to be a major representation of media. Since training a NeRF has never been an easy task, the protection of its model copyright should be a priority. In this paper, by analyzing the pros and cons of possible copyright protection solutions, we propose to protect the copyright of NeRF models by replacing the original color representation in NeRF with…

    Submitted 29 July, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: 11 pages, 6 figures, accepted by ICCV 2023 non-camera-ready version