
Showing 1–11 of 11 results for author: Li, J C L

Searching in archive cs.
  1. arXiv:2511.11438

    cs.CV

    VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models

    Authors: Mingjie Xu, Jinpeng Chen, Yuzhi Zhao, Jason Chun Lok Li, Yue Qiu, Zekang Du, Mengyang Wu, Pingping Zhang, Kun Li, Hongzheng Yang, Wenao Ma, Jiaheng Wei, Qinbin Li, Kangcheng Liu, Wenqiang Lei

    Abstract: Multimodal large language models (MLLMs) have enabled a wide range of advanced vision-language applications, including fine-grained object recognition and contextual understanding. When querying specific regions or objects in an image, human users naturally use "visual prompts" (VPs), such as bounding boxes, for reference. However, no existing benchmark systematically evaluates the ability…

    Submitted 14 November, 2025; originally announced November 2025.

    Comments: Extended version of the paper accepted at AAAI 2026, including all technical appendices and additional experimental details

  2. arXiv:2511.07738

    cs.LG cs.CV

    From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training

    Authors: Donglai Xu, Hongzheng Yang, Yuzhi Zhao, Pingping Zhang, Jinpeng Chen, Wenao Ma, Zhijian Hou, Mengyang Wu, Xiaolei Li, Senkang Hu, Ziyi Guan, Jason Chun Lok Li, Lai Man Po

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) for Multimodal Large Language Models (MLLMs) is highly dependent on high-quality labeled data, which is often scarce and prone to substantial annotation noise in real-world scenarios. Existing unsupervised RLVR methods, including pure entropy minimization, can overfit to incorrect labels and limit the crucial reward ranking signal for Group-Rel…

    Submitted 10 November, 2025; originally announced November 2025.

  3. arXiv:2509.00366

    cs.MA cs.CL cs.MM

    KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation

    Authors: Ziyi Guan, Jason Chun Lok Li, Zhijian Hou, Pingping Zhang, Donglai Xu, Yuzhi Zhao, Mengyang Wu, Jinpeng Chen, Thanh-Toan Nguyen, Pengfei Xian, Wenao Ma, Shengchao Qin, Graziano Chesi, Ngai Wong

    Abstract: Despite recent progress, Graphical User Interface (GUI) agents powered by Large Language Models (LLMs) struggle with complex mobile tasks due to limited app-specific knowledge. While UI Transition Graphs (UTGs) offer structured navigation representations, they are underutilized due to poor extraction and inefficient integration. We introduce KG-RAG, a Knowledge Graph-driven Retrieval-Augmented Gener…

    Submitted 30 August, 2025; originally announced September 2025.

    Comments: Accepted at EMNLP 2025

  4. arXiv:2412.06322

    cs.CV

    LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations

    Authors: Mingjie Xu, Mengyang Wu, Yuzhi Zhao, Jason Chun Lok Li, Weifeng Ou

    Abstract: Scene Graph Generation (SGG) converts visual scenes into structured graph representations, providing deeper scene understanding for complex vision tasks. However, existing SGG models often overlook essential spatial relationships and struggle with generalization in open-vocabulary contexts. To address these limitations, we propose LLaVA-SpaceSGG, a multimodal large language model (MLLM) designed f…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Accepted at WACV 2025; includes supplementary material

  5. arXiv:2405.12398

    cs.LG

    ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference

    Authors: Jason Chun Lok Li, Steven Tin Sui Luo, Le Xu, Ngai Wong

    Abstract: A coordinate network, or implicit neural representation (INR), is a fast-emerging method for encoding natural signals (such as images and videos) with the benefits of a compact neural representation. While numerous methods have been proposed to increase the encoding capabilities of an INR, an often overlooked aspect is the inference efficiency, usually measured in multiply-accumulate (MAC) count. This…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: ICLR 2024 (v3: 21 pages, 11 figures, Project Page: https://github.com/stevolopolis/asmr.git)
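    The abstract measures inference efficiency in multiply-accumulate (MAC) count. As a rough illustration only (a plain coordinate MLP with assumed layer widths, not the ASMR architecture), the per-pixel MAC cost of decoding an INR can be tallied directly from the layer sizes:

    ```python
    # Illustration only: count multiply-accumulate (MAC) operations for a plain
    # coordinate MLP mapping an (x, y) coordinate to an RGB value. The layer
    # widths are assumptions for this sketch; ASMR's point is to share
    # activations across resolutions so this per-pixel cost drops.
    layers = [2, 64, 64, 3]          # (x, y) input -> two hidden layers -> RGB

    # A dense layer from width a to width b costs a * b MACs per queried point.
    macs_per_pixel = sum(a * b for a, b in zip(layers, layers[1:]))

    # Decoding a 256x256 image queries every coordinate once.
    total_macs = macs_per_pixel * 256 * 256
    ```

    Here `macs_per_pixel` comes to 4416, so a full 256x256 decode costs roughly 290M MACs; amortizing the early layers across neighboring coordinates is what makes activation sharing attractive.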

  6. arXiv:2405.10531

    cs.LG cs.CV

    Nonparametric Teaching of Implicit Neural Representations

    Authors: Chen Zhang, Steven Tin Sui Luo, Jason Chun Lok Li, Yik-Chung Wu, Ngai Wong

    Abstract: We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of I…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: ICML 2024 (24 pages, 13 figures)

  7. arXiv:2312.17018

    cs.CV cs.LG

    Learning Spatially Collaged Fourier Bases for Implicit Neural Representation

    Authors: Jason Chun Lok Li, Chang Liu, Binxiao Huang, Ngai Wong

    Abstract: Existing approaches to Implicit Neural Representation (INR) can be interpreted as a global scene representation via a linear combination of Fourier bases of different frequencies. However, such universal basis functions can limit the representation capability in local regions where a specific component is unnecessary, resulting in unpleasant artifacts. To this end, we introduce a learnable spatial…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 11 pages, 13 figures, Accepted at the 38th AAAI Conference on Artificial Intelligence (AAAI-24)
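    The abstract's starting point, an INR as a linear combination of Fourier bases, can be made concrete with a toy fit. This sketch is generic (fixed global bases fitted by least squares, with assumed frequencies), not the paper's learnable spatially collaged bases:

    ```python
    import numpy as np

    # Toy illustration: represent a 1D signal as a linear combination of fixed,
    # global Fourier bases fitted by least squares. The frequency range is an
    # assumption; the paper instead learns which basis applies in which region.
    x = np.linspace(0.0, 1.0, 256)                   # sample coordinates
    signal = np.sin(2 * np.pi * 3 * x) + 0.5 * np.cos(2 * np.pi * 7 * x)

    freqs = np.arange(1, 9)                          # candidate frequencies 1..8
    basis = np.concatenate(
        [np.sin(2 * np.pi * freqs[None, :] * x[:, None]),
         np.cos(2 * np.pi * freqs[None, :] * x[:, None])], axis=1)  # (256, 16)

    coeffs, *_ = np.linalg.lstsq(basis, signal, rcond=None)
    recon = basis @ coeffs
    err = np.abs(recon - signal).max()   # both tones lie in the basis span
    ```

    Because both tones lie in the span of the basis, the residual is numerically zero; artifacts arise when a local region needs a component the global basis spends capacity on elsewhere, which motivates spatial collaging.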

  8. arXiv:2312.09922

    cs.CV cs.AI

    A Unifying Tensor View for Lightweight CNNs

    Authors: Jason Chun Lok Li, Rui Lin, Jiajun Zhou, Edmund Yin Mun Lam, Ngai Wong

    Abstract: Although the decomposition of convolutional kernels for lightweight CNNs is well studied, existing works that rely on tensor network diagrams or hyperdimensional abstraction lack geometric intuition. This work devises a new perspective by linking a 3D-reshaped kernel tensor to its various slice-wise and rank-1 decompositions, permitting a straightforward connection between various tensor approxim…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 4 pages, 3 figures, accepted in 2023 IEEE 15th International Conference on ASIC (ASICON 2023)

  9. arXiv:2312.06101

    eess.IV cs.CV

    Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution

    Authors: Binxiao Huang, Jason Chun Lok Li, Jie Ran, Boyu Li, Jiajun Zhou, Dahai Yu, Ngai Wong

    Abstract: Conventional super-resolution (SR) schemes make heavy use of convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations, and require specialized hardware such as graphics processing units. This contradicts the regime of edge AI that often runs on devices strained by power, computing, and storage resources. Such a challenge has motivated a series of lookup ta…

    Submitted 8 May, 2024; v1 submitted 10 December, 2023; originally announced December 2023.
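    The lookup-table idea the abstract builds on can be sketched in a few lines. This is a hedged toy (a 2x2 patch of 4-bit pixels indexing a precomputed table, with a scaled patch sum standing in for a trained network), not the paper's hundred-kilobyte design:

    ```python
    import numpy as np

    # Hedged toy (not the paper's design): LUT-based SR precomputes a model's
    # output for every possible small input patch, so inference is array
    # indexing with no multiply-accumulates. Here each 2x2 patch of 4-bit
    # pixels indexes one entry; the "model" baked into the table is just a
    # scaled patch sum standing in for a trained network.
    bits = 4
    levels = 1 << bits                          # 16 quantization levels per pixel
    idx = np.indices((levels,) * 4)             # enumerate all 2x2 patch patterns
    lut = (idx.sum(axis=0) * (255 // (4 * (levels - 1)))).astype(np.uint8)

    def lut_infer(patch):
        """Process one 2x2 patch with a single table lookup, zero MACs."""
        a, b = patch[0]
        c, d = patch[1]
        return lut[a, b, c, d]

    out = lut_infer(np.array([[15, 15], [15, 15]]))   # brightest patch
    ```

    The table holds 16^4 = 65,536 uint8 entries (64 KB); practical LUT-SR systems typically quantize inputs this coarsely and interpolate between entries to keep tables small while recovering precision.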

  10. arXiv:2311.08125

    cs.LG

    Lite it fly: An All-Deformable-Butterfly Network

    Authors: Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Binxiao Huang, Jie Ran, Ngai Wong

    Abstract: Most deep neural networks (DNNs) consist fundamentally of convolutional and/or fully connected layers, wherein the linear transform can be cast as the product between a filter matrix and a data matrix obtained by arranging feature tensors into columns. The recently proposed deformable butterfly (DeBut) decomposes the filter matrix into generalized, butterfly-like factors, thus achieving network compr…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 7 pages, 3 figures, accepted as a brief paper in IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
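    The butterfly factorization the abstract refers to can be illustrated with the classic (non-deformable) case: a dense-looking matrix expressed as a few sparse factors. A minimal sketch, not DeBut's generalized factors:

    ```python
    import numpy as np

    # Hedged sketch (the classic butterfly pattern, not DeBut's deformable
    # factors): an N x N transform built from log2(N) sparse factors, each with
    # just 2 nonzeros per row, so applying them costs O(N log N) not O(N^2).
    N = 8
    rng = np.random.default_rng(0)

    def butterfly_factor(n, stride):
        """Sparse factor mixing each entry i with its partner i XOR stride."""
        B = np.zeros((n, n))
        for i in range(n):
            j = i ^ stride              # partner index at bit-distance `stride`
            B[i, i], B[i, j] = rng.standard_normal(2)
        return B

    factors = [butterfly_factor(N, 1 << s) for s in range(3)]   # strides 1, 2, 4
    W = factors[0] @ factors[1] @ factors[2]    # product is a dense 8x8 matrix

    sparse_nnz = sum(np.count_nonzero(F) for F in factors)  # 3 * 16 = 48
    dense_nnz = np.count_nonzero(W)                         # all 64 entries
    ```

    Three factors of 16 nonzeros each (48 total) reproduce an 8x8 matrix that is fully dense; DeBut generalizes the factor shapes so the chain can match arbitrary layer dimensions and compression ratios.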

  11. arXiv:2208.13571

    cs.LG cs.AI

    PECAN: A Product-Quantized Content Addressable Memory Network

    Authors: Jie Ran, Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Ngai Wong

    Abstract: A novel deep neural network (DNN) architecture is proposed wherein the filtering and linear transform are realized solely with product quantization (PQ). This results in a natural implementation via content addressable memory (CAM), which transcends regular DNN layer operations and requires only simple table lookup. Two schemes are developed for the end-to-end PQ prototype training, namely, throug…

    Submitted 13 August, 2022; originally announced August 2022.
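    The table-lookup inference the abstract describes can be sketched with plain product quantization. This is a hedged toy (random prototypes, nearest-neighbor assignment), not PECAN's end-to-end training schemes:

    ```python
    import numpy as np

    # Hedged toy (not PECAN's training): approximate W @ x using product
    # quantization. Split x into sub-vectors, snap each to its nearest codebook
    # prototype, and fetch precomputed partial products from a table -- a
    # content-addressable-memory-style lookup instead of multiplication.
    rng = np.random.default_rng(0)
    d, m, k = 8, 4, 16                  # input dim, num sub-vectors, prototypes
    sub = d // m                        # sub-vector length (2)
    W = rng.standard_normal((5, d))     # weight matrix, output dim 5 (assumed)
    codebooks = rng.standard_normal((m, k, sub))    # assumed prototypes

    # Precompute: partial product of each W slice with every prototype.
    table = np.stack([W[:, i*sub:(i+1)*sub] @ codebooks[i].T for i in range(m)])

    def pq_matvec(x):
        """Approximate W @ x with m nearest-prototype searches and m lookups."""
        out = np.zeros(W.shape[0])
        for i in range(m):
            xi = x[i*sub:(i+1)*sub]
            j = int(np.argmin(((codebooks[i] - xi) ** 2).sum(axis=1)))
            out += table[i][:, j]       # table lookup replaces the multiply
        return out

    x = rng.standard_normal(d)
    approx = pq_matvec(x)               # lookup-only estimate of W @ x
    ```

    If the input is exactly a concatenation of prototypes, the lookup reproduces W @ x; otherwise the per-sub-vector quantization error bounds the approximation, which is why prototype training matters.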