
Showing 1–4 of 4 results for author: Cannons, K

Searching in archive cs.
  1. arXiv:2601.00501 [pdf, ps, other]

    cs.CV

    CPPO: Contrastive Perception for Vision Language Policy Optimization

    Authors: Ahmad Rezaei, Mohsen Gholami, Saeed Ranjbar Alvar, Kevin Cannons, Mohammad Asiful Hossain, Zhou Weimin, Shunbo Zhou, Yong Zhang, Mohammad Akbari

    Abstract: We introduce CPPO, a Contrastive Perception Policy Optimization method for fine-tuning vision-language models (VLMs). While reinforcement learning (RL) has advanced reasoning in language models, extending it to multimodal reasoning requires improving both the perception and reasoning aspects. Prior works tackle this challenge mainly with explicit perception rewards, but disentangling perception tok…

    Submitted 1 January, 2026; originally announced January 2026.

  2. arXiv:2512.05277 [pdf, ps, other]

    cs.CV cs.AI

    From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model

    Authors: Kevin Cannons, Saeed Ranjbar Alvar, Mohammad Asiful Hossain, Ahmad Rezaei, Mohsen Gholami, Alireza Heidarikhazaei, Zhou Weimin, Yong Zhang, Mohammad Akbari

    Abstract: Temporal understanding in autonomous driving (AD) remains a significant challenge, even for recent state-of-the-art (SoTA) Vision-Language Models (VLMs). Prior work has introduced datasets and benchmarks aimed at improving temporal reasoning, but these have emphasized other video content, including sports, cooking, and movies. No existing benchmark focuses exclusively on the unique challenges of t…

    Submitted 16 December, 2025; v1 submitted 4 December, 2025; originally announced December 2025.

  3. arXiv:2503.05936 [pdf, other]

    cs.CV

    CASP: Compression of Large Multimodal Models Based on Attention Sparsity

    Authors: Mohsen Gholami, Mohammad Akbari, Kevin Cannons, Yong Zhang

    Abstract: In this work, we propose an extreme compression technique for Large Multimodal Models (LMMs). While previous studies have explored quantization as an efficient post-training compression method for Large Language Models (LLMs), low-bit compression for multimodal models remains under-explored. The redundant nature of inputs in multimodal models results in a highly sparse attention matrix. We theoret…

    Submitted 7 March, 2025; originally announced March 2025.

  4. arXiv:2110.14613 [pdf, other]

    cs.CV cs.AI

    International Workshop on Continual Semi-Supervised Learning: Introduction, Benchmarks and Baselines

    Authors: Ajmal Shahbaz, Salman Khan, Mohammad Asiful Hossain, Vincenzo Lomonaco, Kevin Cannons, Zhan Xu, Fabio Cuzzolin

    Abstract: The aim of this paper is to formalize a new continual semi-supervised learning (CSSL) paradigm, proposed to the attention of the machine learning community via the IJCAI 2021 International Workshop on Continual Semi-Supervised Learning (CSSL-IJCAI), with the aim of raising field awareness about this problem and mobilizing its effort in this direction. After a formal definition of continual semi-su…

    Submitted 27 October, 2021; originally announced October 2021.