Computer Science > Computer Vision and Pattern Recognition

arXiv:2509.26036 (cs)

[Submitted on 30 Sep 2025 (v1), last revised 9 Apr 2026 (this version, v3)]

Title: SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP

Authors: Christoph Timmermann, Hyunse Lee, Woojin Lee

Abstract: While Contrastive Language-Image Pretraining (CLIP) excels at zero-shot tasks by aligning image and text embeddings, its performance in few-shot classification is hindered by a critical limitation: intra-modal misalignment. This issue, caused by a persistent modality gap and CLIP's exclusively inter-modal training objective, leaves the embedding spaces uncalibrated, making direct image-to-image comparisons unreliable. Existing methods attempt to address this either by refining similarity logits or through computationally expensive per-sample optimization. To overcome these challenges, we introduce SeMoBridge, a lightweight yet powerful approach that directly addresses the misalignment. Our method maps images into the text modality while keeping their semantic content intact, through what we call a Semantic Modality Bridge. SeMoBridge is closed-form and can optionally be trained through multi-modal supervision, combining image- and text-alignment losses to optimize the projection. Experiments show that the trained version, SeMoBridge-T, requires only a fraction of the training time while outperforming other methods overall, particularly in low-data scenarios (1, 2, and 4 shots). The code is available at this https URL.
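The abstract does not spell out SeMoBridge's closed-form projection, so the sketch below only illustrates the general idea it describes: map image embeddings into the text space with a closed-form solution, then classify by cosine similarity against class text embeddings instead of comparing images to images. It assumes a plain ridge-regression map, which is a generic, swapped-in technique and not the authors' formulation; it omits the optional multi-modal training of SeMoBridge-T entirely, and it uses synthetic vectors in place of real CLIP embeddings. The names `fit_linear_bridge` and `classify_in_text_space` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm (cosine-similarity geometry, as in CLIP)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def fit_linear_bridge(img_emb: np.ndarray, txt_emb: np.ndarray, lam: float = 0.1) -> np.ndarray:
    """Hypothetical closed-form image-to-text map (ridge regression, an assumption).

    W = argmin ||X W - T||_F^2 + lam * ||W||_F^2 = (X^T X + lam I)^{-1} X^T T
    """
    d = img_emb.shape[1]
    gram = img_emb.T @ img_emb + lam * np.eye(d)
    return np.linalg.solve(gram, img_emb.T @ txt_emb)


def classify_in_text_space(query_img: np.ndarray, W: np.ndarray, class_txt: np.ndarray) -> np.ndarray:
    """Project images into text space, then pick the most cosine-similar class embedding."""
    mapped = l2_normalize(query_img @ W)
    logits = mapped @ l2_normalize(class_txt).T
    return logits.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, shots, dim = 4, 2, 16  # toy sizes, not CLIP's real dimensions

    # Synthetic stand-ins for CLIP embeddings: one text embedding per class;
    # images = class text + a shared offset (a crude "modality gap") + noise.
    class_txt = l2_normalize(rng.normal(size=(num_classes, dim)))
    gap = 0.5 * l2_normalize(rng.normal(size=dim))

    def sample_images(n_per_class):
        labels = np.repeat(np.arange(num_classes), n_per_class)
        noise = 0.05 * rng.normal(size=(labels.size, dim))
        return l2_normalize(class_txt[labels] + gap + noise), labels

    # Few-shot support set: each image's regression target is its class text embedding.
    support_img, support_y = sample_images(shots)
    W = fit_linear_bridge(support_img, class_txt[support_y])

    query_img, query_y = sample_images(10)
    preds = classify_in_text_space(query_img, W, class_txt)
    print("toy accuracy:", (preds == query_y).mean())
```

Solving the normal equations with `np.linalg.solve` rather than forming an explicit inverse is the usual numerically safer choice; `lam` trades off fitting the few support shots against overfitting noise directions.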

Comments:	22 pages, 12 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2509.26036 [cs.CV]
	(or arXiv:2509.26036v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2509.26036

Submission history

From: Christoph Timmermann
[v1] Tue, 30 Sep 2025 10:12:15 UTC (1,867 KB)
[v2] Wed, 1 Oct 2025 09:18:57 UTC (1,867 KB)
[v3] Thu, 9 Apr 2026 15:28:40 UTC (2,004 KB)
