[MICCAI 2025] CVS-AdaptNet: Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
Authors: Britty Baby, Vinkle Srivastav, Pooja P. Jain, Kun Yuan, Pietro Mascagni, Nicolas Padoy
We introduce CVS-AdaptNet, a novel adaptation framework that leverages multi-modal foundation models for fine-grained, multi-label surgical recognition. Specifically, CVS-AdaptNet enhances image-text alignment by incorporating naturally available textual descriptions of the Critical View of Safety (CVS) criteria, without requiring spatial annotations (e.g., bounding boxes or segmentation masks).
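The multi-label idea can be illustrated with a minimal sketch, assuming CLIP-style embeddings. Each CVS criterion is paired with a textual description, and the image is scored against every description independently, so several criteria can be satisfied at once. The prompt strings, the `scale`/`bias` values and the helper names below are hypothetical and not taken from CVS-AdaptNet. Toy 3-D vectors stand in for the outputs of the image and text encoders.

```python
import math

# Hypothetical prompts paraphrasing the three standard CVS criteria.
# In practice each prompt would be encoded by a text encoder (e.g. CLIP or
# SurgVLP); CVS-AdaptNet's actual descriptions may differ.
CVS_PROMPTS = {
    "C1": "two and only two structures are seen connected to the gallbladder",
    "C2": "the hepatocystic triangle is cleared of fat and fibrous tissue",
    "C3": "the lower third of the gallbladder is separated from the liver bed",
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def criterion_probabilities(image_emb, text_embs, scale=10.0, bias=0.0):
    """Score each criterion independently with a sigmoid (multi-label, no softmax)."""
    return {
        name: sigmoid(scale * cosine(image_emb, emb) + bias)
        for name, emb in text_embs.items()
    }


# Toy embeddings standing in for encoder outputs (hypothetical values).
image = [1.0, 0.0, 0.0]
texts = {"C1": [1.0, 0.0, 0.0], "C2": [0.0, 1.0, 0.0], "C3": [-1.0, 0.0, 0.0]}
probs = criterion_probabilities(image, texts)
print(probs)
```

Using one sigmoid per criterion, rather than a softmax across criteria, is what keeps the outputs multi-label: the three probabilities are not forced to sum to one.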
If you find this work useful, please cite:
@article{cvsadaptnet,
title={Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition},
author={Baby, Britty and Srivastav, Vinkle and Jain, Pooja P. and Yuan, Kun and Mascagni, Pietro and Padoy, Nicolas},
journal={arXiv preprint arXiv:2507.05007},
year={2025}
}

# Create environment
conda create -n cvs_adaptnet python=3.10
conda activate cvs_adaptnet
# Install dependencies
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/CAMMA-public/SurgVLP.git

Manually download the following pre-trained weights and store them in the weights/ folder:
If you only want to use the trained CVS-AdaptNet weights (without training from scratch), you can download them directly from the following link:
- CVS-AdaptNet: Download
(Place the downloaded weights in the weights/ folder.)
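Before training or testing, it can help to confirm that the checkpoints actually landed in weights/. The helper below is a hypothetical convenience, not part of the repository, and the accepted file extensions are an assumption; adjust them to the files you downloaded.

```python
from pathlib import Path

# Assumed checkpoint extensions (hypothetical); adjust to your downloads.
CHECKPOINT_SUFFIXES = {".pth", ".pt", ".ckpt", ".bin"}


def list_checkpoints(folder="weights"):
    """Return the sorted names of checkpoint files found directly in `folder`."""
    path = Path(folder)
    if not path.is_dir():
        raise FileNotFoundError(f"{path.resolve()} does not exist; create it and add the weights")
    return sorted(
        p.name for p in path.iterdir() if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES
    )
```

Calling `list_checkpoints()` from the repository root should list the files you placed in weights/; an empty list means nothing with a recognised extension was found.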
- Download the Endoscapes dataset from here and place it in the data/ folder.
- Run the preprocessing script:
python convert_json_to_csv.py
This will generate the CSV files required by the dataloader.
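The kind of conversion this step performs can be sketched as follows. This is not a copy of convert_json_to_csv.py: the input schema (a COCO-style "images" list whose entries carry a "ds" list of three per-criterion scores in [0, 1]) and the 0.5 binarisation threshold are assumptions to check against the real Endoscapes files. The record used in the example is a toy value.

```python
import csv
import io
import json

CRITERIA = ("C1", "C2", "C3")


def json_to_rows(annotations, threshold=0.5):
    """Yield [file_name, c1, c2, c3] rows with binarised criterion labels.

    Assumes (hypothetically) that each image entry has a "ds" list of three scores.
    """
    for image in annotations["images"]:
        scores = image["ds"]
        if len(scores) != len(CRITERIA):
            raise ValueError(f"expected {len(CRITERIA)} scores, got {len(scores)}")
        yield [image["file_name"], *(int(s >= threshold) for s in scores)]


def write_csv(annotations, out_file, threshold=0.5):
    """Write a header plus one row per image to an open text file."""
    writer = csv.writer(out_file)
    writer.writerow(["file_name", *CRITERIA])
    writer.writerows(json_to_rows(annotations, threshold))


# Toy record (hypothetical), written to an in-memory buffer instead of disk.
example = json.loads('{"images": [{"file_name": "frame_0001.jpg", "ds": [1.0, 0.33, 0.67]}]}')
buffer = io.StringIO()
write_csv(example, buffer)
print(buffer.getvalue())
```

When writing to disk instead, open the file with `open(path, "w", newline="")` so the `csv` module controls line endings.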
To train CVS-AdaptNet, run:
bash train.sh
The trained weights will be stored in the folder corresponding to your chosen job_name.
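The loss used by train.sh is not documented here. A common choice for multi-label recognition, shown below as an assumption rather than as the CVS-AdaptNet objective, is binary cross-entropy applied to each criterion independently. The sketch uses the numerically stable logits form, which computes the same quantity as PyTorch's `torch.nn.BCEWithLogitsLoss` with its default mean reduction; `bce_with_logits` is a hypothetical helper name.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels, computed from raw logits.

    Uses max(z, 0) - z*y + log(1 + exp(-|z|)), which equals
    -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))] without overflow.
    """
    if len(logits) != len(targets) or not logits:
        raise ValueError("logits and targets must be non-empty and of equal length")
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# One frame, three criteria (C1, C2, C3): toy logits and binary targets.
loss = bce_with_logits([2.0, -1.0, 0.0], [1, 0, 1])  # ≈ 0.378
print(loss)
```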
To evaluate a trained model, run:
bash test.sh

@article{yuan2024procedure,
title={Procedure-aware surgical video-language pretraining with hierarchical knowledge augmentation},
author={Yuan, Kun and Srivastav, Vinkle and Navab, Nassir and Padoy, Nicolas},
journal={Advances in Neural Information Processing Systems},
volume={37},
pages={122952--122983},
year={2024}
}

This repository is released under the CC BY-NC-SA 4.0 license.
By downloading and using this code, you agree to the terms specified in the LICENSE file.
This repository is adapted from SurgVLP.
We thank the authors of SurgVLP for making their code available to the community.
This work has received funding from the European Union (ERC, CompSURG, 101088553).
The views expressed are those of the authors and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
This work was also partially supported by French state funds managed by the ANR under grants ANR-20-CHIA-0029-01 and ANR-10-IAHU-02.