[MICCAI 2025] CVS-AdaptNet: Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition

Authors: Britty Baby, Vinkle Srivastav, Pooja P. Jain, Kun Yuan, Pietro Mascagni, Nicolas Padoy

Overview

We introduce CVS-AdaptNet, a novel adaptation framework that leverages multi-modal foundation models for fine-grained, multi-label surgical recognition. Specifically, CVS-AdaptNet enhances image-text alignment by incorporating naturally available textual descriptions of the Critical View of Safety (CVS) criteria, without requiring spatial annotations (e.g., bounding boxes or segmentation masks).
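
To make the idea of image-text alignment for multi-label recognition concrete, here is a minimal sketch (not the CVS-AdaptNet implementation). It assumes an image embedding and one text embedding per CVS criterion are already available from some encoder, and scores each criterion independently with a sigmoid over cosine similarity. The prompt wording, the `scale` value, and the function names are illustrative assumptions; the repository's own prompts and model code may differ.

```python
import math

# Hypothetical prompts paraphrasing the three CVS criteria; the prompts
# shipped with the repository may be worded differently.
CVS_PROMPTS = [
    "two structures, the cystic duct and the cystic artery, enter the gallbladder",
    "the hepatocystic triangle is cleared of fat and connective tissue",
    "the lower part of the gallbladder is separated from the cystic plate",
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def criterion_scores(image_emb, text_embs, scale=10.0):
    """One independent sigmoid score per criterion (multi-label, so no softmax).

    `scale` is an assumed temperature, not a value taken from the paper.
    """
    return [1.0 / (1.0 + math.exp(-scale * cosine(image_emb, t))) for t in text_embs]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real encoder outputs.
    image = [1.0, 0.0]
    texts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    for prompt, score in zip(CVS_PROMPTS, criterion_scores(image, texts)):
        print(f"{score:.3f}  {prompt}")
```

Because each criterion gets its own sigmoid, a frame can satisfy any subset of the three criteria, which is what distinguishes the multi-label setting from single-label classification.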

🔗 Reference

If you find this work useful, please cite:

@article{cvsadaptnet,
  title={Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition},
  author={Baby, Britty and Srivastav, Vinkle and Jain, Pooja P. and Yuan, Kun and Mascagni, Pietro and Padoy, Nicolas},
  journal={arXiv preprint arXiv:2507.05007},
  year={2025}
}

Installation

# Create environment
conda create -n cvs_adaptnet python=3.10
conda activate cvs_adaptnet

# Install dependencies
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/CAMMA-public/SurgVLP.git
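
After installation, a quick check like the following can confirm that the packages are visible to the active environment. This is a hedged sketch: the module names `clip` and `surgvlp` are assumed to be the import names installed by the two commands above, and `torch` is assumed to be pulled in as a dependency.

```python
import importlib.util


def check_dependencies(modules=("torch", "clip", "surgvlp")):
    """Return {module_name: bool} telling whether each top-level module is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If a module is reported as missing, re-running the corresponding pip command inside the activated cvs_adaptnet environment is the first thing to try.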

Pre-trained Models

Manually download the following pre-trained weights and store them in the weights/ folder:

PeskaVLP: Download
ResNet-50: Download

Pre-trained CVS-AdaptNet Weights

If you only want to use the trained CVS-AdaptNet weights (without training from scratch), you can download them directly from the following link:

CVS-AdaptNet: Download

(Place the downloaded weights in the weights/ folder.)
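
Before training or testing, it can help to verify that the downloaded files actually landed in weights/. The helper below is a small sketch; the checkpoint extensions it looks for are assumptions, since the exact file names of the downloads are not listed here.

```python
from pathlib import Path


def list_weight_files(weights_dir="weights", suffixes=(".pth", ".pt", ".ckpt")):
    """Return the sorted names of checkpoint-like files in `weights_dir`.

    The suffix list is an assumption; adjust it to match the downloaded files.
    Returns an empty list if the folder does not exist.
    """
    root = Path(weights_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file() and p.suffix in suffixes)


if __name__ == "__main__":
    found = list_weight_files()
    print("Checkpoints found:", found if found else "none (is weights/ populated?)")
```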

Dataset Preparation

Download the Endoscapes dataset from here and place it in the data/ folder.
Run the preprocessing script:

python convert_json_to_csv.py

This will generate the CSV files required for the dataloader.
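
For readers curious about what such a conversion involves, the following is an illustrative sketch only, not the repository's convert_json_to_csv.py. It assumes a COCO-style annotation file whose "images" entries carry a "file_name" and a three-value label list under a key called "ds", and it binarizes each value at 0.5. The key name, the threshold, and the output column names are all assumptions.

```python
import csv
import json


def coco_json_to_csv(json_path, csv_path, label_key="ds", threshold=0.5):
    """Write one CSV row per image: file name plus three binary criterion labels.

    Assumes each image entry has `file_name` and a list of three numbers under
    `label_key`; both the key and the threshold are hypothetical defaults.
    Returns the number of images written.
    """
    with open(json_path) as f:
        images = json.load(f)["images"]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "c1", "c2", "c3"])
        for img in images:
            labels = [int(v >= threshold) for v in img[label_key]]
            writer.writerow([img["file_name"], *labels])
    return len(images)
```

The CSV files produced by the actual preprocessing script may use different columns, so the dataloader in this repository should be fed only with the output of that script.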

Training

To train CVS-AdaptNet, run:

bash train.sh

The trained weights will be stored in the folder corresponding to your chosen job_name.

Testing

To evaluate a trained model, run:

bash test.sh

Related Work

@article{yuan2024procedure,
  title={Procedure-aware surgical video-language pretraining with hierarchical knowledge augmentation},
  author={Yuan, Kun and Srivastav, Vinkle and Navab, Nassir and Padoy, Nicolas},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={122952--122983},
  year={2024}
}

License

This repository is released under the CC BY-NC-SA 4.0 license.
By downloading and using this code, you agree to the terms specified in the LICENSE file.

⚠️ Note: Third-party libraries and models are subject to their respective licenses.

This repository is adapted from SurgVLP.
We thank the authors of SurgVLP for making their code available to the community.

Acknowledgements

This work has received funding from the European Union (ERC, CompSURG, 101088553).
The views expressed are those of the authors and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

This work was also partially supported by French state funds managed by the ANR under grants ANR-20-CHIA-0029-01 and ANR-10-IAHU-02.
