ComprehendEdit

About the project (the paper, including the appendix, is available here)

We introduce ComprehendEdit, a comprehensive benchmark with enhanced metrics for multimodal knowledge editing. ComprehendEdit incorporates eight diverse tasks derived from multiple datasets, providing a more robust and varied evaluation framework. Two novel evaluation metrics are introduced: the Knowledge Generalization Index (KGI) and the Knowledge Preservation Index (KPI), which assess the impact of knowledge editing on in-domain samples. The variety of question types in existing datasets and in ComprehendEdit (question types generated by Llama-2-7b-chat-hf) is shown in the following table:

| Task | E-VQA | VLKEB | ComprehendEdit |
|---|---:|---:|---:|
| Object Recognition | 4,854 | 8,089 | 2,962 |
| Object Attributes | 1,435 | 27 | 2,987 |
| Object Counting | 1,213 | 0 | 2,009 |
| Object Existence | 845 | 3 | 1,962 |
| Scene Information | 45 | 44 | 2,854 |
| Numerical Inference | 23 | 0 | 846 |
| Spatial Relationship | 16 | 1 | 2,239 |
| Text Recognition | 8 | 0 | 2,073 |
| Total | 8,439 | 8,164 | 17,932 |

ComprehendEdit focuses on evaluating the edited model on in-domain samples, as shown in the figure below.

Details of the Dataset

Here are some samples of ComprehendEdit:

Q, G, P, S, and C denote Question, Ground-truth, Prediction, Source, and task Category, respectively.

The dataset is organized as follows:

|——ComprehendEdit/   
|  |——GQA/
|  |  |——images/
|  |  |  |——21.jpg
|  |  |  |——...
|  |——MathVista/
|  |  |——images/
|  |——TallyQA/
|  |  |——VG_100K/            
|  |  |——VG_100K_2/
|  |——TextVQA/
|  |  |——train_images/
|  |——VSR/
|  |  |——images/
|  |——val2014/
|——ComprehendEdit_train.json          
|——ComprehendEdit_test.json
|——ComprehendEdit_ori_right.json          
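
After downloading, a small sanity-check script like the sketch below can confirm that this layout is in place. The paths are placeholders for wherever your local copy lives; nothing here is part of the repository's code.

```python
import os

DATA_ROOT = "."  # placeholder: directory holding ComprehendEdit/ and the *.json files
IMG_ROOT = os.path.join(DATA_ROOT, "ComprehendEdit")

expected_dirs = [
    "GQA/images", "MathVista/images", "TallyQA/VG_100K", "TallyQA/VG_100K_2",
    "TextVQA/train_images", "VSR/images", "val2014",
]
expected_files = [
    "ComprehendEdit_train.json",
    "ComprehendEdit_test.json",
    "ComprehendEdit_ori_right.json",
]

# Report any missing image folders or annotation files.
for d in expected_dirs:
    path = os.path.join(IMG_ROOT, d)
    print(("OK      " if os.path.isdir(path) else "MISSING ") + path)
for f in expected_files:
    path = os.path.join(DATA_ROOT, f)
    print(("OK      " if os.path.isfile(path) else "MISSING ") + path)
```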

The format of each sample in the test set is:

[{
"image": "GQA/images/2405722.jpg",
"question": "What is this bird called?",
"rephrase": "What is the bird's name?", # for Text-Generality
"answer": "parrot",
"source": "GQA",  
"Category": "object recognition",
"pid": 0,
"img_topk": [...],  # pid of the image topk nearest samples in test set
"txt_topk": [...],  # pid of the text topk nearest samples in test set
"img_last_topk": [...], # pid of the image topk farthest samples in test set
"txt_last_topk": [...], # pid of the text topk farthest samples in test set
"ori_rt_img_topk": [...], # pid of the image topk nearest samples in ComprehendEdit_ori_right.json
"ori_rt_txt_topk": [...], # pid of the text topk nearest samples in ComprehendEdit_ori_right.json
"ori_rt_img_last_topk": [...], # pid of the image topk farthest samples in ComprehendEdit_ori_right.json
"ori_rt_txt_last_topk": [...], # pid of the text topk farthest samples in ComprehendEdit_ori_right.json
"locality_prompt": "when does twice upon a time come out", # for Text-Locality
"locality_ground_truth": "...",
"multimodal_locality_image": "...", # for Multimodal-Locality
"multimodal_locality_prompt": "...",
"multimodal_locality_ground_truth": "..."}, ...]

The details of ComprehendEdit are shown in the following table:

| Task | Train | Test | Source |
|---|---:|---:|---|
| Object Recognition | 1,471 | 491 | GQA |
| Object Attributes | 2,227 | 735 | GQA |
| Object Counting | 2,282 | 705 | GQA |
| Object Existence | 1,506 | 503 | TallyQA |
| Scene Information | 2,067 | 787 | GQA |
| Numerical Inference | 1,709 | 530 | VSR |
| Spatial Relationship | 1,554 | 519 | TextVQA |
| Text Recognition | 634 | 212 | MathVista |
| Total | 13,450 | 4,482 | |

The ratio of training data to test data in each task is approximately 3:1, and we also utilize samples from the NQ dataset and OK-VQA dataset to measure text locality (T-L) and multimodal locality (M-L).

This dataset was collected from several benchmarks using BLIP-2 OPT 2.7B and MiniGPT-4 7B. If you want to run other models on ComprehendEdit, we recommend measuring the change in the top-10 predictions on locality samples before and after editing. We will update the results in the coming months.
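
A minimal sketch of that top-10 comparison is given below, assuming per-position logits can be obtained from the backbone before and after editing; nothing in this snippet is part of the repository's API.

```python
import torch

def top10_change_rate(logits_before: torch.Tensor, logits_after: torch.Tensor) -> float:
    """Fraction of positions whose top-10 predicted token ids change after editing.

    Both tensors are assumed to have shape (seq_len, vocab_size) for a single
    locality sample; lower values indicate better locality preservation.
    """
    top_before = logits_before.topk(10, dim=-1).indices
    top_after = logits_after.topk(10, dim=-1).indices
    unchanged = (top_before == top_after).all(dim=-1)
    return 1.0 - unchanged.float().mean().item()
```

Averaging this rate over all locality samples gives a model-agnostic view of how much an edit disturbs unrelated predictions.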

Getting Started

The dataset can be downloaded from Baidu Yun or Google Drive. The project is built on top of EasyEdit. The class ComprehendEdit is located in ComprehendEdit/easyeditor/dataset/ComprehendEdit.py, and you can import it just like E-VQA.
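
As an illustration, importing the class might look like the sketch below; the module path follows the file location stated above, but the constructor signature is an assumption modeled on how E-VQA datasets are loaded in EasyEdit.

```python
# Sketch only: the constructor arguments mirror EasyEdit's E-VQA dataset and
# may differ from the actual signature in ComprehendEdit.py.
from easyeditor.dataset.ComprehendEdit import ComprehendEdit

hparams = ...  # a config loaded from one of the yamls under ComprehendEdit/hparams
test_data = ComprehendEdit("ComprehendEdit_test.json", config=hparams)
print(f"loaded {len(test_data)} test samples")
```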

Usage

The conda environment is the one provided for EasyEdit multimodal knowledge editing, and links to the pretrained model weights are provided in VLKEB.

To run the code, you can use the following command:

sh run_multi.sh # or python3 multimodal_edit_our.py

You can also change the algorithm name in multimodal_edit_our.py to run other models. For example,

train_HICE(model='blip2', train=True)

This call trains HICE on BLIP-2 OPT 2.7B. After training, simply change train=True to train=False to evaluate the edited model, as shown below.
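
Both calls use the same entry point from the example above; only the train flag differs:

```python
train_HICE(model='blip2', train=True)   # train HICE on BLIP-2 OPT 2.7B
train_HICE(model='blip2', train=False)  # evaluate after training
```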

Besides, you can also change the hyperparameter yaml files in ComprehendEdit/hparams. For example, edit ComprehendEdit/hparams/TRAINING/HICE/minigpt4.yaml to run the code on different GPUs, change the path of the pretrained model, and so on. In the yaml files, gpu_used_id and gpu_split are used to split the model across GPUs.

If you want to run experiments on a single GPU, set model_parallel=False and gpu_split=[]. If you want to run experiments on other models, add the corresponding model setting in ComprehendEdit/easyeditor/util/tools.py to support the model. (Naively using device_map="auto" may cause out-of-memory errors on the main GPU if the dataset is too large, while running on too many GPUs wastes resources and takes more time.) A single-GPU configuration is sketched below.
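
The sketch assumes the yaml can be edited programmatically with PyYAML; the field names model_parallel, gpu_split, and gpu_used_id come from the text above, while the concrete values (and the guess that gpu_used_id is a list of GPU ids) are placeholders.

```python
import yaml  # pip install pyyaml

cfg_path = "ComprehendEdit/hparams/TRAINING/HICE/minigpt4.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# Single-GPU setup as described above: no model parallelism, empty split.
cfg["model_parallel"] = False
cfg["gpu_split"] = []
cfg["gpu_used_id"] = [0]  # assumption: a list of visible GPU ids

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)
```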

Thanks to EasyEdit for providing the framework! The samples in ComprehendEdit come from several datasets: GQA, TallyQA, VSR, TextVQA, MathVista, OK-VQA, and NQ. Part of the code references RanPAC. Thanks to all of these outstanding works!

Please cite our paper if you use ComprehendEdit in your work.
