This repository contains the code and checkpoints for the vision-and-language pre-training model in our paper "How Much Can CLIP Benefit Vision-and-Language Tasks?". With pre-training, CLIP-ViL sets new single-model state-of-the-art results on benchmarks such as VQA v2.0 (76.70 on test-std).
The code is adapted from both the CLIP repo and the LXMERT repo. Many thanks to the authors of these repositories.
- Download the data files with [0dhw] and save them in the `data/` directory. At this point, the `gqa` and `mscoco` folders are still incomplete.
- Download the COCO images and unzip them into the `data/mscoco` directory:
    wget http://images.cocodataset.org/zips/train2014.zip -P data/mscoco
    wget http://images.cocodataset.org/zips/val2014.zip -P data/mscoco
    wget http://images.cocodataset.org/zips/test2015.zip -P data/mscoco
    unzip data/mscoco/train2014.zip -d data/mscoco/ && rm data/mscoco/train2014.zip
    unzip data/mscoco/val2014.zip -d data/mscoco/ && rm data/mscoco/val2014.zip
    unzip data/mscoco/test2015.zip -d data/mscoco/ && rm data/mscoco/test2015.zip
- Download the original GQA dataset, including Scene Graphs (ver 1.1 / 42.7 MB), Questions (ver 1.2 / 1.4 GB), and Images (20.3 GB), and unzip them into the `data/gqa` directory.
- Refer to `data/shot_for_check.jpg` to check that the downloads are complete.
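After downloading, a quick script can catch missing folders before a long training run. The sketch below is a hypothetical helper, not part of the repo. It assumes the COCO zip files extract to `train2014/`, `val2014/` and `test2015/` under `data/mscoco`. Because the internal layout of the GQA folder is not described here, it only checks that this folder exists and is non-empty; its name is passed as the second argument and defaults to `gqa` (an assumption). It complements, rather than replaces, comparing against the reference screenshot.

```shell
# Hypothetical helper (not part of the repo): sanity-check the data layout.
# Usage: check_data_layout [DATA_ROOT] [GQA_DIR_NAME]
check_data_layout() {
  local root="${1:-data}"
  local gqa_dir="${2:-gqa}"   # assumed folder name for GQA
  local missing=0
  local split
  # The COCO zip files extract to train2014/, val2014/ and test2015/.
  for split in train2014 val2014 test2015; do
    if [ ! -d "$root/mscoco/$split" ]; then
      echo "missing: $root/mscoco/$split" >&2
      missing=1
    fi
  done
  # The internal GQA layout is not specified, so only require a non-empty folder.
  if [ ! -d "$root/$gqa_dir" ] || [ -z "$(ls -A "$root/$gqa_dir")" ]; then
    echo "missing or empty: $root/$gqa_dir" >&2
    missing=1
  fi
  return "$missing"
}

# Example: check_data_layout data && echo "data layout looks complete"
```

The function returns a non-zero status if anything is missing, so it can guard a training command with `&&`.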
- Run `pip install -r requirement.text` to install exactly the same dependencies.
- Alternatively, use `conda-pack` to install the packed environment downloaded with [0dhw]:
    pip install conda-pack
    mkdir -p [path_to_conda_env]  # (e.g., ~/anaconda/envs/ENV_NAME)
    tar -zxvf [ENV_NAME].tar.gz -C [path_to_conda_env]
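The commands above stop after extracting the archive. Following the conda-pack documentation, an extracted environment is usually activated with `source <env>/bin/activate` (which puts `<env>/bin` on `PATH`) and then fixed up once with `conda-unpack`, a script that conda-pack ships inside the packed environment to rewrite hard-coded prefixes. A minimal sketch of those two steps, wrapped in a hypothetical `activate_packed_env` function:

```shell
# Hypothetical wrapper (not part of the repo) around the conda-pack steps:
# activate the extracted environment, then run conda-unpack if present.
activate_packed_env() {
  local env_dir="$1"
  if [ ! -f "$env_dir/bin/activate" ]; then
    echo "no activate script in $env_dir/bin" >&2
    return 1
  fi
  # Sourcing the activate script prepends $env_dir/bin to PATH.
  . "$env_dir/bin/activate"
  # conda-unpack lives in the packed environment's bin/ directory.
  if command -v conda-unpack >/dev/null 2>&1; then
    conda-unpack
  fi
}

# Example: activate_packed_env ~/anaconda/envs/ENV_NAME
```

`conda-unpack` only needs to run once after extraction; in later sessions, sourcing the activate script should be enough.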
Caveats: To reduce CPU memory cost, we use shared memory to share annotation files across data readers. Be sure to delete any file with the prefix `sharearray_` under `/dev/shm/` after you finish training.
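The cleanup in this caveat can be scripted. Below is a minimal sketch; the function name is hypothetical, while the `sharearray_` prefix and the `/dev/shm` default come from the caveat itself. Run it only after training, and any other process reading these files, has finished.

```shell
# Hypothetical helper (not part of the repo): delete leftover shared-memory
# files with the sharearray_ prefix. The directory defaults to /dev/shm.
clean_sharearray() {
  local shm_dir="${1:-/dev/shm}"
  local f
  for f in "$shm_dir"/sharearray_*; do
    # If nothing matches, the glob stays as a literal pattern; skip it.
    [ -e "$f" ] || continue
    echo "removing $f"
    rm -f -- "$f"
  done
}

# Example (after training has finished): clean_sharearray
```

Looping with an existence check, instead of a bare `rm /dev/shm/sharearray_*`, avoids an error when no such files are left and prints each file before it is removed.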
- Training: download the pre-trained checkpoint with [0dhw] and save it as `snap/pretrained/CLIP_VL_RN50x4/Epoch11_LXRT.pth`, then run:
    ./scripts/[train_side_xxx.sh] 0,1,2,3 [model_name] 9590 4
  When training finishes, you will get `snap/vqa/[model_name]/BEST.pth` or `snap/gqa/[model_name]/BEST.pth`, depending on the task.
- Testing:
    ./scripts/[test_side_xxx.sh] 0,1,2,3 [model_name] 9590 4
  This generates `snap/vqa/[model_name]/test_predict.json` for VQA or `snap/gqa/[model_name]/submit_predict.json` for GQA, which can be submitted to the VQA or GQA leaderboard to obtain test-dev and test-std results.
- Our best VQA and GQA checkpoints can be downloaded with [0dhw].
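Two pre-flight checks for the training and testing steps can save a wasted run or a rejected upload. The sketch below is hypothetical and not part of the repo. It uses the checkpoint and prediction paths given above, assumes it is run from the repository root with `python3` available, and assumes the prediction file is a JSON list of per-question entries. It does not check the fields inside each entry.

```shell
# Hypothetical helpers (not part of the repo); run from the repository root.

# Before training: confirm the pre-trained checkpoint is in place.
check_pretrained_ckpt() {
  local ckpt="snap/pretrained/CLIP_VL_RN50x4/Epoch11_LXRT.pth"
  if [ ! -s "$ckpt" ]; then
    echo "checkpoint not found or empty: $ckpt" >&2
    return 1
  fi
}

# Before submitting: confirm the prediction file parses as a JSON list
# (assumed format) and print the number of entries.
check_predictions() {
  local pred="$1"
  if [ ! -f "$pred" ]; then
    echo "missing: $pred" >&2
    return 1
  fi
  python3 - "$pred" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    data = json.load(f)
if not isinstance(data, list):
    sys.exit("expected a JSON list of predictions")
print(len(data))
EOF
}

# Usage, with the README placeholders left unfilled:
#   check_pretrained_ckpt && ./scripts/[train_side_xxx.sh] 0,1,2,3 [model_name] 9590 4
#   check_predictions snap/vqa/[model_name]/test_predict.json
```

The printed entry count can be compared with the number of test questions before uploading to the leaderboard.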