Vision Transformer dataset

stanleygeorge · May 19, 2023, 3:15pm

Which dataset were the pretrained ViT_B_16 and ViT_L_16 models trained on? (For example, I could not tell from the description of ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1 which dataset was used for training.)

(vit_b_16 — Torchvision main documentation)

Kapil_Rana · May 23, 2023, 5:48am

The model is trained on the ImageNet dataset.

stanleygeorge · May 23, 2023, 7:50pm

So, I did a bit of reading, and this is what I understood:
The SWAG variants (such as IMAGENET1K_SWAG_E2E_V1) were pretrained on the IG 3.6B dataset, following the SWAG approach of this paper (https://arxiv.org/pdf/2201.08371.pdf).
They were then fine-tuned on ImageNet-1k.
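
For anyone who wants to check this from code rather than from the docs page, here is a minimal sketch that prints the provenance-related metadata attached to each ViT_B_16 weights entry. The helper names `summarize_weights` and `print_vit_b_16_provenance` are my own, not part of torchvision, and I am assuming a recent torchvision (0.13 or later) where each weights entry exposes a `meta` dict. The `"recipe"` and `"_docs"` keys are what I have seen there, so the code reads them with `.get()` in case a version lacks them.

```python
def summarize_weights(weights):
    """Collect the fields of a weights entry that hint at its training data.

    `weights` is expected to have a `name` attribute and a `meta` dict,
    as torchvision weights enum members do (assumption: torchvision >= 0.13).
    """
    meta = weights.meta
    return {
        "name": weights.name,
        # Link to the training recipe or upstream repo, if provided.
        "recipe": meta.get("recipe"),
        # Free-text description; assumed key, may be missing in some versions.
        "docs": meta.get("_docs"),
        # Number of output classes (1000 would indicate an ImageNet-1k head).
        "num_categories": len(meta.get("categories", [])),
    }


def print_vit_b_16_provenance():
    """Print the summary for every ViT_B_16 weights entry."""
    # Imported lazily so the helper above can be used without torchvision.
    from torchvision.models import ViT_B_16_Weights

    for weights in ViT_B_16_Weights:
        info = summarize_weights(weights)
        print(info["name"])
        print("  recipe:", info["recipe"])
        print("  docs:", info["docs"])
        print("  categories:", info["num_categories"])
```

Calling `print_vit_b_16_provenance()` lists every available entry, so the SWAG ones can be compared with the plain IMAGENET1K_V1 entry. The same approach should work for `ViT_L_16_Weights` by swapping the import.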