Vision Transformer dataset

Which dataset were the pretrained ViT_B_16 and ViT_L_16 models trained on? For example, I could not tell from the description of ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1 which dataset was used for training.

(vit_b_16 — Torchvision main documentation)

The model is trained on the ImageNet dataset.

So, I did a bit of reading and this is what I understood.
The models were pretrained using the IG 3.6B dataset following the SWAG approach of this paper (
They were later fine-tuned on ImageNet-1K.
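
In case it helps anyone else, here is a rough sketch of how I would check this from code instead of the docs page. Each torchvision weights enum member carries a `meta` dictionary, and reading it does not download the checkpoint. The helper names `summarize_weights` and `show_swag_vit_weights` are my own, and I am assuming the `recipe`, `license` and `_docs` keys are present for these weights in your torchvision version, so the code falls back to `None` when they are not.

```python
def summarize_weights(weights):
    """Collect the provenance fields from a torchvision weights enum member.

    `weights` only needs a `meta` dict, so this also works on any
    look-alike object. Missing keys come back as None.
    """
    meta = weights.meta
    return {
        "name": str(weights),
        "recipe": meta.get("recipe"),    # link to the training recipe, if listed
        "license": meta.get("license"),  # license link, if listed
        "docs": meta.get("_docs"),       # short description (assumed key name)
    }


def show_swag_vit_weights():
    """Print the metadata for the SWAG end-to-end ViT weights."""
    # Imported here so the helper above stays usable without torchvision.
    from torchvision.models import ViT_B_16_Weights, ViT_L_16_Weights

    for weights in (
        ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1,
        ViT_L_16_Weights.IMAGENET1K_SWAG_E2E_V1,
    ):
        for key, value in summarize_weights(weights).items():
            print(f"{key}: {value}")
        print()
```

Calling `show_swag_vit_weights()` prints whatever description and recipe link torchvision ships for each set of weights, which is where the SWAG pretraining and ImageNet-1K fine-tuning are mentioned.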