Is there any possibility that PyTorch will release pretrained Vision Transformers smaller than ViT-B/16?
Currently that is the smallest pretrained ViT available that follows the architecture of the original paper. A smaller model with lower depth, a smaller embedding dimension, and a smaller MLP dimension would be helpful.
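For reference, torchvision's `VisionTransformer` class already accepts the relevant sizes, so a smaller model can be instantiated today, just without pretrained weights. Below is a minimal sketch using the ViT-S/16 dimensions popularized by DeiT (these numbers come from the literature, not from any torchvision checkpoint, and the model here is randomly initialized):

```python
import torch
from torchvision.models.vision_transformer import VisionTransformer

# ViT-S/16-style configuration (DeiT-S sizes): same depth as ViT-B/16,
# but half the embedding width and half the MLP width. No pretrained
# weights exist for this size in torchvision, so this is random init.
vit_small = VisionTransformer(
    image_size=224,
    patch_size=16,
    num_layers=12,
    num_heads=6,
    hidden_dim=384,   # ViT-B/16 uses 768
    mlp_dim=1536,     # ViT-B/16 uses 3072
)

x = torch.randn(1, 3, 224, 224)
logits = vit_small(x)  # shape: (1, 1000)
print(logits.shape)
```

A ViT-Ti/16-style model would shrink this further (`num_heads=3`, `hidden_dim=192`, `mlp_dim=768`), but without pretrained weights the main ask stands: official checkpoints for these smaller configurations.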