Help with feature extraction using a pre-trained VisionTransformer model

Can I fine-tune the pre-trained VisionTransformer model (in my case, use it for feature extraction) the same way as described in the PyTorch tutorial "Finetuning Torchvision Models" (PyTorch Tutorials 1.2.0 documentation)?

The tutorial only uses convolutional neural networks, so do I have to modify anything for the VisionTransformer model (VisionTransformer, Torchvision 0.13 documentation), or does it work the same way?

The general fine-tuning logic applies to all models, but you might need to change different model attributes. For example, replacing the .classifier might look a bit different with other models, depending on how their internal modules are structured.
