I am trying to train a ResNet-18 from torchvision, created with the following command: model = torchvision.models.resnet18(pretrained=False, num_classes=100)
I am only able to reach an accuracy of 58%. I am using the data augmentations and hyperparameters used by many GitHub projects that define the network structure locally instead of using the one from torchvision. I understand that the difference can arise for many reasons.
What I am looking for is a project or set of hyperparameters that I can follow to get good results with the torchvision models. I have observed a similar trend for vgg16_bn as well.
I know this reply is very late. The issue is that torchvision models are not meant for CIFAR datasets, whose input size is 32x32. Because the model reduces the feature map size with each layer block, by the end it's just too small.
You can see how to fix this in this colab tutorial
Basically, you need to change the stride of the first convolution and replace the maxpool layer with the identity function. This results in a larger feature map before the final pooling.