I’m getting different values out of the pretrained ResNet across epochs. This is how I’m initialising the CNN:
self.cnn = models.resnet101(pretrained=True)
self.trimmed_cnn = nn.Sequential(*list(self.cnn.children())[:-2]) # take off the FC and mean pool layers
for param in self.trimmed_cnn.parameters():
    param.requires_grad = False
When I evaluate img_features = self.trimmed_cnn(img) during epoch 1 of training, the result is different from the same evaluation during epoch 3, even though I’ve made sure the input img is identical in both cases. I know convolution layers on the GPU can be non-deterministic, but the outputs I’m getting differ significantly.
Within epoch 1, evaluating img_features = self.trimmed_cnn(img) twice gives the same result both times. This makes me think the weights of the CNN are somehow being modified, despite setting param.requires_grad = False?
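For reference, one way to check that theory would be to snapshot the backbone’s weights at epoch 1 and compare them later (a rough sketch, nothing specific to my model):

import torch

# at epoch 1: snapshot the frozen backbone's parameters
weights_before = {name: p.detach().clone()
                  for name, p in self.trimmed_cnn.named_parameters()}

# at epoch 3: verify nothing has drifted
for name, p in self.trimmed_cnn.named_parameters():
    if not torch.equal(weights_before[name], p):
        print(f"parameter {name} changed")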
Did you set the model to eval mode (model.eval())?
Pre-trained models are usually in training mode right after loading.
You can check this with model.training, which is a bool.
Maybe some layers like BatchNorm have been updated?
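You can reproduce the effect with a single BatchNorm layer (a minimal sketch, not your model):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
for p in bn.parameters():
    p.requires_grad = False          # freezes gamma/beta only

x = torch.randn(8, 3, 16, 16)

bn.train()
before = bn.running_mean.clone()
bn(x)                                # forward pass in train mode
print(torch.equal(before, bn.running_mean))   # False: running stats moved

bn.eval()
before = bn.running_mean.clone()
bn(x)                                # forward pass in eval mode
print(torch.equal(before, bn.running_mean))   # True: stats untouched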
That fixed it, although I had to manually set trimmed_cnn back to eval() after every evaluation, since calling self.net.train() is recursive and puts trimmed_cnn back into train mode:
self.net.eval()
# do evaluation ...
self.net.train()
self.net.trimmed_cnn.eval() # this line is required... or the problem persists
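An alternative would be to override train() on the wrapper module so the frozen backbone is always forced back into eval mode (just a sketch, the class name is made up):

import torch.nn as nn

class CaptionNet(nn.Module):             # hypothetical wrapper that owns trimmed_cnn
    def train(self, mode=True):
        super().train(mode)              # recursively set train/eval as usual
        self.trimmed_cnn.eval()          # ...but keep the frozen backbone in eval mode
        return self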
Isn’t it weird that the BatchNorm layers are being updated if I’ve set requires_grad = False for all the layers?
Well, the weights and biases (gamma and beta in the paper) are not updated, but the running mean and variance are.
These don’t require gradients, only samples: they are updated from the batch statistics during the forward pass whenever the layer is in train mode.
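You can see the split in how PyTorch stores them: gamma and beta are parameters (affected by requires_grad), while the running statistics are buffers (a quick sketch):

import torch.nn as nn

bn = nn.BatchNorm2d(3)
print([n for n, _ in bn.named_parameters()])  # ['weight', 'bias'] -> gamma, beta
print([n for n, _ in bn.named_buffers()])     # ['running_mean', 'running_var', 'num_batches_tracked']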
That makes sense, thanks! Although it’s a bit strange that I didn’t come across anyone mentioning this on other forum posts. It seems important if you want to work with a frozen, pretrained CNN.
We DO want to keep the running mean and variance frozen after initialisation, right? I.e. this is the correct way to use a pretrained ResNet? They should already be set to sensible values from the ResNet’s pretraining?