Pretrained ResNet101 non-deterministic across epochs

I’m getting different values out of the pretrained ResNet across epochs. This is how I’m initialising the CNN:

from torchvision import models
import torch.nn as nn

self.cnn = models.resnet101(pretrained=True)
self.trimmed_cnn = nn.Sequential(*list(self.cnn.children())[:-2])  # take off the FC and average-pool layers
for param in self.trimmed_cnn.parameters():
    param.requires_grad = False

When I evaluate img_features = self.trimmed_cnn(img) on epoch 1 of training, the result is different from the same evaluation on epoch 3. I’ve made sure the input img is the same in both cases. I know that convolution layers on the GPU can be non-deterministic, but the differences I’m getting are far larger than that would explain.

For example,

On epoch 1, if I evaluate img_features = self.trimmed_cnn(img) twice, both calls give the same result. This makes me think that somehow the weights of the CNN are being modified despite setting param.requires_grad = False?
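To rule out the weights actually changing, I’d compare snapshots of the parameters between epochs; a rough sketch of what I have in mind (the snapshot dict is just for illustration):

    import torch

    # snapshot the frozen parameters at epoch 1 ...
    snapshot = {name: p.detach().clone()
                for name, p in self.trimmed_cnn.named_parameters()}

    # ... and compare at epoch 3; if nothing prints, the weights are
    # unchanged and something else must explain the different outputs
    for name, p in self.trimmed_cnn.named_parameters():
        if not torch.equal(snapshot[name], p):
            print(f"parameter changed: {name}")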

Did you set the model to eval mode (model.eval())?
Usually the pre-trained models are in training mode after loading.
You can check this with model.training, which is a bool value.
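For example:

    from torchvision import models

    model = models.resnet101(pretrained=True)
    print(model.training)  # True -> BatchNorm will update its running stats

    model.eval()           # switch BatchNorm (and Dropout) to evaluation behavior
    print(model.training)  # False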

Maybe some layers like BatchNorm have been updated?

@ptrblck your solution solved it …

Although I had to manually set trimmed_cnn back to eval() each time I switch the network back to training mode, since calling self.net.train() is recursive and puts trimmed_cnn back into train mode.

    self.net.eval()

    # do evaluation ...

    self.net.train()
    self.net.trimmed_cnn.eval()  # this line is required... or the problem persists
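I suppose a cleaner alternative would be to override train() on my module so the frozen backbone always drops back into eval mode; untested sketch (MyNet is just a stand-in for my actual module):

    import torch.nn as nn

    class MyNet(nn.Module):
        # hypothetical wrapper standing in for my actual module
        def train(self, mode=True):
            super().train(mode)      # recursively set train/eval as usual
            self.trimmed_cnn.eval()  # ... then force the frozen CNN back to eval
            return self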

Isn’t it weird that the BatchNorm layers are being updated when I’ve set requires_grad = False on all the parameters?

Glad you solved it! :slight_smile:

Well, the weights and biases (gamma and beta in the paper) are not updated, but the running mean and variance are.
These do not require gradients, only samples. :wink:
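You can see this with a single BatchNorm layer: in train mode a plain forward pass updates the running stats, even though no parameter requires gradients:

    import torch
    import torch.nn as nn

    bn = nn.BatchNorm2d(3)
    for p in bn.parameters():
        p.requires_grad = False

    bn.train()
    before = bn.running_mean.clone()
    bn(torch.randn(8, 3, 4, 4))                  # forward pass only, no backward
    print(torch.equal(before, bn.running_mean))  # False: stats were updated

    bn.eval()
    before = bn.running_mean.clone()
    bn(torch.randn(8, 3, 4, 4))
    print(torch.equal(before, bn.running_mean))  # True: eval mode freezes them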

That makes sense, thanks! Although it’s a bit strange that I didn’t come across anyone mentioning this problem in other forum posts. It seems important if you want to work with a frozen, pretrained CNN.

We DO want to keep the running mean and variance frozen after initialisation, right? I.e., is this the correct way to use a pretrained ResNet? They should already be set to good values from the ResNet’s pretraining?