Pretrained ResNet101 non-deterministic across epochs

I’m getting different values out of the pretrained ResNet across epochs. This is how I’m initialising the CNN:

from torchvision import models
import torch.nn as nn

self.cnn = models.resnet101(pretrained=True)
self.trimmed_cnn = nn.Sequential(*list(self.cnn.children())[:-2])  # take off the FC and average-pool layers
for param in self.trimmed_cnn.parameters():
    param.requires_grad = False

When I evaluate img_features = self.trimmed_cnn(img) on epoch 1 of training, the result is different from the same evaluation on epoch 3. I’ve made sure the input img is the same in both cases. I know that convolution layers on the GPU can be non-deterministic, but the differences I’m getting are far larger than that would explain.

For example,

On epoch 1, if I evaluate img_features = self.trimmed_cnn(img) twice, both calls give the same result. This makes me think that somehow the weights of the CNN are being modified despite setting param.requires_grad = False?
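To rule out the weights actually changing, I’d compare snapshots of the parameters between epochs; a rough sketch of what I have in mind (the snapshot dict is just for illustration):

    import torch

    # snapshot the frozen parameters at epoch 1 ...
    snapshot = {name: p.detach().clone()
                for name, p in self.trimmed_cnn.named_parameters()}

    # ... and compare at epoch 3; if nothing prints, the weights are
    # unchanged and something else must explain the different outputs
    for name, p in self.trimmed_cnn.named_parameters():
        if not torch.equal(snapshot[name], p):
            print(f"parameter changed: {name}")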

Did you set the model to eval mode (model.eval())?
Usually the pre-trained models are in training mode after loading.
You can check this with model.training, which is a bool value.
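For example:

    from torchvision import models

    model = models.resnet101(pretrained=True)
    print(model.training)  # True -> BatchNorm will update its running stats

    model.eval()           # switch BatchNorm (and Dropout) to evaluation behavior
    print(model.training)  # False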

Maybe some layers like BatchNorm have been updated?

@ptrblck your solution solved it …

Although I had to manually set trimmed_cnn back to eval() each time I switch the network back to training mode, since calling self.net.train() is recursive and puts trimmed_cnn back into train mode.

    self.net.eval()

    # do evaluation ...

    self.net.train()
    self.net.trimmed_cnn.eval()  # this line is required... or the problem persists
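I suppose a cleaner alternative would be to override train() on my module so the frozen backbone always drops back into eval mode; untested sketch (MyNet is just a stand-in for my actual module):

    import torch.nn as nn

    class MyNet(nn.Module):
        # hypothetical wrapper standing in for my actual module
        def train(self, mode=True):
            super().train(mode)      # recursively set train/eval as usual
            self.trimmed_cnn.eval()  # ... then force the frozen CNN back to eval
            return self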

Isn’t it weird that the BatchNorm layers are being updated when I’ve set requires_grad = False on all the parameters?

Glad you solved it! :slight_smile:

Well, the weights and biases (gamma and beta in the paper) are not updated, but the running mean and variance are.
These do not require gradients, only samples. :wink:
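You can see this with a single BatchNorm layer: in train mode a plain forward pass updates the running stats, even though no parameter requires gradients:

    import torch
    import torch.nn as nn

    bn = nn.BatchNorm2d(3)
    for p in bn.parameters():
        p.requires_grad = False

    bn.train()
    before = bn.running_mean.clone()
    bn(torch.randn(8, 3, 4, 4))                  # forward pass only, no backward
    print(torch.equal(before, bn.running_mean))  # False: stats were updated

    bn.eval()
    before = bn.running_mean.clone()
    bn(torch.randn(8, 3, 4, 4))
    print(torch.equal(before, bn.running_mean))  # True: eval mode freezes them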

That makes sense, thanks! Although it’s a bit strange that I didn’t come across anyone mentioning this problem in other forum posts. It seems important if you want to work with a frozen, pretrained CNN.

We DO want to keep the running mean and variance frozen after initialisation, right? I.e., is this the correct way to use a pretrained ResNet? They should already be set to good values from the ResNet’s pretraining?