I have frozen all my layers, but the output changes

YinYang_Untalan · January 23, 2020, 11:33am

I trained my ANN with all layers unfrozen. Then I froze all my layers using layer.requires_grad = False.

Then I trained it again. The outputs change. I have taken a look at the state_dict of one of the layers to see if the weights have been changed - they are still the same.

Am I missing something here? How do I properly freeze the layers?
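For reference, here is a minimal sketch of the difference between the two ways of freezing mentioned in this post. The single nn.Linear layer is a hypothetical stand-in for the real layers. It suggests that assigning requires_grad on the module itself only creates a plain Python attribute, while setting the flag on each parameter is what autograd actually looks at.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)  # hypothetical stand-in for one layer of the network

# Assigning to the module only creates an ordinary Python attribute;
# the parameters themselves are left untouched.
layer.requires_grad = False
flags_after_attribute = [p.requires_grad for p in layer.parameters()]

# Setting the flag on every parameter is what stops autograd from
# computing gradients for them (layer.requires_grad_(False) does the same).
for p in layer.parameters():
    p.requires_grad = False
flags_after_loop = [p.requires_grad for p in layer.parameters()]

print(flags_after_attribute)  # [True, True]
print(flags_after_loop)       # [False, False]
```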

I tried doing for p in layer.parameters(): p.requires_grad = False. I did this for all the layers, and when I trained I got an error that basically says there are no gradients to compute, which is what I expected. Then I proceeded to do this for only specific layers, one set of which is a decoder. This is what I did in more detail:

I have an encoder and a decoder which I trained and froze using the above (p.req… = False). Then I wanted to play around, so I attached a classifier to the encoding. In the middle of the classifier I have a layer with the same dimension as the encoding, which means I can put it through the decoder, and I did. My loss functions are the classification loss and the MSELoss involving the middle of the classifier going through the decoder. My problem is that the decoder (or possibly even the encoder) changes. This is how I verify that:

1. Train the encoder and decoder.
2. Freeze the encoder and decoder.
3. Set t = a random tensor with the same size as the input.
4. Pass it through the network and visualize the decoded output (A).
5. Train the classifier, adding the loss from passing the middle classifier layer through the decoder (this shouldn't affect the decoder and should only propagate the error back to the classification layers).
6. Pass t through the network again and visualize the decoded output (A').
7. A and A' are not the same.
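The check described in these steps could be sketched roughly as follows. All module names and sizes are hypothetical, the "training" is a single step, and the MSE target is assumed to be the input since the post does not say what it is. In this sketch only the classifier parameters are handed to the optimizer, and the decoder's output and weights are compared before and after with torch.equal rather than by eye.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the already-trained encoder and decoder.
encoder = nn.Linear(8, 4)
decoder = nn.Linear(4, 8)
# Hypothetical classifier: a "middle" layer with the encoding's size, then the class scores.
classifier_middle = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
classifier_out = nn.Linear(4, 3)

# Freeze the encoder and decoder.
for module in (encoder, decoder):
    for p in module.parameters():
        p.requires_grad = False

def decode(x):
    with torch.no_grad():
        return decoder(encoder(x))

# Record the decoded output (A) and the decoder weights for a fixed random tensor t.
t = torch.randn(1, 8)
output_before = decode(t)
weights_before = copy.deepcopy(decoder.state_dict())

# One training step on the classifier with both losses.
trainable = list(classifier_middle.parameters()) + list(classifier_out.parameters())
optimizer = torch.optim.Adam(trainable, lr=1e-2)
x = torch.randn(16, 8)
labels = torch.randint(0, 3, (16,))
middle = classifier_middle(encoder(x))
classification_loss = nn.functional.cross_entropy(classifier_out(middle), labels)
reconstruction_loss = nn.functional.mse_loss(decoder(middle), x)  # assumed target: the input
optimizer.zero_grad()
(classification_loss + reconstruction_loss).backward()
optimizer.step()

# Compare the decoded output (A') and the decoder weights with the earlier snapshot.
output_after = decode(t)
outputs_match = torch.equal(output_before, output_after)
weights_match = all(
    torch.equal(weights_before[name], value)
    for name, value in decoder.state_dict().items()
)
print(outputs_match, weights_match)
```

Comparing tensors directly makes it easier to tell whether a difference comes from changed weights or from something else in the forward pass.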

Thank you for helping.

ptrblck · January 24, 2020, 6:00am

I guess you might have some dropout or batchnorm layers in your model, which are still active even though the trainable parameters are frozen.
Calling model.eval() will use the running stats in batch norm layers (instead of using the current batch stats and updating the running estimates) and dropout layers will be disabled.
Could this be the issue?
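A small sketch of what this looks like in practice, using a hypothetical toy model with one batchnorm and one dropout layer. It assumes nothing about the actual architecture in the question. It shows that freezing the parameters does not stop the running statistics from being updated in train mode, while model.eval() does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model; the real network in the question is not shown.
model = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.Dropout(p=0.5))
for p in model.parameters():
    p.requires_grad = False  # freezes the weights, not the layers' train-time behavior

x = torch.randn(32, 8)
bn = model[1]

# Train mode: a forward pass alone updates the batchnorm running estimates.
model.train()
mean_before = bn.running_mean.clone()
with torch.no_grad():
    model(x)
stats_changed_in_train = not torch.equal(mean_before, bn.running_mean)

# Eval mode: running estimates stay fixed and dropout is disabled.
model.eval()
mean_before = bn.running_mean.clone()
with torch.no_grad():
    out1 = model(x)
    out2 = model(x)
stats_changed_in_eval = not torch.equal(mean_before, bn.running_mean)
eval_is_deterministic = torch.equal(out1, out2)

print(stats_changed_in_train, stats_changed_in_eval, eval_is_deterministic)
# True False True
```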

YinYang_Untalan · January 24, 2020, 1:20pm

I saw the issue. I was using the Adam optimizer, so setting the grad flag off wasn't enough. I needed to filter the parameters I pass to Adam. Is there a cleaner way to do this? Thank you.
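One common way to write that filter is sketched below. The decoder and classifier modules and the learning rate are hypothetical, since the actual model is not shown in the thread. The idea is to build the optimizer only from parameters whose requires_grad flag is still True.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-part model: a frozen decoder and a trainable classifier.
decoder = nn.Linear(4, 8)
classifier = nn.Linear(4, 3)
model = nn.ModuleDict({"decoder": decoder, "classifier": classifier})

for p in decoder.parameters():
    p.requires_grad = False

# Hand Adam only the parameters that still require gradients.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-3)

num_optimized = sum(len(group["params"]) for group in optimizer.param_groups)
print(num_optimized)  # 2: the classifier's weight and bias
```

If the trainable part is a single submodule, passing classifier.parameters() directly to the optimizer has the same effect. Either way, the optimizer should be created after the freezing step so that the filter sees the final flags.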