I’m working with three separate VAEs: vae1, vae2, vae3. In vae3 I’m loading the pre-trained encoder of vae2 and the pre-trained decoder of vae1. After the decoder I add some layers as well.
My problem now is: I want to freeze the decoder, so that only the encoder and the last layers are trained.
I’ve read a lot about freezing layers, but I’m still not sure whether it is enough to iterate over the decoder (consisting of Conv2d, ConvTranspose2d, BatchNorm, and LeakyReLU layers) and just set requires_grad=False… The optimizer is initialized with model.parameters(). Is this the proper way? Will the encoder and the last layers be optimized correctly?
EDIT: for clarification: the architecture is basically a variational autoencoder, which consists of a trainable encoder, a frozen decoder, and some trainable layers after the decoder.
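Roughly like this (a toy sketch with placeholder modules and shapes; the real encoder and decoder would be loaded from the vae2/vae1 checkpoints instead):

```python
import torch
import torch.nn as nn

# Stand-ins for the pre-trained parts; names and sizes are placeholders.
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.mu = nn.Linear(8, 2)
        self.logvar = nn.Linear(8, 2)
    def forward(self, x):
        return self.mu(x), self.logvar(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 16)
    def forward(self, z):
        return self.fc(z).view(-1, 1, 4, 4)

class VAE3(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder                      # pre-trained encoder of vae2 (trainable)
        self.decoder = decoder                      # pre-trained decoder of vae1 (to be frozen)
        self.post = nn.Conv2d(1, 1, 3, padding=1)   # extra trainable layer after the decoder

    def forward(self, x):
        mu, logvar = self.encoder(x)
        # reparameterization trick
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.post(self.decoder(z)), mu, logvar
```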
Sorry, I can’t completely understand the structure of your network, but I think your main question is how to freeze a part of the network that contains conv layers and BN layers.
You can achieve it by:
- Set the decoder to eval mode:
This freezes the BN layers (and dropout). In BN layers, besides the parameters there are buffers (the running statistics), which are not optimized by the optimizer but are updated automatically during the forward pass in training mode. Please see the explanation at How to properly fix batchnorm layers.
- Exclude decoder parameters from the optimizer.
- (Optional) set requires_grad=False on the decoder’s parameters. I think this mainly speeds up training and saves memory. If not, please tell me.
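Put together, the three points look something like this (toy modules in place of your real encoder/decoder; the submodule names `encoder`, `decoder`, `post` are just placeholders):

```python
import itertools
import torch
import torch.nn as nn

# Tiny stand-in model; in practice these would be the real encoder,
# the pre-trained decoder, and the extra layers after it.
model = nn.Module()
model.encoder = nn.Linear(4, 2)
model.decoder = nn.Sequential(nn.Linear(2, 4), nn.BatchNorm1d(4))
model.post = nn.Linear(4, 4)

# 1) eval mode freezes the BatchNorm running statistics (buffers);
#    re-apply it after any call to model.train().
model.decoder.eval()

# 2) Hand the optimizer only the trainable parameters,
#    instead of model.parameters().
optimizer = torch.optim.Adam(
    itertools.chain(model.encoder.parameters(), model.post.parameters()),
    lr=1e-3,
)

# 3) (Optional) disable gradient computation for the decoder
#    to save memory and compute.
for p in model.decoder.parameters():
    p.requires_grad = False
```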
Thanks for your response.
I edited the post; maybe the architecture is clearer now.
If I set the whole decoder into evaluation mode with decoder.eval(), are gradients properly calculated for these inner layers, and is backpropagation done correctly from the last layer back to the encoder?
What is the benefit of telling the optimizer not to optimize the decoder anymore? I thought the decoder wouldn’t be optimized anyway due to the frozen weights/biases.
The only way to “freeze weights and biases” is to exclude the parameters from the optimizer. eval() has no effect on the convolutional layers; it is only used to freeze BN and dropout. Besides, excluding the parameters can save GPU memory.
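A quick toy check of what eval() does to a BN layer: the running statistics (buffers) update during forward passes in train mode and stay fixed in eval mode:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 4, 4)

bn.train()
before = bn.running_mean.clone()
bn(x)  # training mode: running statistics are updated
assert not torch.equal(bn.running_mean, before)

bn.eval()
frozen = bn.running_mean.clone()
bn(x)  # eval mode: running statistics stay fixed
assert torch.equal(bn.running_mean, frozen)
```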
.eval() does not affect the gradient calculation, but I am not sure whether the gradients can be properly calculated when the inner layers are set to requires_grad=False.
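A quick sanity check (toy layers, not the actual VAE) suggests they can: gradients still flow through layers whose parameters have requires_grad=False, as long as the layer’s input requires grad — only the frozen parameters themselves accumulate no gradient:

```python
import torch
import torch.nn as nn

encoder = nn.Linear(4, 4)       # trainable
decoder = nn.Linear(4, 4)       # to be frozen
for p in decoder.parameters():
    p.requires_grad = False     # freeze the decoder weights only

out = decoder(encoder(torch.randn(2, 4)))
out.sum().backward()

# The frozen layer accumulates no gradient...
assert all(p.grad is None for p in decoder.parameters())
# ...but gradients still propagate through it back to the encoder.
assert encoder.weight.grad is not None
```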