Grad through frozen weights

I am a little confused, please help enlighten me.

Imagine an image model with cats and dogs data. I have seen models where there is a trainable autoencoder recreating the image, and then the output is passed through a frozen resnet encoder, with a trainable top linear layer categorizing cat and dog.

That is, the input travels through trainable layers, then frozen layers, and then trainable layers again. How can that possibly train? What are the general rules for training through frozen weights? I have tested training an autoencoder through a frozen decoder, and that seems to work. But adding a trainable layer afterward seems to me to break the chain.

A related follow-up question: I have seen situations where the input variable/tensor has a grad, while the weights/biases don't have a grad (are frozen). Is this what allows differentiation through frozen layers to work?


Autograd will continue the backpropagation all the way down to the first parameter that requires gradients, even if frozen layers sit between trainable ones. Freezing a layer only skips the gradients with respect to its own weights; Autograd still computes the gradients with respect to that layer's inputs (the activations), so the chain of gradients is not broken for earlier trainable layers.
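A minimal sketch of your setup, using small `nn.Linear` layers as stand-ins for the autoencoder, the frozen resnet encoder, and the classification head (the names are illustrative):

```python
import torch
import torch.nn as nn

# Toy stack: trainable -> frozen -> trainable
first = nn.Linear(4, 4)   # trainable (like the autoencoder)
middle = nn.Linear(4, 4)  # frozen (like the resnet encoder)
last = nn.Linear(4, 2)    # trainable (like the linear head)

# Freeze the middle block
for p in middle.parameters():
    p.requires_grad = False

x = torch.randn(8, 4)
out = last(middle(first(x)))
out.sum().backward()

# Gradients flow *through* the frozen block to the first layer;
# only the frozen parameters themselves get no .grad.
print(first.weight.grad is not None)   # True
print(middle.weight.grad is None)      # True
print(last.weight.grad is not None)    # True
```

Both trainable layers receive gradients, while the frozen layer's parameters stay untouched, which is exactly why the cat/dog setup you describe can train.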
You could set the `requires_grad` attribute of the input to `True`, but that would force Autograd to backpropagate all the way to the input, whether or not some layers are frozen. It is not needed for frozen layers to work.
If you don't need the gradient in your input tensor, I would recommend not setting `requires_grad=True` for it.
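To illustrate that last point, a small sketch where every layer parameter is frozen but the input still asks for a gradient:

```python
import torch
import torch.nn as nn

# A fully frozen layer: no parameter will receive a gradient.
layer = nn.Linear(4, 4)
for p in layer.parameters():
    p.requires_grad = False

# requires_grad=True on the input forces backprop to the input itself.
x = torch.randn(2, 4, requires_grad=True)
layer(x).sum().backward()

print(x.grad is not None)          # True: the input gets a gradient
print(layer.weight.grad is None)   # True: frozen weights get none
```

So the input gradient you observed is a separate feature (useful e.g. for adversarial examples or saliency maps), not the mechanism that lets gradients pass through frozen layers.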