requires_grad == False doesn't save time

Hi
I would like to train a network with fixed convolution kernels, i.e. the ones that come from the initialization. However, setting requires_grad = False (or passing only the layers I actually optimize to the optimizer) doesn't save any training time. That doesn't make sense to me, since those gradients never change and shouldn't need to be computed.
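To be concrete, this is roughly what I'm doing (a minimal sketch; the layer sizes and the 3x32x32 input are just placeholders):

```python
import torch.nn as nn
import torch.optim as optim

# Toy CNN: the conv kernels should stay fixed at their initialization
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),   # frozen conv layer
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),      # the layer I actually want to train
)

# Freeze the convolution kernels
for p in model[0].parameters():
    p.requires_grad = False

# Only pass the trainable parameters to the optimizer
optimizer = optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)
```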

You still need those derivatives to compute backprop.
Imagine you have layers A –> B –> C:
in order to backprop to A, you need the gradients through B even if B's parameters are frozen.
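A tiny sketch to illustrate (the Linear modules here are just stand-ins for A, B, C):

```python
import torch
import torch.nn as nn

# Three-layer stack A -> B -> C
A = nn.Linear(4, 4)
B = nn.Linear(4, 4)
C = nn.Linear(4, 4)

# Freeze B's parameters
for p in B.parameters():
    p.requires_grad = False

x = torch.randn(2, 4)
loss = C(B(A(x))).sum()
loss.backward()

print(A.weight.grad is not None)  # True: the gradient still flowed *through* frozen B
print(B.weight.grad is None)      # True: no gradient is stored for B's own weights
```

So freezing B skips accumulating gradients into B's parameters, but the chain-rule computation through B still has to run, which is why you don't see the speedup you expect.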