requires_grad == False doesn't save time

I would like to train a network whose convolution kernels stay fixed at their initialization values. However, setting requires_grad = False on them (or passing only the layers I do optimize to the optimizer) doesn't reduce training time. That seems wrong to me: since those weights never change, their derivatives shouldn't need to be calculated.
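For reference, a minimal sketch of the setup described above (the layer sizes and learning rate are illustrative, not from the original post): freeze the conv kernels at their random init and hand the optimizer only the parameters that still require grad.

```python
import torch
import torch.nn as nn

# Hypothetical small net: a conv layer kept at its random init,
# followed by a linear layer that is actually trained.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # frozen
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 10),                 # trained
)

# Freeze the convolution kernels (weight and bias).
for p in net[0].parameters():
    p.requires_grad = False

# Optimize only over the parameters that still require grad.
opt = torch.optim.SGD(
    (p for p in net.parameters() if p.requires_grad), lr=0.1
)
```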

You still need those derivatives to run backprop *through* the frozen layers.
Imagine layers A -> B -> C.
To backprop to A, I still need the gradients flowing through B, even if B's parameters are frozen.
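A quick sketch of that point (the A/B/C linear layers are illustrative): freezing B's parameters means B gets no parameter gradients, but the backward pass still has to propagate through B so that A can receive its gradients.

```python
import torch
import torch.nn as nn

# Hypothetical chain A -> B -> C.
A = nn.Linear(4, 4)
B = nn.Linear(4, 4)
C = nn.Linear(4, 1)

# Freeze B's parameters only.
for p in B.parameters():
    p.requires_grad = False

x = torch.randn(2, 4)
loss = C(B(A(x))).sum()
loss.backward()

# B's parameters accumulate no gradient...
assert all(p.grad is None for p in B.parameters())
# ...yet A's parameters do: the gradient of the loss with respect
# to B's *input* was still computed so it could reach A.
assert all(p.grad is not None for p in A.parameters())
```

So freezing a middle layer only skips the (cheap) per-parameter gradient accumulation for that layer; the chain-rule pass through it cannot be skipped. Real savings appear when the frozen layers are at the *start* of the network, since then backprop can stop early.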