How does one freeze some layers of a network in PyTorch and train only the rest?

If we set requires_grad to False for a particular layer's parameters, do we also have to leave them out of the optimizer? For example, like this:

optimizer = optim.SGD(filter(lambda p: p.requires_grad, net.parameters()), lr=0.1)
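
To make the question concrete, here is a minimal sketch of what I have in mind (the two-layer net, the layer I freeze, and the tensor shapes are just made-up examples):

import torch
import torch.nn as nn
import torch.optim as optim

# a made-up two-layer net, just for illustration
net = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))

# freeze the first Linear layer
for p in net[0].parameters():
    p.requires_grad = False

# pass only the parameters that still require gradients to the optimizer
optimizer = optim.SGD(filter(lambda p: p.requires_grad, net.parameters()), lr=0.1)

x = torch.randn(4, 10)
loss = net(x).sum()
loss.backward()    # no gradients are computed for the frozen layer
optimizer.step()   # only the unfrozen parameters are updated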

Is the only benefit a speedup, or is it actually wrong not to filter the frozen parameters out of the optimizer?
Thanks
