If we set requires_grad to False for the parameters of a particular layer, do we have to leave those parameters out of the optimizer?
For example, like this:
optimizer = optim.SGD(filter(lambda p: p.requires_grad, net.parameters()), lr=0.1)
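To make the setup concrete, here is a minimal sketch of what I mean. The toy `net`, its layer sizes, and the choice of which layer to freeze are made up purely for illustration; only the optimizer line is the pattern I am asking about.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical toy model, only for illustration.
net = nn.Sequential(
    nn.Linear(4, 8),  # the layer I want to freeze
    nn.ReLU(),
    nn.Linear(8, 2),  # the layer that should keep training
)

# Freeze the first layer: its weight and bias no longer require gradients.
for p in net[0].parameters():
    p.requires_grad = False

# Hand only the still-trainable parameters to the optimizer.
optimizer = optim.SGD(filter(lambda p: p.requires_grad, net.parameters()), lr=0.1)

# Compare how many parameter tensors the model has vs. how many the optimizer holds.
n_total = len(list(net.parameters()))
n_in_optimizer = sum(len(g["params"]) for g in optimizer.param_groups)
print(n_total, n_in_optimizer)
```

With the filter in place, the optimizer only holds the weight and bias of the last layer.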
Is the only benefit of filtering a speedup, or is it actually wrong to pass the frozen parameters to the optimizer without filtering them out?
Thanks