I want to make sure I understand the relationship between setting requires_grad = False and not passing the layer’s parameters into the optimizer, as discussed here.
My question is: if requires_grad = False, what will the optimizer do with those parameters? Does it ever make sense to set requires_grad = False and still pass the frozen parameters to the optimizer?
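
To make the question concrete, here is a minimal sketch of the two setups I am comparing. The toy model, layer sizes, learning rate, and variable names are all made up for illustration. My understanding, which I would like confirmed, is that a frozen parameter never receives a gradient, so its .grad stays None and plain SGD leaves it alone even when it is passed to the optimizer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model, used only to illustrate the question.
model = nn.Sequential(nn.Linear(4, 3), nn.Tanh(), nn.Linear(3, 1))

# Freeze the first layer.
for p in model[0].parameters():
    p.requires_grad = False

# Setup A: the frozen parameters are still passed to the optimizer.
opt_all = torch.optim.SGD(model.parameters(), lr=0.1)

frozen_before = model[0].weight.clone()
trainable_before = model[2].weight.clone()

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()
loss.backward()
opt_all.step()

# Autograd never populated .grad for the frozen weight.
frozen_grad_is_none = model[0].weight.grad is None
# The optimizer step did not modify the frozen weight.
frozen_unchanged = torch.equal(model[0].weight, frozen_before)
# The trainable layer was updated as usual.
trainable_changed = not torch.equal(model[2].weight, trainable_before)

print("frozen .grad is None:", frozen_grad_is_none)
print("frozen weight unchanged:", frozen_unchanged)
print("trainable weight changed:", trainable_changed)

# Setup B: only the trainable parameters are passed to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
opt_trainable = torch.optim.SGD(trainable_params, lr=0.1)
print("tensors in setup B:", len(opt_trainable.param_groups[0]["params"]))
```

In this sketch both setups appear to train the same parameters. The case I am less sure about is a parameter that already has a .grad tensor from before it was frozen: if that tensor is zeroed rather than reset to None, I would expect weight decay or momentum to keep changing it in setup A, which seems like an argument for setup B.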