For example, without a backwards pass:
for param in model.parameters():
    # .grad must be a tensor with the same shape as the parameter, not a bare float
    param.grad = torch.full_like(param, 3.1415)
optimizer.step()
What about doing a backwards pass but then modifying the grads:
loss.backward()
for param in model.parameters():
    param.grad += 3.1415
optimizer.step()
Yes, this works. By default, autograd.backward() computes the gradients without building a computation graph for the gradient computation itself (you would only need that, by passing create_graph=True, if you wanted to e.g. do a double backward). So making non-differentiable modifications to the gradients is fine, as long as you aren’t planning on doing something like MAML or a gradient penalty, which need to differentiate through the gradients.
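For contrast, a case that does need the graph for the gradients has to ask for it explicitly. Here is a minimal sketch of a gradient-penalty-style update; the names model, inputs, targets, loss_fn, lambda_gp, and optimizer are assumptions for illustration, not from the original:
import torch

outputs = model(inputs)
loss = loss_fn(outputs, targets)
# create_graph=True keeps the graph of the gradient computation,
# so the penalty term below is itself differentiable
grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
grad_penalty = lambda_gp * sum(g.pow(2).sum() for g in grads)
(loss + grad_penalty).backward()
optimizer.step()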
If you look at the implementation of e.g. torch.optim.SGD, you can see that it just makes an in-place update to each model parameter p using whatever is stored in p.grad.
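In other words, ignoring momentum, weight decay, and the other options, the step boils down to something like this rough sketch (lr is an assumed learning rate; this is a simplification, not the actual implementation):
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            # in-place update: p <- p - lr * p.grad
            p.add_(p.grad, alpha=-lr)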