Yes, this works. By default (create_graph=False), torch.autograd.backward() computes the gradients without building a computation graph for them; you would only need that graph if you wanted to do something like a double backward. So as long as you aren't planning on something like MAML or a gradient penalty, making non-differentiable, in-place modifications to the gradients is fine.
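A minimal sketch of the pattern (the model, data, and the clamp itself are just placeholders for whatever modification you have in mind): tweak p.grad in place after backward() and before the optimizer step.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 10)
y = torch.randn(32, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()  # create_graph=False by default, so no graph is kept for the grads

# Non-differentiable, in-place modification of the gradients (here: clamping).
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p.grad.clamp_(-1.0, 1.0)

optimizer.step()  # the optimizer just uses whatever is now in p.grad
```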
If you look at the implementation of e.g. torch.optim.SGD, you'll see that it just makes an in-place update to each model parameter p using whatever is in p.grad.
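Roughly, the core of that update looks like the sketch below (this is a simplification of the real torch.optim.SGD, which also handles momentum, weight decay, dampening, etc.):

```python
import torch

@torch.no_grad()
def sgd_step(params, lr):
    # Plain SGD update: p <- p - lr * p.grad, applied in place.
    for p in params:
        if p.grad is not None:
            p.add_(p.grad, alpha=-lr)
```

Since the optimizer only ever reads p.grad at step time, anything you write into the gradients beforehand is what gets applied.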