What's the best practice for using torch.optim when gradients are calculated explicitly?


Let’s say I am trying to use one of the torch.optim algorithms (e.g. SGD) to optimize my loss function with respect to a weight W. The gradient of the loss with respect to W is calculated/approximated explicitly instead of by calling loss.backward(). How should I use the torch.optim algorithm to optimize W with the explicitly calculated gradient?

I guess one way to do this is to set W.requires_grad = True, pass W to optim.SGD, and assign the calculated gradient to W.grad before calling optimizer.step(). But this has an unwanted side effect. When W.requires_grad = True, every intermediate tensor computed from W will also have requires_grad = True, and PyTorch has to store the whole operation graph (I am not sure whether PyTorch also allocates memory for each variable's .grad), none of which is used later.
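To make the approach above concrete, here is a minimal sketch. The toy least-squares problem, the names X, y and losses, and the learning rate are all made up for illustration. The torch.no_grad() block is not part of what I described above; it is included as one possible way to keep autograd from recording the graph while the gradient is computed by hand.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy problem: minimize 0.5 * ||X @ W - y||^2 with respect to W.
X = torch.randn(32, 4)
y = torch.randn(32, 1)

# W is a leaf tensor with requires_grad=True, as in the approach described above.
W = torch.zeros(4, 1, requires_grad=True)
optimizer = torch.optim.SGD([W], lr=0.01)

losses = []
for _ in range(100):
    # Inside no_grad, operations on W are not recorded, so no graph is stored
    # (assumption: the explicit gradient does not itself need autograd).
    with torch.no_grad():
        residual = X @ W - y
        losses.append(0.5 * residual.pow(2).sum().item())
        grad = X.t() @ residual  # hand-derived gradient: X^T (X W - y)

    # Hand the explicit gradient to the optimizer through W.grad.
    # It must match W in shape, dtype and device.
    W.grad = grad
    optimizer.step()
```

Since W.grad is overwritten on every iteration rather than accumulated by backward(), this sketch does not call optimizer.zero_grad().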

Does anyone know a better way to do this?