Using optimizers with torch.autograd.grad()?

My team is trying to use torch.autograd.grad() with the torch optimizers, but the optimizers only consume each parameter's .grad attribute, not an arbitrary list of gradients. To work around this, we manually assign the gradients returned by grad() to the parameters' .grad attributes before calling step().
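
A minimal sketch of the workaround we use today (the model, loss, and names below are just illustrative):

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 10)
loss = model(x).pow(2).mean()

# Compute gradients explicitly instead of calling loss.backward().
params = list(model.parameters())
grads = torch.autograd.grad(loss, params)  # tuple of tensors, one per parameter

# Manually populate .grad so the optimizer can consume the gradients.
optimizer.zero_grad()
for p, g in zip(params, grads):
    p.grad = g
optimizer.step()
```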

Is there a better pattern?

I think the optimizers should take an optional "grads" argument, either in step() or at construction time, to facilitate this. Worth a discussion / RFC on https://github.com/pytorch/pytorch/issues.
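
Something along these lines (purely hypothetical, this argument does not exist in the current API):

```python
# Hypothetical usage if step() accepted precomputed gradients directly.
grads = torch.autograd.grad(loss, model.parameters())
optimizer.step(grads=grads)  # "grads" is the proposed argument, not a real one
```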