Using PyTorch optim algorithms with gradients computed outside model.backward()

I’m implementing an algorithm that uses some tricks to parallelize the forward passes and gradient computations during training. I’m doing these computations with the multiprocessing library, which seems to mean that I can’t set requires_grad=True on my variables, even though I never call backward() and compute the gradients myself. Does this also mean I can’t use the torch.optim routines, such as SGD, or is there a way to make use of them?

The optimizers should still work, as they only read the .grad attribute of each parameter passed to them, regardless of how that gradient was computed:

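import torch

x = torch.zeros(1)
optimizer = torch.optim.SGD([x], lr=1.)

# Assign the externally computed gradient directly to .grad
x.grad = torch.tensor([10.])
optimizer.step()  # SGD update: x <- x - lr * x.grad
print(x)
> tensor([-10.])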
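
The same pattern extends to a full training loop. Below is a minimal sketch, assuming the gradients come back from the worker processes as plain Python lists (or NumPy arrays). The helper name apply_external_grads and the toy quadratic objective are hypothetical, made up for illustration; only torch.optim.SGD, step(), and zero_grad() are actual PyTorch API here.

```python
import torch

def apply_external_grads(params, grads, optimizer):
    """Copy externally computed gradients into .grad, then take one optimizer step.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    for p, g in zip(params, grads):
        # .grad must match the parameter's shape, dtype, and device
        g = torch.as_tensor(g, dtype=p.dtype, device=p.device)
        p.grad = g.reshape(p.shape).clone()
    optimizer.step()
    optimizer.zero_grad()  # clear stale gradients before the next round

# Toy problem: minimize 0.5 * ||w - target||^2, whose gradient is (w - target)
w = torch.zeros(3)  # plain tensor, requires_grad is False
target = torch.tensor([1.0, 2.0, 3.0])
optimizer = torch.optim.SGD([w], lr=0.5)

for _ in range(20):
    # Stand-in for a gradient computed in another process and sent back as a list
    grad = (w - target).tolist()
    apply_external_grads([w], [grad], optimizer)

print(w)
```

With lr=0.5 each step halves the distance to target, so w should end up very close to [1, 2, 3]. Swapping SGD for another optimizer such as Adam should need no other changes, since it reads the same .grad attribute.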