Hi.
Say I know, for each training example, both what the output of my network should be and what its gradients with respect to the input ought to be.
Is it possible to train a network in PyTorch whose loss depends on both the network's output and its gradients w.r.t. the network input?
import torch
from torch.autograd import Variable

inp = Variable(torch.rand(3, 4), requires_grad=True)
W = Variable(torch.rand(4, 4), requires_grad=True)
yreal = Variable(torch.rand(3, 4), requires_grad=False)
gradsreal = Variable(torch.rand(3, 4), requires_grad=False)  # target gradients, not a parameter

ypred = torch.matmul(inp, W)
ypred.backward(torch.ones(ypred.shape))  # fills inp.grad, but outside the autograd graph
gradspred = inp.grad

loss = torch.mean((yreal - ypred) ** 2 + (gradspred - gradsreal) ** 2)
loss.backward()  # fails here
This won’t work, because inp.grad is volatile: the backward() call that produces it isn’t itself tracked by autograd, so loss.backward() fails when it reaches gradspred. Also, would I have to zero all gradients after computing gradspred, given that the first backward() call has already accumulated into inp.grad and W.grad?
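
For reference, here is the kind of pattern I was hoping would work, using torch.autograd.grad with create_graph=True so that the gradient computation itself stays in the graph. This is only a sketch based on my reading of the docs, so the exact usage may be off:

import torch
from torch.autograd import Variable, grad

inp = Variable(torch.rand(3, 4), requires_grad=True)
W = Variable(torch.rand(4, 4), requires_grad=True)
yreal = Variable(torch.rand(3, 4), requires_grad=False)
gradsreal = Variable(torch.rand(3, 4), requires_grad=False)

ypred = torch.matmul(inp, W)

# create_graph=True keeps the gradient computation itself in the
# autograd graph, so the loss below can be differentiated through it
gradspred, = grad(ypred, inp,
                  grad_outputs=torch.ones(ypred.shape),
                  create_graph=True)

loss = torch.mean((yreal - ypred) ** 2 + (gradspred - gradsreal) ** 2)
loss.backward()  # should now fill W.grad with d(loss)/dW

If that is the right approach, it would presumably also sidestep the zeroing question, since torch.autograd.grad returns the gradients directly rather than accumulating them into inp.grad.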