Say that, for each training example, I know both what the output of my network should be and what its gradients with respect to the input ought to be.
Is it possible to train a network in PyTorch whose loss depends on both the output of the network and the gradients of that output w.r.t. the network input? Here is my attempt:
```python
import torch
from torch.autograd import Variable

inp = Variable(torch.rand(3, 4), requires_grad=True)
W = Variable(torch.rand(4, 4), requires_grad=True)
yreal = Variable(torch.rand(3, 4), requires_grad=False)
gradsreal = Variable(torch.rand(3, 4), requires_grad=False)  # target gradients

ypred = torch.matmul(inp, W)
ypred.backward(torch.ones(ypred.shape))  # populates inp.grad, but outside the graph
gradspred = inp.grad

loss = torch.mean((yreal - ypred) ** 2 + (gradspred - gradsreal) ** 2)
loss.backward()  # fails here
```
This won't work as written: the final loss.backward() fails because inp.grad is volatile, i.e. the gradient computation is not itself part of the autograd graph. Also, would I have to zero all the gradients after calculating gradspred?
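For what it's worth, here is the direction I've been considering: as far as I can tell, torch.autograd.grad with create_graph=True should make the gradient computation itself differentiable, so gradspred can enter the loss. A minimal sketch of that idea on the same toy problem (untested beyond this case):

```python
import torch

inp = torch.rand(3, 4, requires_grad=True)
W = torch.rand(4, 4, requires_grad=True)
yreal = torch.rand(3, 4)       # known target outputs
gradsreal = torch.rand(3, 4)   # known target gradients w.r.t. inp

ypred = torch.matmul(inp, W)

# create_graph=True records the gradient computation in the graph,
# so gradspred is itself differentiable and can appear in the loss.
gradspred, = torch.autograd.grad(
    ypred, inp,
    grad_outputs=torch.ones_like(ypred),
    create_graph=True,
)

loss = torch.mean((yreal - ypred) ** 2 + (gradspred - gradsreal) ** 2)
loss.backward()  # now differentiates through gradspred as well
```

If that's right, torch.autograd.grad would also sidestep the zeroing question, since it returns the gradients directly rather than accumulating them into inp.grad.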