First, if I understand automatic differentiation correctly, the gradient with respect to the input can be obtained like this:
output = NN(input, weight)  # input must be a Variable with requires_grad=True
output.backward(retain_variables=True)  # if output is not a scalar, pass a gradient argument
grad_input = input.grad  # at least, this is a Variable, unfortunately volatile
Now, the problem is that grad_input is a volatile Variable disconnected from the graph. So as you said, you have the value, but you don't have grad_input = f(input) in the graph. So if you do z = grad_input*2 and try z.backward(), you will get an error.
You have to be patient: according to this topic, PyTorch will soon keep the gradient in the graph, and it will then be possible to call backward a second time and get second-order derivatives.
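For readers on a newer PyTorch version: this has since landed as torch.autograd.grad with create_graph=True, which keeps the computed gradient in the graph so it can be differentiated again. A minimal sketch (the toy network below is mine, just to keep the output scalar):

```python
import torch

# Toy scalar-output computation so backward() needs no gradient argument.
x = torch.rand(3, requires_grad=True)
w = torch.rand(3, requires_grad=True)
out = (w * x).pow(2).sum()

# create_graph=True keeps grad_x connected to the graph,
# so grad_x = f(x) can itself be differentiated.
(grad_x,) = torch.autograd.grad(out, x, create_graph=True)

# A second backward now works: this is a second-order derivative.
z = (grad_x * 2).sum()
z.backward()
print(x.grad)  # d z / d x
```

Here out = sum((w*x)^2), so grad_x = 2*w^2*x and z's gradient w.r.t. x is the constant 4*w^2, which x.grad reproduces.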
But in your case, you can avoid the second order derivative with some mathematics:
if you want grad_input = target, then that is the same as: output = target*input + c, for any constant c,
which can be rewritten as: (output - c)/input = target.
so you can minimize your loss with respect to the parameters of your NN, plus an additive bias c:
if output and input have different dimensions, then in fact you want to approximate the target with the outer (Gram) product of (output - c) and 1/input (with the pointwise inverse).
class NN2(NN):
    def __init__(self, output_size):
        super(NN2, self).__init__()
        self.c = nn.Parameter(torch.rand(output_size))  # Parameter, so the optimizer sees it

    def forward(self, input):
        output = super(NN2, self).forward(input)
        input_inv = (1 / input).view(1, -1)  # pointwise inverse, as a row vector
        gram = torch.mm((output - self.c).view(-1, 1), input_inv)  # outer product
        return gram
model = NN2(output_size)
optimizer = optim.Adam(model.parameters(), lr)  # parameters() needs an instance, not the class
gram = model(input)
loss = criterion(gram, target_of_grad_input)
loss.backward()
optimizer.step()
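Put end to end, with a hypothetical one-layer base network standing in for your NN (the base class, sizes, and hyperparameters below are placeholders, not your actual model), one training step of this workaround looks like:

```python
import torch
import torch.nn as nn
import torch.optim as optim

class NN(nn.Module):  # hypothetical stand-in for your base network
    def __init__(self, in_size, out_size):
        super(NN, self).__init__()
        self.fc = nn.Linear(in_size, out_size)

    def forward(self, x):
        return torch.tanh(self.fc(x))

class NN2(NN):
    def __init__(self, in_size, out_size):
        super(NN2, self).__init__(in_size, out_size)
        self.c = nn.Parameter(torch.rand(out_size))  # the additive bias c, learned too

    def forward(self, x):
        output = super(NN2, self).forward(x)
        x_inv = (1 / x).view(1, -1)                            # pointwise inverse
        return torch.mm((output - self.c).view(-1, 1), x_inv)  # outer product

model = NN2(3, 2)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

input = torch.rand(3)
target_of_grad_input = torch.rand(2, 3)  # desired d(output)/d(input), shape (out, in)

optimizer.zero_grad()
loss = criterion(model(input), target_of_grad_input)
loss.backward()
optimizer.step()
```

Note the target has shape (output_size, input_size), matching the outer product, and that the scheme assumes no input component is zero (the pointwise inverse would blow up).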