Differentiated tensor not being used in a graph

I am trying to take derivatives of a network with respect to its arguments, and I am confused about a result I have been getting. For this example my network, model, takes three inputs and has two outputs. Here is the code I am running:

ipt = Variable(batch[0], requires_grad=True).view(batch_size, batch[0].shape[1])
x = ipt[:,0].unsqueeze(1)
y = ipt[:,1].unsqueeze(1)
z = ipt[:,2].unsqueeze(1)

out = model(ipt).cuda()
A = out[:,0].unsqueeze(1)
B = out[:,1].unsqueeze(1)

A_z = grad(A,z,torch.ones((batch_size, 1),requires_grad=True).cuda(), create_graph=True)[0]
A_y = grad(A,y,torch.ones((batch_size, 1),requires_grad=True).cuda(), create_graph=True)[0]
A_x = grad(A,x,torch.ones((batch_size, 1),requires_grad=True).cuda(), create_graph=True)[0]

I get the following error on the A_z line:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
This confuses me, since this is the first derivative computed so far; it seems like z isn't part of the graph?


From what I can see, x, y and z are not used to compute the output, right? Hence the error that you’re seeing :slight_smile:
You can simply ask for the gradients wrt ipt and then slice that gradient to get the subset you’re interested in :slight_smile:
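A minimal sketch of that suggestion, using a hypothetical stand-in model (the thread doesn't show the real network): differentiate with respect to the full input tensor, then slice the resulting gradient to get the derivative with respect to one column.

```python
import torch

batch_size = 4
model = torch.nn.Linear(3, 2)  # placeholder for the 3-in / 2-out network

ipt = torch.randn(batch_size, 3, requires_grad=True)
out = model(ipt)
A = out[:, 0].unsqueeze(1)

# Gradient of A w.r.t. the whole input; this works because ipt itself
# (not a slice of it) is what the forward pass used.
(A_ipt,) = torch.autograd.grad(
    A, ipt, torch.ones(batch_size, 1), create_graph=True
)

# dA/dz is then just the third column of that gradient.
A_z = A_ipt[:, 2].unsqueeze(1)
print(A_z.shape)  # torch.Size([4, 1])
```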

So I was wondering about this so I tried:
A_z = grad(out[:,0].unsqueeze(1),ipt[:,2].unsqueeze(1),torch.ones((batch_size, 1),requires_grad=True).cuda(), create_graph=True)[0]
and got the same error. Is it important that the full input and output tensors are used, rather than slices?

I have a follow-up question: what’s the most appropriate way to calculate second derivatives? It sounds like the preferred method is to calculate derivatives with respect to all inputs, and this gets slow when I have to calculate higher-order derivatives. Is there a way to calculate derivatives with respect to just one input if I don’t need all of the higher-order derivatives?

When you do ipt[:,2].unsqueeze(1) it creates a new Tensor that is a slice of ipt. But that new Tensor is not used to compute the output, and that’s why you see this error.
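You can see this directly in a small sketch (the toy model here is an assumption for illustration): the slice is a fresh tensor downstream of ipt, so out was never computed from it, and autograd reports it as unused.

```python
import torch

ipt = torch.randn(4, 3, requires_grad=True)
out = torch.nn.Linear(3, 2)(ipt)

# This slice is a *new* tensor derived from ipt; the forward pass above
# did not use it, so it is not part of out's graph.
z = ipt[:, 2].unsqueeze(1)

try:
    torch.autograd.grad(out.sum(), z)
except RuntimeError as e:
    print(e)  # "One of the differentiated Tensors appears to not have been used in the graph..."

# With allow_unused=True you get None back instead of an error.
(g,) = torch.autograd.grad(out.sum(), z, allow_unused=True)
print(g)  # None
```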

Is there a way to calculate derivatives with respect to just one input if I don’t need all of the higher order derivatives?

You can only compute derivatives wrt a Tensor. So if all your inputs are in a single Tensor, you have to compute the derivative for the whole thing. But this kind of batch computation is usually not much more expensive than computing a single element.
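If per-input derivatives are really wanted, one common pattern (my suggestion, not something stated above) is to keep x, y and z as separate leaf tensors and torch.cat them before the forward pass; then each one genuinely participates in the graph and can be differentiated on its own, including for second derivatives. The small nonlinear model here is a placeholder.

```python
import torch

batch_size = 4
# Placeholder network; a nonlinearity is needed for a nonzero second derivative.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 2)
)

# Separate leaf tensors, one per input variable.
x = torch.randn(batch_size, 1, requires_grad=True)
y = torch.randn(batch_size, 1, requires_grad=True)
z = torch.randn(batch_size, 1, requires_grad=True)

# Concatenate them so the forward pass actually uses x, y and z.
out = model(torch.cat([x, y, z], dim=1))
A = out[:, 0].unsqueeze(1)

# First derivative wrt z alone; create_graph=True keeps the graph
# around so we can differentiate again.
(A_z,) = torch.autograd.grad(
    A, z, torch.ones(batch_size, 1), create_graph=True
)

# Second derivative wrt z, without touching x or y.
(A_zz,) = torch.autograd.grad(A_z, z, torch.ones(batch_size, 1))
```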