Differentiated tensor not being used in a graph

I am trying to take derivatives of a network with respect to its arguments, and I am confused about a result I have been getting. For this example my network, model, takes three inputs and has two outputs. Here is the code I am running:

ipt = Variable(batch[0], requires_grad=True).view(batch_size, batch[0].shape[1])
x = ipt[:,0].unsqueeze(1)
y = ipt[:,1].unsqueeze(1)
z = ipt[:,2].unsqueeze(1)

out = model(ipt).cuda()
A = out[:,0].unsqueeze(1)
B = out[:,1].unsqueeze(1)

A_z = grad(A,z,torch.ones((batch_size, 1),requires_grad=True).cuda(), create_graph=True)[0]
A_y = grad(A,y,torch.ones((batch_size, 1),requires_grad=True).cuda(), create_graph=True)[0]
A_x = grad(A,x,torch.ones((batch_size, 1),requires_grad=True).cuda(), create_graph=True)[0]

I get the following error on the A_z line:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
This confuses me, since this is the first derivative computed so far; it seems like z isn't part of the graph?


From what I can see, x, y and z are not used to compute the output, right? Hence the error that you’re seeing :slight_smile:
You can simply ask for the gradients wrt ipt and then slice that gradient to get the subset you’re interested in :slight_smile:
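A minimal sketch of that suggestion, using a hypothetical stand-in model (the thread doesn't show the real network): differentiate with respect to the full input tensor, then slice the resulting gradient to get the derivative with respect to one column.

```python
import torch

batch_size = 4
model = torch.nn.Linear(3, 2)  # placeholder for the 3-in / 2-out network

ipt = torch.randn(batch_size, 3, requires_grad=True)
out = model(ipt)
A = out[:, 0].unsqueeze(1)

# Gradient of A w.r.t. the whole input; this works because ipt itself
# (not a slice of it) is what the forward pass used.
(A_ipt,) = torch.autograd.grad(
    A, ipt, torch.ones(batch_size, 1), create_graph=True
)

# dA/dz is then just the third column of that gradient.
A_z = A_ipt[:, 2].unsqueeze(1)
print(A_z.shape)  # torch.Size([4, 1])
```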

So I was wondering about this so I tried:
A_z = grad(out[:,0].unsqueeze(1),ipt[:,2].unsqueeze(1),torch.ones((batch_size, 1),requires_grad=True).cuda(), create_graph=True)[0]
and got the same error. Is it important that the full input and output tensors are used, rather than slices?

I have a follow-up question: what’s the most appropriate way to calculate second derivatives? It sounds like the preferred method is to calculate derivatives with respect to all inputs, and this gets slow when I have to calculate higher-order derivatives. Is there a way to calculate derivatives with respect to just one input if I don’t need all of the higher-order derivatives?

When you do ipt[:,2].unsqueeze(1) it creates a new Tensor that is a slice of ipt. But that new Tensor is not used to compute the output, and that’s why you see this error.
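You can see this directly in a small sketch (the toy model here is an assumption for illustration): the slice is a fresh tensor downstream of ipt, so out was never computed from it, and autograd reports it as unused.

```python
import torch

ipt = torch.randn(4, 3, requires_grad=True)
out = torch.nn.Linear(3, 2)(ipt)

# This slice is a *new* tensor derived from ipt; the forward pass above
# did not use it, so it is not part of out's graph.
z = ipt[:, 2].unsqueeze(1)

try:
    torch.autograd.grad(out.sum(), z)
except RuntimeError as e:
    print(e)  # "One of the differentiated Tensors appears to not have been used in the graph..."

# With allow_unused=True you get None back instead of an error.
(g,) = torch.autograd.grad(out.sum(), z, allow_unused=True)
print(g)  # None
```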

Is there a way to calculate derivatives with respect to just one input if I don’t need all of the higher order derivatives?

You can only compute derivatives wrt a Tensor. So if all your inputs are in a single Tensor, you have to compute the derivative for the whole thing. But this kind of batch computation is usually not much more expensive than computing a single element.
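If per-input derivatives are really wanted, one common pattern (my suggestion, not something stated above) is to keep x, y and z as separate leaf tensors and torch.cat them before the forward pass; then each one genuinely participates in the graph and can be differentiated on its own, including for second derivatives. The small nonlinear model here is a placeholder.

```python
import torch

batch_size = 4
# Placeholder network; a nonlinearity is needed for a nonzero second derivative.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 2)
)

# Separate leaf tensors, one per input variable.
x = torch.randn(batch_size, 1, requires_grad=True)
y = torch.randn(batch_size, 1, requires_grad=True)
z = torch.randn(batch_size, 1, requires_grad=True)

# Concatenate them so the forward pass actually uses x, y and z.
out = model(torch.cat([x, y, z], dim=1))
A = out[:, 0].unsqueeze(1)

# First derivative wrt z alone; create_graph=True keeps the graph
# around so we can differentiate again.
(A_z,) = torch.autograd.grad(
    A, z, torch.ones(batch_size, 1), create_graph=True
)

# Second derivative wrt z, without touching x or y.
(A_zz,) = torch.autograd.grad(A_z, z, torch.ones(batch_size, 1))
```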