```python
import torch
import torch.nn as nn

n, d = 10, 3
lin1 = nn.Linear(d, d)

# case 1: a single 1-D input; differentiating w.r.t. the leaf x works
x = torch.randn(d).requires_grad_()
y = lin1(x)
vec = torch.ones(d)
gr = torch.autograd.grad(y, x, vec, retain_graph=True)[0]
print(gr.shape)  # torch.Size([3])

# case 2: a batch of n inputs; differentiating y[0] w.r.t. x[0] fails
x = torch.randn(n, d).requires_grad_()
y = lin1(x)
vec = torch.ones(d)
gr = torch.autograd.grad(y[0], x[0], vec, retain_graph=True)[0]
# Raises:
# RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
```

Why does taking x[i] remove it from the computation graph? I tried cloning it and using .narrow, but in vain.

y[0] depends only on x[0], so I don't want to compute the gradient with respect to the full input.
Any help is appreciated!

I guess the issue is caused by indexing x: x[0] creates a new view tensor after the forward pass, and that view was never used to compute y.
The same error would be raised in the first example if you used gr = torch.autograd.grad(y, x.clone(), vec, retain_graph=True)[0]. @albanD, is there a more elegant way to do this type of computation, perhaps with some of the new (beta) features?
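A minimal sketch that checks this explanation, assuming a recent PyTorch (the seed and the names x0, g, x1, g1 are illustrative, not from the thread). Passing allow_unused=True, as the error message suggests, makes torch.autograd.grad return None instead of raising, which confirms that the tensor produced by x[0] (or by x.clone()) is not an input of the graph that produced y.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 10, 3
lin1 = nn.Linear(d, d)
vec = torch.ones(d)

x = torch.randn(n, d, requires_grad=True)
y = lin1(x)

# x[0] is a brand-new view tensor created *after* the forward pass.
# It has its own grad_fn (a select node pointing back to x),
# but y was never computed from it.
x0 = x[0]
print(x0.grad_fn)

# allow_unused=True turns the error into a None result,
# confirming that y[0] does not depend on this particular tensor.
(g,) = torch.autograd.grad(y[0], x0, vec, retain_graph=True, allow_unused=True)
print(g)  # None

# The same happens with x.clone() in the 1-D case: the clone is a new node
# that the forward pass never saw.
x1 = torch.randn(d, requires_grad=True)
y1 = lin1(x1)
(g1,) = torch.autograd.grad(y1, x1.clone(), vec, allow_unused=True)
print(g1)  # None
```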

I'm not sure which beta feature you're referring to.
But in general, autograd works at the Tensor level, so no, you won't be able to compute the gradient with respect to a subset of a Tensor any faster than computing the whole gradient and then taking the subset you care about.
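A sketch of that approach, reusing the question's setup (n, d, lin1, vec; the seed is an added assumption): differentiate y[0] with respect to the whole leaf x and slice row 0 afterwards. The allclose check relies on the fact that, for nn.Linear, y[0] = W x[0] + b, so the vector-Jacobian product is vec @ W; it is only a sanity check for this particular layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 10, 3
lin1 = nn.Linear(d, d)
vec = torch.ones(d)

x = torch.randn(n, d, requires_grad=True)
y = lin1(x)

# Differentiate w.r.t. the whole leaf x, then keep only row 0.
# Autograd still materializes the full (n, d) gradient; rows 1..n-1 are
# zeros here because y[0] depends only on x[0].
full = torch.autograd.grad(y[0], x, vec, retain_graph=True)[0]
gr = full[0]
print(full.shape, gr.shape)  # torch.Size([10, 3]) torch.Size([3])

# For a linear layer, the vector-Jacobian product of y[0] is vec @ W.
print(torch.allclose(gr, vec @ lin1.weight))  # True
```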
If you only want the gradient for that subset, you have to run the forward pass with a Tensor that represents just that subset.
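One way to follow this advice while keeping a batched input, sketched under the assumption that running the layer row by row is acceptable: split x with x.unbind(0) *before* the forward pass, so each row is a tensor the graph actually uses, then differentiate y[0] with respect to rows[0]. The names rows and gr are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 10, 3
lin1 = nn.Linear(d, d)
vec = torch.ones(d)

x = torch.randn(n, d, requires_grad=True)

# Split x into row tensors before the forward pass, so each row is a node
# that the graph actually uses.
rows = x.unbind(0)
y = torch.stack([lin1(r) for r in rows])

# rows[0] is now an input of the graph, and y[0] depends only on it.
gr = torch.autograd.grad(y[0], rows[0], vec, retain_graph=True)[0]
print(gr.shape)  # torch.Size([3])
```

The trade-off is that the single batched matmul becomes n separate calls to lin1, so this only pays off when you really need per-row tensors in the graph. If you only care about x[0], running lin1 on x[0] alone and differentiating with respect to that same tensor is simpler.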
Does that make sense?