Compute grad with respect to a slice of the input

Hi,

Do you know why I get the following error:

import torch
import torch.nn as nn

n, d = 10, 3
lin1 = nn.Linear(d, d)

# case 1
x = torch.randn(d).requires_grad_()
y = lin1(x)
vec = torch.ones(d)
gr = torch.autograd.grad(y, x, vec, retain_graph=True)[0]
print(gr.shape)  # returns: torch.Size([3])

# case 2
x = torch.randn(n, d).requires_grad_()
y = lin1(x)
vec = torch.ones(d)
gr = torch.autograd.grad(y[0], x[0], vec, retain_graph=True)[0]

# Raises Error:
# RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Why does taking x[i] remove it from the computation graph? I tried to clone it and to use .narrow, but in vain!

y[0] only depends on x[0], so I don’t want to compute the gradient with respect to the full input!
Any help is appreciated!

I guess the issue is caused by indexing x, since that creates a new view which was not used directly in the computation graph.
The same error would be raised if you use gr = torch.autograd.grad(y, x.clone(), vec, retain_graph=True)[0] in the first example.
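As a minimal check of that failure mode (reusing x, y, and vec from case 2 above; this is just a sketch, not a workaround):

# x[0] is a fresh view tensor; y was computed from x, not from this view,
# so autograd cannot find it among y's inputs. With allow_unused=True it
# simply reports no gradient for it instead of raising:
gr = torch.autograd.grad(y[0], x[0], vec, retain_graph=True, allow_unused=True)[0]
print(gr)  # None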
@albanD is there a more elegant way using some new (beta) features for this type of computation?

Not sure which beta feature you’re referring to?
But in general, autograd works at the Tensor level, so no, you won’t be able to compute the gradient wrt a subset of the Tensor any faster than computing the whole gradient and then taking the subset you care about.
To do so, you will have to do the forward with a Tensor that represents the subset you care about.
Does that make sense?
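A minimal sketch of both options, assuming the same lin1, n, and d as in the question (both variants give a gradient of shape [3]):

# Option A: build the graph from the slice itself, so grad is computed only wrt it.
x = torch.randn(n, d)
x0 = x[0].clone().requires_grad_()  # leaf tensor representing the subset we care about
y0 = lin1(x0)
vec = torch.ones(d)
gr0 = torch.autograd.grad(y0, x0, vec)[0]
print(gr0.shape)  # torch.Size([3])

# Option B: compute the gradient wrt the full input and index afterwards.
x = torch.randn(n, d).requires_grad_()
y = lin1(x)
gr = torch.autograd.grad(y[0], x, vec)[0]  # shape [10, 3]; only row 0 is non-zero here
print(gr[0].shape)  # torch.Size([3])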
