Question about autograd and leaf tensors

I have always been confused about this behaviour.

Let us say I am implementing an autoencoder. My source and target are going to be the same, i.e. I want to reconstruct some vector x.

So something like
z = Encoder(x)
y = Decoder(z)

  1. loss = LOSS(y, x)

  2. loss = LOSS(y, x.clone())

  3. loss = LOSS(y, x.clone().detach())

What is the difference in behavior in each of these cases? x is of course a leaf tensor, but I don’t understand the gradient aspect: is case 1 going to form a cycle? (A minimal sketch of what I mean is below.)
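For concreteness, a minimal sketch of the setup I mean (the tiny linear Encoder/Decoder are just placeholders):

import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(8, 2)   # placeholder encoder
decoder = nn.Linear(2, 8)   # placeholder decoder

x = torch.randn(4, 8)       # batch of vectors to reconstruct
z = encoder(x)
y = decoder(z)

loss1 = F.mse_loss(y, x)                   # 1. target is x itself
loss2 = F.mse_loss(y, x.clone())           # 2. target is a clone of x
loss3 = F.mse_loss(y, x.clone().detach())  # 3. target detached from the graph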

It shouldn’t be a problem, as usually x won’t require gradients in an autoencoder setup.
The first two approaches should yield the same result, while the last one should be different.
Here is a small dummy code:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, requires_grad=True)
param = nn.Parameter(torch.randn(1))
output = x * param

# 1. target is x itself (still part of the graph)
loss = F.mse_loss(output, x)
loss.backward()
print(param.grad)
print(x.grad)

x.grad.zero_()
param.grad.zero_()

# 2. target is a clone of x (the clone is still attached to the graph)
output = x * param
loss = F.mse_loss(output, x.clone())
loss.backward()
print(param.grad)
print(x.grad)

x.grad.zero_()
param.grad.zero_()

# 3. target is detached, so it is treated as a constant
output = x * param
loss = F.mse_loss(output, x.clone().detach())
loss.backward()
print(param.grad)
print(x.grad)

> tensor([-3.3104])
> tensor([0.6608])

> tensor([-3.3104])
> tensor([0.6608])

> tensor([-3.3104])
> tensor([-0.9747])
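The difference in x.grad comes from whether the target still depends on x. In this scalar example the loss is (x * param - target)**2, so in cases 1 and 2 (target is x) we get dL/dx = 2 * (x * param - x) * (param - 1), while in case 3 (detached target, treated as a constant) we get dL/dx = 2 * (x * param - x) * param. In all three cases dL/dparam = 2 * (x * param - x) * x, which is why param.grad is identical in every run. Here is a small sketch checking these hand-derived formulas against autograd (fresh random tensors, so the printed values will differ from the output above):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, requires_grad=True)
param = nn.Parameter(torch.randn(1))
diff = (x * param - x).detach()  # value of (output - x), reused below

# Cases 1/2: the target is x, so the target branch also contributes to x.grad
F.mse_loss(x * param, x).backward()
print(torch.allclose(x.grad, 2 * diff * (param - 1)))  # True
print(torch.allclose(param.grad, 2 * diff * x))        # True

x.grad.zero_()
param.grad.zero_()

# Case 3: the detached target is a constant, so x.grad only flows through the output
F.mse_loss(x * param, x.clone().detach()).backward()
print(torch.allclose(x.grad, 2 * diff * param))        # True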

Thanks, this answers my question. But since x is a leaf variable, its grad value serves no purpose, right?

Usually you don’t need gradients in your input tensor.
However there are some use cases, e.g. in adversarial training, where these gradients can be used for small perturbation of the input.
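For example, a rough FGSM-style sketch (the model, data, and epsilon below are made up just for illustration) would use x.grad like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)                    # hypothetical classifier
x = torch.randn(1, 10, requires_grad=True)  # input we want to perturb
target = torch.tensor([1])                  # hypothetical label

loss = F.cross_entropy(model(x), target)
loss.backward()

epsilon = 0.01                              # made-up step size
x_adv = x + epsilon * x.grad.sign()         # nudge the input in the direction that increases the loss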

It’s an old thread, but I’m asking because it’s similar to my situation!

Is the last case wrong?
I’ve only seen the first case.

I wouldn’t claim it’s wrong, as it depends on your use case.
