`log_alpha` is a variable (I think); every time I update its data with its grad, the grad changes into None.

code as follows:

```
1 alpha_loss = -torch.mean(self.log_alpha * (log_pis.double() + self._target_entropy))
2 #self.log_alpha.retain_grad()
3 alpha_loss.backward()
4 grad = self.log_alpha.grad
5 self.log_alpha = self.log_alpha - grad * 3e-4
```

But if I uncomment the code on line 2, there is no error. So I'm a little confused about the grad clearing mechanism.

I'm also not sure whether the grad will keep being accumulated over time.

It seems you are replacing `self.log_alpha` in your example.

Have a look at these examples, where we 1) replace `x`, 2) assign to a new tensor, and 3) modify `x` inplace:

```
import torch

# 1) replace tensor
x = torch.randn(2, requires_grad=True)
loss = torch.mean(x)
loss.backward()
print(x.grad)
> tensor([0.5000, 0.5000])
grad = x.grad
x = x - grad
print(x.grad)
> None
# 2) assign to other tensor
x = torch.randn(2, requires_grad=True)
loss = torch.mean(x)
loss.backward()
print(x.grad)
> tensor([0.5000, 0.5000])
grad = x.grad
y = x - grad
print(x.grad)
> tensor([0.5000, 0.5000])
print(y.grad)
> None
# 3) inplace
x = torch.randn(2, requires_grad=True)
loss = torch.mean(x)
loss.backward()
print(x.grad)
> tensor([0.5000, 0.5000])
grad = x.grad
with torch.no_grad():
    x.sub_(grad)
print(x.grad)
> tensor([0.5000, 0.5000])
```
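If you want `.grad` to stay populated in your `log_alpha` update, case 3 is the pattern to follow. A minimal sketch, assuming `log_alpha` is a leaf tensor created with `requires_grad=True`, and using a placeholder loss instead of your entropy term:

```
import torch

# hypothetical stand-in for self.log_alpha: a leaf tensor that requires grad
log_alpha = torch.zeros(1, requires_grad=True)

# placeholder loss standing in for -mean(log_alpha * (log_pis + target_entropy))
alpha_loss = -torch.mean(log_alpha * 2.0)
alpha_loss.backward()

with torch.no_grad():
    log_alpha -= log_alpha.grad * 3e-4  # inplace update, log_alpha stays a leaf

print(log_alpha.grad)  # still populated, so remember to zero it before the next backward
```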

I get it~ your explanation is so clear, thank you.

If I update x after loss.backward() but without zero_grad() (because I don't use an optimizer), will the grad of x be stored and accumulated the whole time?

It depends on the approach you are using. E.g. in 3) you would accumulate the gradients.
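For example, with approach 3) two backward calls without clearing the grad will sum the gradients; since you don't use an optimizer, you can reset them manually with `x.grad.zero_()`. A minimal sketch:

```
import torch

x = torch.randn(2, requires_grad=True)

torch.mean(x).backward()
print(x.grad)   # tensor([0.5000, 0.5000])

torch.mean(x).backward()
print(x.grad)   # tensor([1.0000, 1.0000]) -> gradients were accumulated

x.grad.zero_()  # manual equivalent of optimizer.zero_grad()
torch.mean(x).backward()
print(x.grad)   # tensor([0.5000, 0.5000])
```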

Well, thank you so much~