Is the grad cleared automatically after updating a param with its grad?

“log_alpha” is a variable (I think). Every time I update it using its grad, its grad becomes None.
Code as follows:
1 alpha_loss = -torch.mean(self.log_alpha * (log_pis.double() + self._target_entropy))
2 #self.log_alpha.retain_grad()
3 alpha_loss.backward()
4 grad = self.log_alpha.grad
5 self.log_alpha = self.log_alpha - grad * 3e-4
But if I uncomment line 2, there is no error, so I’m a little confused about the grad-clearing mechanism.
I’m also not sure whether the grad will keep accumulating over time.

It seems you are replacing self.log_alpha in your example.
Have a look at these examples, where we 1) replace x, 2) assign the result to a new tensor, and 3) modify x in place:

import torch

# 1) replace tensor
x = torch.randn(2, requires_grad=True)
loss = torch.mean(x)
loss.backward()
print(x.grad)
> tensor([0.5000, 0.5000])
grad = x.grad
x = x - grad  # the subtraction creates a new non-leaf tensor
print(x.grad)  # .grad is only populated for leaf tensors
> None

# 2) assign to other tensor
x = torch.randn(2, requires_grad=True)
loss = torch.mean(x)
loss.backward()
print(x.grad)
> tensor([0.5000, 0.5000])
grad = x.grad
y = x - grad  # x stays an untouched leaf; y is a new non-leaf tensor
print(x.grad)
> tensor([0.5000, 0.5000])
print(y.grad)
> None

# 3) inplace
x = torch.randn(2, requires_grad=True)
loss = torch.mean(x)
loss.backward()
print(x.grad)
> tensor([0.5000, 0.5000])
grad = x.grad
with torch.no_grad():
    x.sub_(grad)  # the inplace update keeps x the same leaf tensor
print(x.grad)  # so its grad is kept (and would accumulate on the next backward)
> tensor([0.5000, 0.5000])
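
Regarding the commented-out line 2 in your snippet: after the replacement in 1), x is no longer a leaf tensor, and autograd only populates .grad for leaf tensors by default. Calling retain_grad() on a non-leaf tensor asks autograd to keep its grad anyway, which is presumably why uncommenting that line made the None grad go away. A minimal sketch:

x = torch.randn(2, requires_grad=True)
loss = torch.mean(x)
loss.backward()
x = x - x.grad   # x is now a non-leaf tensor
x.retain_grad()  # ask autograd to populate .grad for this non-leaf tensor
loss = torch.mean(x)
loss.backward()
print(x.grad)    # populated instead of None
> tensor([0.5000, 0.5000])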

I get it now~ your explanation is very clear, thank you.
If I update x after loss.backward() but never call zero_grad() (because I don’t use an optimizer), will the grad of x be kept and accumulated over time?

It depends on the approach you are using. E.g. in 3) you would accumulate the gradients.
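
A minimal sketch of that accumulation with the inplace approach from 3), assuming a plain training loop without an optimizer:

x = torch.randn(2, requires_grad=True)
for _ in range(2):
    loss = torch.mean(x)
    loss.backward()            # backward() adds into x.grad
    with torch.no_grad():
        x.sub_(x.grad * 3e-4)  # inplace update as in 3)
    print(x.grad)
> tensor([0.5000, 0.5000])
> tensor([1.0000, 1.0000])

Calling x.grad.zero_() after each update gives you a fresh gradient per step; that is the bookkeeping zero_grad() would otherwise do for you.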

Well, thank you so much~