Some weird issue with operations inside "with no_grad"

Dear all.

I found this issue today and I am not sure if it is something expected or not. I am trying to code a linear regression with PyTorch.

for epoch in range(n_epochs):
  # compute prediction, compute loss, loss.backward()
  with torch.no_grad():

    # option 1 (runtime error):
    # a = a - lr* a.grad
    # b = b - lr* b.grad 

    # option 2
    a -= lr* a.grad
    b -= lr* b.grad 

Both options look mathematically equivalent to me, but the first one raises a runtime error on the next call to loss.backward():
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Can someone point me to the reason why PyTorch treats the shortened subtraction differently from the expanded version?

When you use the expanded version, that is -

a = a - lr* a.grad
b = b - lr* b.grad 

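new tensors get created on each iteration, and the names a and b are rebound to them.

Since these new tensors are created under torch.no_grad(), their requires_grad attribute is False and they have no grad_fn. On the next iteration the loss is computed from these tensors, so loss.backward() has nothing to differentiate through. This explains the error.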
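To make the rebinding visible, here is a minimal sketch. The data (x, y), the learning rate, and the single parameter a are made up for illustration and are not from the original post; only the update line mirrors option 1.

```python
import torch

# Hypothetical toy data and learning rate, chosen only for illustration.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
lr = 0.1

a = torch.randn(1, requires_grad=True)
original = a  # keep a second reference to the original parameter tensor

loss = ((a * x - y) ** 2).mean()
loss.backward()

with torch.no_grad():
    # Option 1: the right-hand side builds a NEW tensor, then the name a is rebound to it.
    a = a - lr * a.grad

same_object = a is original              # the name a no longer points at the original
rebound_requires_grad = a.requires_grad  # the new tensor was created under no_grad
rebound_grad = a.grad                    # no gradient has ever been stored on it

# The next iteration builds the loss from a tensor that is not tracked by autograd.
loss = ((a * x - y) ** 2).mean()
try:
    loss.backward()
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(same_object, rebound_requires_grad, rebound_grad)
print(error_message)
```

The original tensor (still reachable through original here) keeps requires_grad=True; it is only the name a that now refers to an untracked tensor.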

With this code -

a -= lr* a.grad
b -= lr* b.grad

tensors a and b get modified in place, i.e. no new tensors are created; only the values of the existing tensors change. They are still the same leaf tensors with requires_grad=True, so the next loss.backward() works as expected.
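For comparison, here is a sketch of a full loop using the in-place update. The model form a + b * x, the synthetic data, the seed, and the hyperparameters are assumptions for illustration; the original post does not show them. Note that the gradients also have to be reset each epoch, since backward() accumulates into .grad.

```python
import torch

torch.manual_seed(0)

# Hypothetical noise-free data generated from y = 1 + 2 * x.
x = torch.linspace(0, 1, 100)
y = 1.0 + 2.0 * x

a = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
original_a, original_b = a, b  # extra references, used only to check identity later

lr = 0.1
n_epochs = 1000

for epoch in range(n_epochs):
    y_pred = a + b * x
    loss = ((y_pred - y) ** 2).mean()
    loss.backward()

    with torch.no_grad():
        # Option 2: in-place update, a and b remain the same tensor objects.
        a -= lr * a.grad
        b -= lr * b.grad

        # backward() accumulates gradients, so clear them before the next epoch.
        a.grad.zero_()
        b.grad.zero_()

print(a is original_a, b is original_b)
print(a.requires_grad, b.requires_grad)
print(a.item(), b.item())
```

In practice torch.optim.SGD does the same in-place update for you, but writing it by hand like this makes the difference between the two options easier to see.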