Two separate gradient computations vs one total

There are two loss functions, L and M, and the total loss is T = L + M. Is there any difference between computing

L.backward()
M.backward()
optimizer.step()

and

T = L + M
T.backward()
optimizer.step()

There is no difference in the result: gradients accumulate in the .grad attributes, so both approaches end up with the same gradients before optimizer.step(). The main difference is that the second one is faster, since it only needs a single backward pass through the graph :slight_smile:
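
A minimal sketch to check this, assuming the two losses come from the same forward pass (the linear model, the MSE-style loss for L, and the abs-mean loss for M are just illustrative choices, not from the original question). Note that when L and M share the graph, the first backward call needs retain_graph=True so the second one can still traverse it:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Approach 1: two separate backward calls; gradients from L and M accumulate in .grad.
optimizer.zero_grad()
out = model(x)
L = (out - target).pow(2).mean()   # first loss (illustrative)
M = out.abs().mean()               # second loss (illustrative)
L.backward(retain_graph=True)      # retain_graph needed because M shares this graph
M.backward()
grads_separate = [p.grad.clone() for p in model.parameters()]

# Approach 2: sum the losses and do a single backward pass.
optimizer.zero_grad()
out = model(x)
T = (out - target).pow(2).mean() + out.abs().mean()
T.backward()
grads_summed = [p.grad.clone() for p in model.parameters()]

# Both approaches produce the same gradients.
print(all(torch.allclose(a, b) for a, b in zip(grads_separate, grads_summed)))
```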