Two separate gradient computations vs one total

There are two loss functions, L and M, and the total loss is T = L + M. Is there any difference between computing

L.backward()
M.backward()
optimizer.step()

and

T = L + M
T.backward()
optimizer.step()

There is no difference in the result: gradients accumulate in the .grad attributes, so both approaches end up with the same gradients before optimizer.step(). The main difference is that the second one is faster, since it only needs a single backward pass through the graph :slight_smile:
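
A minimal sketch to check this, assuming the two losses come from the same forward pass (the linear model, the MSE-style loss for L, and the abs-mean loss for M are just illustrative choices, not from the original question). Note that when L and M share the graph, the first backward call needs retain_graph=True so the second one can still traverse it:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Approach 1: two separate backward calls; gradients from L and M accumulate in .grad.
optimizer.zero_grad()
out = model(x)
L = (out - target).pow(2).mean()   # first loss (illustrative)
M = out.abs().mean()               # second loss (illustrative)
L.backward(retain_graph=True)      # retain_graph needed because M shares this graph
M.backward()
grads_separate = [p.grad.clone() for p in model.parameters()]

# Approach 2: sum the losses and do a single backward pass.
optimizer.zero_grad()
out = model(x)
T = (out - target).pow(2).mean() + out.abs().mean()
T.backward()
grads_summed = [p.grad.clone() for p in model.parameters()]

# Both approaches produce the same gradients.
print(all(torch.allclose(a, b) for a, b in zip(grads_separate, grads_summed)))
```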