Does calling backward on two different variables affect how gradients are calculated?

I saw someone’s code that went like this while operating on the same model…

loss_one = criterion(...)
loss_one.backward()
loss_two = criterion(...)
loss_two.backward()
optimizer.step()

My feeling would be to code it like this…

loss = criterion(...) + criterion(...)
loss.backward()
optimizer.step()

I can't tell whether there is any actual difference between these two flows… what is really happening?

Hi,

There is no difference in the gradients that are computed: backward() accumulates gradients into each parameter's .grad field, so calling it once per loss gives the same result as calling it once on the sum of the losses.
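
For example, here is a minimal sketch (the model, data and criterion are made up for illustration, not taken from your code) that checks both flows end up with the same .grad values:

import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
criterion = torch.nn.MSELoss()
x1, y1 = torch.randn(8, 4), torch.randn(8, 1)
x2, y2 = torch.randn(8, 4), torch.randn(8, 1)

# Flow 1: two backward calls; each adds its gradients into .grad
model.zero_grad()
criterion(model(x1), y1).backward()
criterion(model(x2), y2).backward()
grads_two_calls = [p.grad.clone() for p in model.parameters()]

# Flow 2: one backward call on the summed loss
model.zero_grad()
(criterion(model(x1), y1) + criterion(model(x2), y2)).backward()
grads_one_call = [p.grad.clone() for p in model.parameters()]

# Both flows produce the same accumulated gradients
for g1, g2 in zip(grads_two_calls, grads_one_call):
    assert torch.allclose(g1, g2)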

Because of an optimization we do to reduce memory usage (the graph is freed as soon as backward runs through it), you might have to pass retain_graph=True to the first backward to avoid errors if the two losses share part of the graph, e.g. if they are computed from the same forward pass.
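
For instance (again a made-up sketch, not your code), if both losses reuse the same forward output they share one graph, and the first backward must keep it alive:

import torch

model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

# Both losses depend on the same forward output, so they share one graph
out = model(x)
loss_one = torch.nn.functional.mse_loss(out, y)
loss_two = out.abs().mean()

loss_one.backward(retain_graph=True)  # keep the shared graph alive
loss_two.backward()                   # without retain_graph=True above, this raises a RuntimeError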