I have the following piece of code (part of a training function):
```python
loss = criterion(output, target)
loss.backward()
optimizer.step()
loss_scalar = loss.item()
```
I run it on a GPU without DataParallel, on the latest PyTorch, 1.0.1.post2, with CUDA 10.
If I run it in this order, execution time is 40 seconds. However, if I move `loss.item()` before `loss.backward()`, my execution time blows up to 200 seconds, and my profiler shows that most of the time is then spent in `.backward()`.
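Concretely, the slow ordering is the following. This is a self-contained sketch of my loop body (the tiny model, data, and optimizer here are stand-ins I made up so the snippet runs on its own, on CPU; in my real code these come from my training setup):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for my real model/data, just to make this runnable.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 10)
target = torch.tensor([0, 1, 0, 1])

optimizer.zero_grad()
output = model(x)
loss = criterion(output, target)

# Reordered variant: read the scalar *before* backward().
# On GPU, .item() copies the value to the host and blocks until it is ready.
loss_scalar = loss.item()
loss.backward()
optimizer.step()
```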
What could be the reason for this? Is there a preferred order for converting the loss to a scalar?
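In case my measurement method matters: here is roughly how I time each step. This is a sketch (the `timed` helper is mine, not a PyTorch API) that calls `torch.cuda.synchronize()` around the clock reads, since CUDA kernels are launched asynchronously and pending work could otherwise be charged to the wrong step:

```python
import time
import torch

def timed(fn, device=torch.device("cpu")):
    """Run fn() and return (result, elapsed seconds).

    Synchronizes before and after on CUDA devices so that queued kernels
    are attributed to the call that launched them, not to the next
    blocking operation (such as .item()).
    """
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    start = time.perf_counter()
    result = fn()
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    return result, time.perf_counter() - start

# Usage (on CPU here just so the snippet runs anywhere):
value, elapsed = timed(lambda: torch.randn(100, 100) @ torch.randn(100, 100))
```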