I have the following piece of code (part of a training function):
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    loss_scalar = loss.item()
I run it on a GPU without DataParallel, on the latest PyTorch (1.0.1.post2) with CUDA 10.
If I run it in this order, execution time is 40 seconds. However, if I move loss_scalar = loss.item() before loss.backward(), execution time blows up to 200 seconds, and my profiler shows that most of the time is then spent in .backward().
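For reference, here is a minimal, self-contained sketch of the two orderings I am comparing (the model, data, sizes, and the timing harness are placeholders, not my real training code):

    import time

    import torch
    import torch.nn as nn

    device = torch.device("cuda")
    model = nn.Linear(1024, 10).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Placeholder batch; my real data pipeline is omitted.
    data = torch.randn(256, 1024, device=device)
    target = torch.randint(0, 10, (256,), device=device)

    def step(item_before_backward):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        if item_before_backward:
            # The slow case for me: .item() forces a CPU-GPU sync
            # before backward() is even queued.
            loss_scalar = loss.item()
            loss.backward()
            optimizer.step()
        else:
            # The fast case from my snippet above: backward() and step()
            # are queued first, then .item() synchronizes once.
            loss.backward()
            optimizer.step()
            loss_scalar = loss.item()
        return loss_scalar

    for item_first in (False, True):
        torch.cuda.synchronize()  # drain pending GPU work for a fair timing
        t0 = time.perf_counter()
        for _ in range(1000):
            step(item_first)
        torch.cuda.synchronize()
        print("item() before backward():", item_first,
              "%.2fs" % (time.perf_counter() - t0))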
What could be the reason for this? Is there a preferred order for converting the loss to a scalar?