Hello, I wanted to ask what the correct order is for calling loss.item() and loss.backward(). I read a forum post saying that execution is slower if you call item() after backward(), but I also read that the gradient does not flow if you call .item() before backward(). Can someone help me find a concrete answer?
Most likely the order won’t matter.
Calling item() on a CUDATensor will synchronize your code and will thus potentially slow it down (e.g. if it wasn’t synchronized before anyway). I wouldn’t expect to see a large difference in speed if you synchronize the code before or after the backward call.
Calling loss.item() does not have any effect on the loss.backward() call unless you chain them as loss.item().backward(), which will raise an error.
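If it helps, here is a minimal sketch that checks both points on the CPU. The tiny linear model, the random data, and the helper name grads_with_item are made up for illustration and are not from your training loop. It computes the same loss twice, calls item() before backward() in one run and after it in the other, and compares the resulting gradients. It also shows that chaining fails, because item() returns a plain Python float, which has no backward() method.

```python
import torch


def grads_with_item(order):
    """Hypothetical helper: build a tiny model, compute a loss, and call
    item() either before or after backward()."""
    torch.manual_seed(0)  # same weights and data in both runs
    model = torch.nn.Linear(3, 1)
    x = torch.randn(8, 3)
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)

    if order == "item_first":
        value = loss.item()   # copies the scalar out as a Python float
        loss.backward()       # the graph attached to `loss` is untouched
    else:
        loss.backward()
        value = loss.item()
    return value, model.weight.grad.clone()


value_a, grad_a = grads_with_item("item_first")
value_b, grad_b = grads_with_item("backward_first")

same_value = value_a == value_b
same_grad = torch.equal(grad_a, grad_b)
print(type(value_a).__name__, same_value, same_grad)

# Chaining the calls fails: a Python float has no backward() method.
loss = torch.ones(1, requires_grad=True).sum()
try:
    loss.item().backward()
    chained_error = None
except AttributeError as exc:
    chained_error = type(exc).__name__
print(chained_error)
```

On a GPU the result is the same. The only difference is that item() is the point where the host waits for the queued CUDA work to finish before it can read the value.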
So to make sure I understand correctly: there is no need to worry about their order, and removing .item() where possible will speed up my training code? I don’t think I synchronize my code anywhere else. I have a very basic training loop, although I’m not entirely sure what synchronizing your code means exactly.