What is loss.item()?

Calls to item() might slow down your code, since they synchronize the host with the GPU.
While detach() could avoid these synchronizations, pushing the tensor to the CPU would still wait for the GPU to finish the computation and would thus synchronize, so I don't think your self.metrics would behave differently in this case.
However, let me know if you see any changes.
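
To make the distinction concrete, here is a minimal sketch (the tensor is just a stand-in for an actual loss):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
loss = torch.randn(1, device=device)  # stand-in for a computed loss

# .item() copies the scalar to the CPU and blocks until the GPU has
# finished all queued work, so it acts as a synchronization point.
value = loss.item()

# .detach() just cuts the tensor out of the autograd graph; the tensor
# stays on the GPU, so no synchronization is triggered at this point.
metric = loss.detach()
```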


Yes, you were correct. I could only manage to reduce the time a little by using more num_workers in the DataLoader and using 2 GPUs. But it is still taking around 3.5 hours per epoch for 50,000 images, which feels like a lot.

From what I understand, I don't need to specify the device for any tensor, as PyTorch Lightning takes care of it. So would it be okay to replace to_cpu(x).item() with just x.detach()?
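
For context, my understanding is based on something like this minimal sketch (the module and layer are just placeholders):

```python
import torch
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 1)

    def training_step(self, batch, batch_idx):
        # Lightning has already moved the batch to the right device,
        # so no explicit .to(device) calls are needed here.
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)
```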

You can try it out, but I assume the implementation in Lightning is there for a reason.
E.g., if you need to print these values after returning them, you would still have to synchronize, since the values have to be computed first.
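
If you only need the values at the end of the epoch, you could defer that single synchronization, e.g. along these lines (a minimal self-contained sketch; the model and data are just stand-ins for your setup):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

losses = []
for _ in range(100):  # stand-in for iterating over a DataLoader
    x = torch.randn(32, 10, device=device)
    y = torch.randn(32, 1, device=device)
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Keep the detached GPU tensor; no synchronization happens here.
    losses.append(loss.detach())

# A single synchronization at epoch end, when the values are needed.
epoch_loss = torch.stack(losses).mean().item()
print(f"epoch loss: {epoch_loss:.4f}")
```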

Okay, I'll give it a try. Thanks!