Calls into `item()` might slow down your code, as they are synchronizing. While `detach()` could potentially avoid synchronizations, a push to the CPU would still wait for the GPU to finish the calculation and would thus synchronize, so I don't think your `self.metrics` would behave differently in this case.
However, let me know if you see any changes.
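If you want to see the synchronization directly, here is a minimal timing sketch; the tensor shape is an arbitrary assumption, and it needs a CUDA device to run:

```python
import time
import torch

x = torch.randn(4096, 4096, device="cuda")

# .item() blocks the host until the GPU has finished the matmul,
# since the scalar result has to be copied back to the CPU.
torch.cuda.synchronize()
t0 = time.perf_counter()
val = (x @ x).sum().item()
print(f".item():   {time.perf_counter() - t0:.4f}s")  # includes the GPU work

# .detach() only removes the tensor from the autograd graph; the result
# stays on the GPU and the call returns before the kernel has finished.
torch.cuda.synchronize()
t0 = time.perf_counter()
res = (x @ x).sum().detach()
print(f".detach(): {time.perf_counter() - t0:.4f}s")  # returns almost immediately
torch.cuda.synchronize()  # the computation itself still has to finish eventually
```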
Yes, you were correct. I could only manage to reduce the time a little by using more `num_workers` in the DataLoader and using 2 GPUs. But it is still taking around 3.5 hrs per epoch for 50,000 images, which I feel is still a lot.
From what I understand, I don't need to specify the device for any tensor, as PyTorch Lightning takes care of it. So would it be okay if I replaced `to_cpu(x).item()` with just `x.detach()`?
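To make the question concrete, here is a minimal sketch of the replacement I mean; `to_cpu` is a stand-in for my helper and `acc` for a metric computed on the GPU, so both names are just for illustration:

```python
import torch

def to_cpu(t: torch.Tensor) -> torch.Tensor:
    # stand-in for my helper: just moves the tensor to the CPU
    return t.cpu()

acc = torch.tensor(0.93, device="cuda")  # some metric living on the GPU
metrics = {}

# current version: copies to the CPU and blocks until the value is ready
metrics["acc"] = to_cpu(acc).item()

# proposed replacement: keeps the tensor on the GPU and returns immediately
metrics["acc"] = acc.detach()
```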
You can try it out, but I assume the implementation in Lightning might be there for a reason.
E.g. if you need to print these values after returning them, you would still need to synchronize the code, since these values have to be calculated first.
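In other words, the synchronization would only be deferred, not removed. A small sketch of what I mean, again assuming a CUDA device:

```python
import torch

x = torch.randn(2048, 2048, device="cuda")
loss = (x @ x).sum().detach()  # asynchronous: no synchronization yet

# As soon as the concrete value is needed, the host has to wait anyway:
print(loss)          # implicit GPU->CPU copy, synchronizes here
value = loss.item()  # same: blocks until the result is available
```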
Okay, I will give it a try. Thanks!