Is it more efficient to load all tensors to the GPU first or do it batch-wise?

Is it better to call .cuda() on my input tensors up front and store them on the GPU, or should I do that in my batch generator? Does it make a difference?

I think so, as long as everything fits in your GPU memory.
If you do it every batch, the time spent transferring data from host memory to the GPU can be relatively large. That's why you may see a low GPU-Util reading in nvidia-smi.
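To make the difference concrete, here is a minimal sketch of the two options (placeholder tensor shapes, assuming a plain TensorDataset):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

device = torch.device("cuda")

# Placeholder data: swap in your own tensors.
inputs = torch.randn(10_000, 128)
targets = torch.randint(0, 10, (10_000,))

# Option A: move everything to the GPU once, up front.
dataset_gpu = TensorDataset(inputs.to(device), targets.to(device))
loader_gpu = DataLoader(dataset_gpu, batch_size=64, shuffle=True)
# Batches from this loader are already CUDA tensors; no per-batch copy.

# Option B: keep the data on the CPU and transfer each batch in the loop.
loader_cpu = DataLoader(TensorDataset(inputs, targets),
                        batch_size=64, shuffle=True, pin_memory=True)
for x, y in loader_cpu:
    x = x.to(device, non_blocking=True)  # host-to-device copy every iteration
    y = y.to(device, non_blocking=True)
    # ... forward / backward ...
```

Note that Option A only works with the default num_workers=0, since CUDA tensors can't be handed around by DataLoader worker processes.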

Thank you for the reply, @SnowWalkerJ 🙂

My thoughts were the same, but I'm running into an issue with DataLoader returning non-CUDA tensors, which made me think that this isn't how it was meant to be used. Have you faced this issue?
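In case it helps, this is roughly the kind of manual batching I'm falling back to instead of DataLoader (just a sketch with placeholder tensors):

```python
import torch

device = torch.device("cuda")

# Whole (placeholder) dataset pre-loaded onto the GPU.
inputs = torch.randn(10_000, 128).to(device)
targets = torch.randint(0, 10, (10_000,)).to(device)

batch_size = 64
perm = torch.randperm(inputs.size(0), device=device)  # shuffle once per epoch
for start in range(0, inputs.size(0), batch_size):
    idx = perm[start:start + batch_size]
    x, y = inputs[idx], targets[idx]  # already CUDA tensors, no host-to-device copy
    # ... forward / backward ...
```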