The problem is overhead: your tensors are tiny and you don’t load many images, so sending the data to the GPU is slower than doing the computation directly on the CPU.
Your benchmark is also problematic because it does no actual computation on the GPU. Just sending data to the GPU won’t give you any benefit: GPUs are fast at matrix multiplication, but host-to-device transfers are slow by comparison.
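
To make the overhead argument concrete, here is a rough back-of-the-envelope cost model in plain Python. It is only a sketch: the function names and every hardware number in it are placeholders I made up for illustration, not measurements of your machine or of any real GPU. The point is just that a fixed transfer cost dominates when there is little math to do.

```python
def cpu_time(flops, cpu_flops_per_s):
    """Seconds to run the work on the CPU: compute only, no transfer."""
    return flops / cpu_flops_per_s


def gpu_time(flops, nbytes, gpu_flops_per_s, bandwidth_bytes_per_s, fixed_overhead_s):
    """Seconds on the GPU: fixed per-batch overhead + host-to-device copy + compute."""
    transfer = nbytes / bandwidth_bytes_per_s
    compute = flops / gpu_flops_per_s
    return fixed_overhead_s + transfer + compute


# Placeholder hardware numbers, for illustration only (NOT measurements).
HW = dict(gpu_flops_per_s=1e13, bandwidth_bytes_per_s=1e10, fixed_overhead_s=1e-4)
CPU_FLOPS_PER_S = 1e11

# Hypothetical workloads: a tiny batch with almost no math,
# and a large batch with heavy math.
small = dict(flops=1e5, nbytes=1e5)
large = dict(flops=1e12, nbytes=1e8)

for name, job in (("small", small), ("large", large)):
    t_cpu = cpu_time(job["flops"], CPU_FLOPS_PER_S)
    t_gpu = gpu_time(**job, **HW)
    winner = "CPU" if t_cpu < t_gpu else "GPU"
    print(f"{name}: cpu={t_cpu:.6f}s gpu={t_gpu:.6f}s -> {winner} is faster")
```

With these made-up numbers the CPU wins on the small job and the GPU wins on the large one. Your own crossover point depends on your hardware, so treat this as a way to reason about the benchmark rather than a prediction.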
Thanks for the reply, @marksaroufim!
I understand this is only a toy example which doesn’t take into account the benefits of the GPU.
However, in a real training/eval run, the GPU and CPU take approximately the same time: the GPU’s compute advantage is cancelled out by the slow data loading.
So I’m not able to benefit from the GPU.
The data loading seems too slow to me. What do you think?