Dataloading: pin memory vs create in cuda


My question is related to loading input data using the combination of Dataset and DataLoader. Since pin_memory option in a DataLoader works for CPU tensors, I understand that there are two ways to load the input data:

  1. Create CPU tensors in the Dataset, enable `pin_memory` in the DataLoader, and transfer each batch to the GPU in the training loop
  2. Create CUDA tensors directly in the Dataset and don't use `pin_memory`

Which method is more efficient with respect to data loading time, and why? Any references would be helpful.
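For concreteness, the two options could be sketched roughly like this (the Dataset classes and the random data here are made up for illustration):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CpuDataset(Dataset):
    """Option 1: return plain CPU tensors; the DataLoader pins the batch."""
    def __init__(self, data):
        self.data = data  # CPU tensor
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        return self.data[idx]

class CudaDataset(Dataset):
    """Option 2: move each sample to the GPU inside the Dataset."""
    def __init__(self, data, device="cuda"):
        self.data = data
        self.device = device
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        # one small host-to-device copy per sample
        return self.data[idx].to(self.device)

data = torch.randn(100, 3)

# Option 1: pinned host memory; the batch is moved to the GPU
# later, in the training loop
loader1 = DataLoader(CpuDataset(data), batch_size=10, pin_memory=True)

# Option 2: samples already land on the GPU (note this generally
# requires num_workers=0, since CUDA tensors shouldn't be created
# in worker processes)
if torch.cuda.is_available():
    loader2 = DataLoader(CudaDataset(data), batch_size=10, num_workers=0)
```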



I would stick to the first approach, as it pushes a whole batch to the device at once (avoiding multiple small transfers), and the transfer can potentially be executed asynchronously while your GPU is busy. Using pinned memory will also speed up the transfer itself. Have a look at NVIDIA’s blog post for more information.
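A minimal sketch of that pattern, assuming a toy `TensorDataset`: with `pin_memory=True` in the DataLoader, the batches sit in page-locked host memory, and `to(device, non_blocking=True)` can then overlap the copy with GPU work.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset standing in for real inputs and targets
dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# pin_memory=True makes the collated batches use page-locked host memory
loader = DataLoader(dataset, batch_size=10, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"

for x, y in loader:
    # non_blocking=True lets the host-to-device copy overlap with
    # GPU compute, but only when the source tensor is pinned
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward / backward pass here ...
```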