Dataloading: pin memory vs create in cuda

Soumya_Sanyal · December 30, 2018, 7:00am

Hi,

My question is related to loading input data using the combination of Dataset and DataLoader. Since the pin_memory option in a DataLoader only works for CPU tensors, I understand that there are two ways to load the input data:

1. Create CPU tensors in the Dataset, then use pin_memory and transfer them to the GPU
2. Create CUDA tensors directly in the Dataset and don’t use pin_memory

Which method is more efficient wrt data loading time and why? Any references would be helpful.

Thanks!

ptrblck · December 30, 2018, 2:44pm

I would stick to the first approach, as this pushes a whole batch to the device at once (avoiding multiple small transfers), and the copy might be executed asynchronously while your GPU is busy. Using pinned memory will also speed up the transfer. Have a look at NVIDIA’s blog post for some more information.
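
A minimal sketch of the first approach, under some assumptions: `RandomDataset`, `run_epoch`, the tensor shapes, and the batch size are hypothetical names and values chosen for illustration, not something from the question. The Dataset returns CPU tensors, the DataLoader pins each collated batch, and the training loop moves the whole batch to the device with `non_blocking=True`. Pinning is only requested when CUDA is available, so the snippet also runs on a CPU-only machine.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):
    """Hypothetical dataset that keeps its samples as CPU tensors."""

    def __init__(self, num_samples=64, num_features=8):
        # Illustrative random data; a real Dataset would load from disk.
        self.data = torch.randn(num_samples, num_features)
        self.labels = torch.randint(0, 2, (num_samples,))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Return CPU tensors; no .cuda() call inside the Dataset.
        return self.data[idx], self.labels[idx]


def run_epoch(batch_size=16):
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    # pin_memory=True makes the DataLoader place each collated batch in
    # page-locked host memory, which is only useful when a GPU is present.
    loader = DataLoader(RandomDataset(), batch_size=batch_size, pin_memory=use_cuda)

    seen = 0
    for x, y in loader:
        # One transfer per batch tensor; with pinned memory, non_blocking=True
        # allows the copy to overlap with other work already queued on the GPU.
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        seen += x.shape[0]
    return seen, device.type
```

Calling `run_epoch()` iterates once over the dataset and returns the number of samples seen together with the device type that was used. Whether the asynchronous copy actually overlaps with compute depends on the workload, so it is worth timing both variants on your own setup.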