DataLoader implicitly using CPU?

No, the DataLoader loads each sample via Dataset.__getitem__ and uses the collate_fn to create a batch out of these samples. It has no knowledge of whether these tensors are on the CPU or the GPU.
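As a rough illustration of that flow, here is a minimal sketch. SquaresDataset and stack_samples are hypothetical names made up for this example, not something from your code. With the default settings (no shuffling, no workers), the first batch should match what you get by calling __getitem__ and the collate_fn by hand:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the tensor [i, i^2]."""

    def __len__(self):
        return 6

    def __getitem__(self, idx):
        # The tensor is created on the CPU here; the DataLoader never checks this.
        return torch.tensor([float(idx), float(idx) ** 2])


def stack_samples(samples):
    # collate_fn receives a list of whatever __getitem__ returned.
    return torch.stack(samples)


dataset = SquaresDataset()
loader = DataLoader(dataset, batch_size=3, shuffle=False, collate_fn=stack_samples)

# Batch produced by the DataLoader.
first_batch = next(iter(loader))

# The same batch built manually from __getitem__ + collate_fn.
manual_batch = stack_samples([dataset[i] for i in range(3)])

print(torch.equal(first_batch, manual_batch), first_batch.device)
```

The batch ends up on whatever device the tensors returned by __getitem__ and the collate_fn live on; the DataLoader itself does not move anything.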

Your current approach of moving the data to the GPU in your collate_fn should have the same effect as moving the data to the device inside the training loop, since each batch will be pushed to the device, not the complete dataset.
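To make the comparison concrete, here is a small sketch of both variants. ToyDataset and collate_to_device are hypothetical stand-ins for your own classes, and the code falls back to the CPU if no GPU is available:

```python
import torch
from torch.utils.data import DataLoader, Dataset

# Assumption: use the GPU if one is visible, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class ToyDataset(Dataset):
    """Hypothetical dataset returning (feature, label) pairs as CPU tensors."""

    def __init__(self, n=8):
        self.x = torch.arange(n, dtype=torch.float32).unsqueeze(1)
        self.y = torch.arange(n) % 2

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]


def collate_to_device(samples):
    # Variant A: build the batch and push it to the device inside collate_fn.
    xs, ys = zip(*samples)
    return torch.stack(xs).to(device), torch.stack(ys).to(device)


dataset = ToyDataset()

# Variant A: the batch arrives already on the device.
loader_a = DataLoader(dataset, batch_size=4, collate_fn=collate_to_device)
batches_a = [(x, y) for x, y in loader_a]

# Variant B: default collate, batch moved to the device in the training loop.
loader_b = DataLoader(dataset, batch_size=4)
batches_b = [(x.to(device), y.to(device)) for x, y in loader_b]

for (xa, ya), (xb, yb) in zip(batches_a, batches_b):
    print(xa.device, torch.equal(xa, xb), torch.equal(ya, yb))
```

In both variants only one batch at a time is transferred. One caveat: this sketch uses num_workers=0. With num_workers > 0 the collate_fn runs inside the worker processes, and the PyTorch docs generally advise against returning CUDA tensors from workers, so moving the batch in the training loop (optionally with pin_memory=True) is usually the safer choice there.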

For general information about data loading bottlenecks, I would recommend having a look at this post.
