Create tensor from dataloader directly on gpu?

Hey, is it possible to use `default_collate` such that it creates tensors directly on the CUDA device? Then the `.to('cuda')` step could be skipped, which should save some time, right?


Best, JZ

Yes, it’s possible, but note that you would have to make sure no additional CUDA contexts are created if you are using multiprocessing (i.e. multiple workers), and you would need to profile the use case to see if it actually speeds up the training. By default the CPU is used and creates the batches in the background (again, if multiple workers are used) while the GPU is busy training. If you push the samples to the GPU directly, you would of course need GPU resources for that, which might slow down your actual training.
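To make the trade-off concrete, here is a minimal sketch of a custom collate function that wraps `default_collate` and moves the batch to the GPU. The dataset and the `cuda_collate` name are made up for illustration; note the `num_workers=0` restriction, since each worker process would otherwise try to create its own CUDA context. The sketch falls back to the CPU if no GPU is available:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, default_collate

# Hypothetical toy dataset: 8 samples with 4 features each, plus labels.
dataset = TensorDataset(torch.randn(8, 4), torch.arange(8))

# Fall back to the CPU so the sketch also runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

def cuda_collate(batch):
    # Build the batch with the default collate, then move it to the device.
    # Only safe with num_workers=0: worker processes would otherwise each
    # initialize their own CUDA context.
    return [t.to(device) for t in default_collate(batch)]

loader = DataLoader(dataset, batch_size=4, collate_fn=cuda_collate,
                    num_workers=0)

for x, y in loader:
    # The batches now already live on `device`.
    assert x.device.type == device
```

Whether this is faster than the usual `.to('cuda')` in the training loop depends on the workload, so profiling is essential.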

Ah, yes, I understand. If all the CPU workers created the tensors directly on the GPU, that would also claim extra GPU memory (and each worker would need its own CUDA context), right? From that viewpoint, it seems more reasonable to use `.to()`.
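For reference, the commonly recommended pattern keeps batch creation on the CPU workers and overlaps the host-to-device copy with computation via pinned memory and `non_blocking=True`. A minimal sketch (the toy dataset is again an assumption; `pin_memory` is only enabled when a GPU is actually present):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 8 samples with 4 features each, plus labels.
dataset = TensorDataset(torch.randn(8, 4), torch.arange(8))

device = "cuda" if torch.cuda.is_available() else "cpu"

# CPU workers build the batches in page-locked (pinned) host memory,
# which allows asynchronous copies to the GPU.
loader = DataLoader(dataset, batch_size=4, num_workers=0,
                    pin_memory=torch.cuda.is_available())

for x, y in loader:
    # non_blocking=True lets the copy overlap with GPU compute
    # when the source tensor is pinned.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
```

In a real setup you would typically also set `num_workers > 0` so batch preparation runs in the background while the GPU trains.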