Should `DataLoader` workers move examples directly onto the GPU, or should that be handled by the main process?
Specifically, my understanding is that the `DataLoader` uses the `Dataset`'s `__getitem__` method to prepare the next batch of items while the main process runs a training step on the current batch (correct me if I'm wrong). In my `__getitem__` I load the data (from images, in my case), do some preprocessing, and put the result into a PyTorch tensor. Should that tensor be moved onto the GPU inside `__getitem__`?

I suspect this could cause trouble, since the number of these tensors slowly grows as new data is prepared. But if the allocation is handled correctly automatically, I could also see it being fine, with the tensors simply filling the space reserved for the next batch. So: should I move things to the GPU in `__getitem__`, and therefore in the worker process? Or should the main process move the whole batch onto the GPU at once? Or am I misunderstanding something about the whole procedure? Thank you!
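For concreteness, here is a minimal sketch of my setup (names, shapes, and the random stand-in data are made up; the point is that `__getitem__` returns a CPU tensor and the batch is moved in the main loop, which is one of the two options I'm asking about):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ImageDataset(Dataset):
    """Hypothetical dataset: loads and preprocesses one sample on the CPU."""
    def __init__(self, n=8):
        # stand-in for loading image files from disk
        self.data = [torch.randn(3, 32, 32) for _ in range(n)]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        # preprocessing happens here, on the CPU, in the worker process;
        # a CPU tensor is returned for the DataLoader to collate
        return (x - x.mean()) / (x.std() + 1e-8)

# pin_memory only matters when CUDA is available
loader = DataLoader(ImageDataset(), batch_size=4, num_workers=0,
                    pin_memory=torch.cuda.is_available())

device = "cuda" if torch.cuda.is_available() else "cpu"
for batch in loader:
    # alternative A: move the whole collated batch at once, in the main process
    batch = batch.to(device, non_blocking=True)
    # ... training step on `batch` ...
```

The alternative I'm asking about would be to call `.to(device)` on the single-sample tensor inside `__getitem__` instead, so each worker pushes its samples to the GPU as it prepares them.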