Should `DataLoader` workers add examples directly to the GPU?

Should DataLoader workers add examples directly to the GPU? Or should that be handled by the main process?

Specifically, the DataLoader uses the Dataset's `__getitem__` method to prepare the next batch of items while the main process is running a training step on the current batch (correct me if this is wrong). So in my `__getitem__` I load the data (images, in my case), do some preprocessing, and put it into a PyTorch tensor. Should that tensor be moved onto the GPU inside `__getitem__`? I suspect this could cause trouble, since the number of these tensors slowly rises as new data is prepared. But if the allocation is handled correctly, I could also see it being fine, with the tensors simply filling up the space reserved for the next batch. Should I be moving things to the GPU in `__getitem__`, and therefore in the worker process? Or should the main process move the whole batch at once? Or am I misunderstanding something about the whole procedure? Thank you!
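To make the setup concrete, here is a minimal sketch of what I mean (the shapes and preprocessing are placeholders, not my real pipeline): workers call `__getitem__`, which currently returns CPU tensors.

```python
import torch
from torch.utils.data import Dataset

class ImageDataset(Dataset):
    """Toy dataset; in practice __getitem__ would load an image from disk."""

    def __init__(self, n_items=8):
        self.n_items = n_items

    def __len__(self):
        return self.n_items

    def __getitem__(self, idx):
        # Simulate loading + preprocessing an image; returns a CPU tensor.
        # The question: should this instead be moved with .to("cuda") here?
        img = torch.rand(3, 32, 32)   # stand-in for a loaded image
        img = (img - 0.5) / 0.5       # toy normalization
        return img, idx % 2           # (input, label)
```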

It should be handled by the main process. With `pin_memory=True` there is not much overhead, and it allows better memory management.
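A sketch of that pattern, with a placeholder model and dataset (the names and shapes are made up for illustration): workers return CPU batches into pinned (page-locked) host memory, and the main process moves each batch to the device, optionally with `non_blocking=True` so the copy can overlap with compute.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder data; stands in for a real image Dataset.
data = TensorDataset(torch.rand(64, 3, 32, 32), torch.randint(0, 2, (64,)))

loader = DataLoader(
    data,
    batch_size=16,
    num_workers=0,                           # workers still return CPU tensors when > 0
    pin_memory=torch.cuda.is_available(),    # page-locked memory speeds host-to-device copies
)

model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 2),
).to(device)

for images, labels in loader:
    # The transfer happens here, in the main process, one batch at a time.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    logits = model(images)
```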


Semi-related: what happens if I only use half the GPU with the main script, but then run that script twice with two different sets of hyperparameters? Does that mess with PyTorch's CUDA allocation (and lead to continual reallocation and so on)? Or will each process just use half the GPU correctly without causing many problems? Or are the specifics of the situation too complicated for a short answer?

It should run fine. However, performance might drop if your GPU is already fully utilized by one script, since the processes would have to wait for each other.
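If you want to be explicit about the split, one option (a sketch; the `0.5` fraction and `device=0` are arbitrary choices, not a requirement) is to cap each process's share of the caching allocator at the top of the script:

```python
import torch

# Cap this process's CUDA caching allocator at roughly half the device
# memory, so two training scripts can share one GPU without one of them
# grabbing everything. Allocations beyond the cap raise an out-of-memory
# error in this process instead of growing further.
if torch.cuda.is_available():
    torch.cuda.set_per_process_memory_fraction(0.5, device=0)
```

Note this only bounds PyTorch's allocator; it does not reserve the memory up front or isolate the processes from each other in any other way.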
