Modify DataLoader to run data augmentation on the CPU

Hello, I’m new to PyTorch. I want to modify the DataLoader so that data augmentation for each batch runs on the CPU and the result is then sent to the GPU for training. However, I do not know how to approach this. Is there a tutorial on this?

Any help is appreciated. Thank you.


DataLoaders run all operations on the CPU by default unless you explicitly move the data to another device. For instance, if you compose your augmentations with torchvision.transforms and apply them in your dataset, all augmented images are produced on the CPU.
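As a rough sketch of what that looks like, here is a dataset that applies a transform inside __getitem__. The class AugmentedImages, the helper random_horizontal_flip, and the random tensors standing in for images are all made up for illustration; a torchvision.transforms pipeline could be passed as the transform argument in the same place.

```python
import torch
from torch.utils.data import Dataset


def random_horizontal_flip(image: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Flip a (C, H, W) image left-right with probability p."""
    if torch.rand(1).item() < p:
        return torch.flip(image, dims=[-1])
    return image


class AugmentedImages(Dataset):
    """Hypothetical dataset: random tensors stand in for real images."""

    def __init__(self, num_samples: int = 8, transform=None):
        self.images = torch.rand(num_samples, 3, 32, 32)
        self.labels = torch.randint(0, 10, (num_samples,))
        self.transform = transform

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int):
        image = self.images[index]
        if self.transform is not None:
            # Nothing here touches the GPU, so the augmentation runs on the CPU.
            image = self.transform(image)
        return image, self.labels[index]


dataset = AugmentedImages(transform=random_horizontal_flip)
image, label = dataset[0]
print(image.device)  # cpu
```

Nothing in the dataset refers to a GPU, so the augmented sample stays a CPU tensor until you move it yourself.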

Hello, will the augmentation on the CPU run in parallel with the training process on the GPU?

Thanks for the reply!

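Yes, exactly, provided you set num_workers to a value greater than 0 in the DataLoader (with the default of 0, data is loaded in the main process). That is why a Dataset defines a method called __getitem__, which returns a single item of the dataset after augmentation. The DataLoader's worker processes call this method in parallel and collate the results into batches while the GPU is busy training on an earlier batch. You then move each batch you get from the DataLoader to the GPU, and the pipeline keeps running this way.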
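A minimal sketch of that pipeline, under some assumptions: NoisyImages and iterate_batches are hypothetical names, the added noise is only a stand-in for a real augmentation, and the training step is left as a comment. It falls back to the CPU when no GPU is available.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class NoisyImages(Dataset):
    """Hypothetical dataset; added noise stands in for a real augmentation."""

    def __init__(self, num_samples: int = 16):
        self.images = torch.rand(num_samples, 3, 32, 32)
        self.labels = torch.randint(0, 10, (num_samples,))

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int):
        # Called once per sample, inside a worker process when num_workers > 0.
        augmented = self.images[index] + 0.01 * torch.randn(3, 32, 32)
        return augmented, self.labels[index]


def iterate_batches(num_workers: int = 2, batch_size: int = 4) -> int:
    """Load batches on the CPU, move each one to the device, return the batch count."""
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    loader = DataLoader(
        NoisyImages(),
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # > 0 means CPU workers prepare batches in the background
        pin_memory=use_cuda,      # pinned host memory allows asynchronous copies to the GPU
    )
    num_batches = 0
    for images, labels in loader:
        # Batches arrive as CPU tensors; move them explicitly.
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        # The forward pass, loss, and optimizer step would go here.
        num_batches += 1
    return num_batches
```

Calling iterate_batches(num_workers=2) uses two worker processes. On Windows and macOS, where workers are started with spawn, that call should sit under an if __name__ == "__main__": guard.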

Thanks for the explanation.