Hello, I’m new to PyTorch. I want to modify the DataLoader so that data augmentation runs on the CPU for each batch, and the augmented batch is then sent to the GPU for training. However, I don’t know how to approach this. Is there a tutorial on this?
Any help is appreciated. Thank you.
DataLoaders run all operations on the CPU by default unless you explicitly do otherwise. For instance, if you combine your augmentations using
torchvision.transforms, all augmented images are produced on the CPU.
Hello, will the CPU augmentation run in parallel with the training process on the GPU?
Thanks for the reply!
Yes, exactly. That is why the
Dataset definition has a method called
__getitem__, which returns a single augmented item of the dataset. DataLoaders take advantage of this and can run this method in parallel across worker processes. After that, you move the batch tensors obtained from the DataLoader to the GPU, and the pipeline holds.
Thanks for the explanation.