Modify DataLoader to make request to CPU

Tenese · June 24, 2020, 3:41pm

Hello, I’m new to PyTorch. I want to modify the DataLoader so that data augmentation for each batch runs on the CPU and the augmented batch is then sent to the GPU for training. However, I do not know how to approach this. Is there a tutorial on it?

Any help is appreciated. Thank you.

Nikronic · June 24, 2020, 11:28pm

Hi,

By default, DataLoaders run all operations on the CPU unless you explicitly do otherwise. For instance, if you compose your augmentations with torchvision.transforms and apply them in your dataset, all of the augmented images are produced on the CPU.
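
As a rough illustration, here is a minimal sketch of a dataset that augments on the CPU. The class name `AugmentedDataset`, the random fake images, and the hand-written flip are all my own assumptions for the example; in practice you would likely use a torchvision.transforms pipeline in place of the flip.

```python
import torch
from torch.utils.data import Dataset


class AugmentedDataset(Dataset):
    """Hypothetical toy dataset: random 'images' augmented on the CPU."""

    def __init__(self, num_samples=8):
        # Fake 3x32x32 images and integer labels, created as CPU tensors.
        self.images = torch.rand(num_samples, 3, 32, 32)
        self.labels = torch.randint(0, 10, (num_samples,))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # Stand-in augmentation: horizontal flip with probability 0.5.
        # A torchvision.transforms pipeline would be applied here instead.
        if torch.rand(1).item() < 0.5:
            image = torch.flip(image, dims=[-1])
        return image, self.labels[idx]


dataset = AugmentedDataset()
image, label = dataset[0]
print(image.device)  # cpu
```

Nothing here touches the GPU: the sample stays a CPU tensor until you move it yourself.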

Tenese · June 25, 2020, 10:37am

Hello, will the augmentation on the CPU run in parallel with the training process on the GPU?

Thanks for the reply!

Nikronic · June 25, 2020, 10:44am

Yes, exactly. That is why the Dataset definition has a method called __getitem__, which returns a single item of the dataset after augmentation. The DataLoader takes advantage of this and can call the method in parallel in worker processes. Note that this requires num_workers > 0; with the default num_workers=0, loading happens in the main process. After that, you move the batch tensors you get from the DataLoader to the GPU, so the CPU workers can prepare the next batches while the GPU trains on the current one.
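
A minimal sketch of that pipeline follows. The dataset `NoisyDataset`, the tiny linear model, and the function `train_one_epoch` are made up for illustration, and the noise is just a stand-in for real augmentation. It falls back to the CPU when no GPU is available.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class NoisyDataset(Dataset):
    """Hypothetical dataset whose __getitem__ augments one sample on the CPU."""

    def __init__(self, num_samples=64):
        self.x = torch.rand(num_samples, 16)
        self.y = torch.randint(0, 2, (num_samples,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Stand-in augmentation: add small Gaussian noise (runs in a worker).
        return self.x[idx] + 0.01 * torch.randn(16), self.y[idx]


def train_one_epoch(num_workers=2):
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    # num_workers > 0 starts worker processes that call __getitem__ in parallel.
    # pin_memory lets the host-to-GPU copy below run asynchronously.
    loader = DataLoader(NoisyDataset(), batch_size=16, shuffle=True,
                        num_workers=num_workers, pin_memory=use_cuda)

    model = torch.nn.Linear(16, 2).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    steps = 0
    for inputs, targets in loader:
        # Batches arrive as CPU tensors; move them to the training device.
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)

        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        steps += 1
    return steps


if __name__ == "__main__":
    # The guard matters on platforms that start workers with "spawn".
    print(train_one_epoch(num_workers=2))
```

How many workers helps depends on your CPU and on how heavy the augmentation is, so it is worth trying a few values.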

Tenese · June 25, 2020, 1:04pm

Thanks for the explanation.