How to do 3D data augmentation in parallel on the GPU?

I have a lot of 3D data and need to apply various data augmentations. I want to run the augmentation in parallel on the GPU, but it seems that PyTorch does not allow GPU operations inside the DataLoader workers. Is there a good way to do this?

You need to set num_workers=0 to be able to run GPU operations inside the DataLoader.
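
A minimal sketch of what that looks like (the dataset class and tensor shapes below are made up for illustration): with num_workers=0, __getitem__ runs in the main process, so CUDA calls are safe there, but everything still runs on a single GPU.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class GpuAugmentedDataset(Dataset):
    """Hypothetical dataset: each item is moved to the GPU and augmented there."""
    def __init__(self, volumes):
        self.volumes = volumes  # list of CPU tensors of shape (C, D, H, W)

    def __len__(self):
        return len(self.volumes)

    def __getitem__(self, idx):
        vol = self.volumes[idx].cuda()
        # Example augmentation: random flip along the depth axis
        if torch.rand(1).item() < 0.5:
            vol = torch.flip(vol, dims=[1])
        return vol

# num_workers=0 keeps loading in the main process, so CUDA calls are allowed
data = [torch.randn(1, 16, 32, 32) for _ in range(8)]
loader = DataLoader(GpuAugmentedDataset(data), batch_size=2, num_workers=0)
```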

Thanks for your reply. I understand what you mean, but I want to do data augmentation in parallel across multiple GPUs. If I set num_workers=0, I can only do data augmentation on a single GPU.

There is no such thing as a multi-GPU dataloader.
The most you can do is use the DataLoader purely as a data reader.
Then define an nn.Module for the preprocessing, wrap it in DataParallel, and do any preprocessing you need inside its forward pass.
I doubt it will be very efficient, but you can try. The preprocessing has to be quite heavy before you feel the advantage of splitting the data across several GPUs.
Another option is to use one GPU for preprocessing and another one for the main workload.
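
A minimal sketch of that idea; the preprocessing itself (per-volume normalization plus Gaussian noise) and the batch shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Preprocess(nn.Module):
    """Hypothetical preprocessing module for 3D volumes."""
    def forward(self, x):  # x: (N, C, D, H, W)
        # Normalize each volume, then add light Gaussian noise
        x = (x - x.mean(dim=(2, 3, 4), keepdim=True)) / \
            (x.std(dim=(2, 3, 4), keepdim=True) + 1e-6)
        return x + 0.01 * torch.randn_like(x)

# DataParallel splits the batch across all visible GPUs
preprocess = nn.DataParallel(Preprocess()).cuda()

batch = torch.randn(8, 1, 16, 32, 32)  # read on CPU by the DataLoader
augmented = preprocess(batch.cuda())   # each GPU augments a slice of the batch
```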

Hi, I am having the same issue. Did you find a way to implement GPU data preprocessing?

You could also try something new:
Use PyTorch layers for the data augmentation, but set their parameters to requires_grad=False. Then, every n steps, re-randomize the properties of the augmentation.
This would support DataParallel and thus multiple GPUs.
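
A sketch of that idea, with illustrative assumptions (a random 3D affine warp as the augmentation, re-drawn every 10 steps): the transform parameters are stored as buffers, so nothing is trainable, and DataParallel broadcasts them to every replica on each forward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FixedAffineAugment(nn.Module):
    """Hypothetical augmentation layer: a fixed random 3D affine warp.
    Its parameters are buffers (never trained) and are re-drawn on demand."""
    def __init__(self):
        super().__init__()
        self.register_buffer("theta", torch.eye(3, 4))  # (3, 4) affine matrix
        self.randomize()

    @torch.no_grad()
    def randomize(self):
        # Small random perturbation of the identity transform
        self.theta.copy_(torch.eye(3, 4) + 0.05 * torch.randn(3, 4))

    def forward(self, x):  # x: (N, C, D, H, W)
        theta = self.theta.unsqueeze(0).expand(x.size(0), -1, -1)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

aug = nn.DataParallel(FixedAffineAugment()).cuda()
for step in range(100):
    if step % 10 == 0:
        aug.module.randomize()  # change the augmentation every 10 steps
    x = torch.randn(4, 1, 16, 32, 32).cuda()
    x_aug = aug(x)
```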
