Doing Data Augmentation in Parallel

Hi, I have a model that requires a lot of 3D data augmentation, which is time-consuming, and I want to run the augmentations in parallel. How can I do that?
I read the other topics in the forum, but I didn't find an answer to my question.

Another question: I need to do some intensity augmentations for 3D images, such as changing contrast, hue, and saturation. I cannot use the built-in PyTorch/torchvision transforms, since they don't work on 3D volumes. So if anyone knows a package that can change the pixel intensities of a NumPy ndarray, that would really help me.

Thanks for your help.

If you are using a DataLoader with num_workers>0, multiple worker processes will be used to create the batches in the background.
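
For example, here is a minimal sketch; the VolumeDataset class, the dummy shapes, and the heavy_3d_augment placeholder are illustrative stand-ins for your own pipeline, not from your code:

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

def heavy_3d_augment(volume):
    # Placeholder for your expensive 3D augmentation
    return volume + np.random.normal(0.0, 0.1, volume.shape)

class VolumeDataset(Dataset):
    def __init__(self, volumes, transform=None):
        self.volumes = volumes
        self.transform = transform

    def __len__(self):
        return len(self.volumes)

    def __getitem__(self, index):
        volume = self.volumes[index]
        if self.transform is not None:
            # This runs inside a worker process when num_workers > 0
            volume = self.transform(volume)
        return torch.as_tensor(volume, dtype=torch.float32)

if __name__ == "__main__":
    volumes = [np.random.rand(1, 16, 16, 16) for _ in range(8)]  # dummy 3D data
    # 4 worker processes load and augment batches in the background
    loader = DataLoader(VolumeDataset(volumes, heavy_3d_augment),
                        batch_size=2, num_workers=4)
    for batch in loader:
        print(batch.shape)  # torch.Size([2, 1, 16, 16, 16])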

I don't know which library would provide all the functionality you need for 3D data augmentation, but a medical imaging library such as MONAI or TorchIO could be helpful.

Thanks for your answer. So you mean that when I set num_workers=4, each batch is divided into 4 parts and the augmentations are run on these 4 subsets in parallel?

Regarding MONAI and TorchIO, I checked them but didn't find any pixel-intensity transformations.
Thank you for your help.

No, each worker will create an individual batch.
So 4 workers will create 4 batches simultaneously, add these batches to a queue and try to prefetch the next one.

There was some effort to use multiple workers for a single batch creation, but I don’t know the status of it.
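
If you want to see this yourself, here is a small sketch using torch.utils.data.get_worker_info; the PeekDataset and the sizes are arbitrary placeholders:

import torch
from torch.utils.data import Dataset, DataLoader, get_worker_info

class PeekDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, index):
        worker = get_worker_info()
        # All samples of a given batch are created by the same worker
        print(f"sample {index} created by worker {worker.id}")
        return torch.tensor(index)

if __name__ == "__main__":  # guard needed on platforms that spawn workers
    loader = DataLoader(PeekDataset(), batch_size=4, num_workers=2)
    for _ in loader:
        pass  # indices 0-3 come from one worker, 4-7 from the other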

Ok, thank you for your answer.

Hi, Mjavan. TorchIO does support NumPy arrays:

import numpy as np
import torchio as tio

array = np.ones((1, 2, 2, 2))  # channels-first 4D array, (C, W, H, D) in TorchIO's convention
print(array)
transform = tio.RandomNoise()
transformed = transform(array)
print(transformed)

Output:

[[[[1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]]]]
[[[[0.64996007 0.86066213]
   [1.01217787 0.97101844]]

  [[1.09215886 0.98876163]
   [0.94828662 0.77317567]]]]
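
Regarding the contrast-style intensity augmentations asked about above, TorchIO also provides transforms such as RandomGamma; a minimal sketch, assuming a recent TorchIO version (the log_gamma range is an arbitrary choice):

import numpy as np
import torchio as tio

array = np.random.rand(1, 8, 8, 8).astype(np.float32)
# Random gamma correction, which acts like a random contrast change
transform = tio.RandomGamma(log_gamma=(-0.3, 0.3))
transformed = transform(array)
print(transformed.min(), transformed.max())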