Multiple workers dataloader with big input files

Hello everybody,

I am working on an autoencoder for 3D medical imaging. My input volumes are big (around 150×150×150 pixels). I have 517 different images (around 150 GB) on a hard drive, and four Tesla K40 GPUs. I am working inside a Docker container started with the --ipc=host option. I wrote my own Dataset classes and I use the PyTorch DataLoader.

I am trying to speed up the data loading, because it currently takes 80% of the computation time.

When the number of workers is 0, loading my data takes between 1 and 2 seconds (I tested with both numpy.load and torch.load; it doesn't make any difference).
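For anyone who wants to reproduce this kind of measurement, here is a minimal timing sketch. It is not the code from this post: `SyntheticVolumes` and `mean_batch_time` are hypothetical names, the dataset returns small random tensors instead of real images, and the measured time includes worker start-up.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SyntheticVolumes(Dataset):
    """Hypothetical stand-in for the real dataset: returns random volumes."""

    def __init__(self, n_items=8, shape=(1, 32, 32, 32)):
        self.n_items = n_items
        self.shape = shape

    def __len__(self):
        return self.n_items

    def __getitem__(self, index):
        # In a real dataset, this is where numpy.load / torch.load would go.
        return torch.rand(self.shape)


def mean_batch_time(dataset, num_workers, batch_size=2):
    """Average wall-clock seconds per batch over one pass (includes worker start-up)."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    n_batches = 0
    for _ in loader:
        n_batches += 1
    return (time.perf_counter() - start) / n_batches


if __name__ == "__main__":
    data = SyntheticVolumes()
    for workers in (0, 2, 4):
        print(f"num_workers={workers}: {mean_batch_time(data, workers):.3f} s/batch")
```

Swapping the body of `__getitem__` for the real loading call gives a like-for-like comparison between worker counts.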

But when I set num_workers > 0, loading the data takes between 20 and 40 seconds. My hypothesis is that, because of the size of the data, pickling the arrays to pass them between the workers and the main process slows down the loading.
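One rough way to test the pickling hypothesis is to time a pickle round trip of a buffer as large as one volume. The sketch below uses only the standard library and assumes float32 voxels (4 bytes each), which is not stated in this post; `pickle_round_trip_seconds` is a hypothetical helper name, and the sketch does not reproduce how the DataLoader actually passes data between processes.

```python
import pickle
import time

# Assumption: one 150 x 150 x 150 volume stored as float32 (4 bytes per voxel).
N_BYTES = 150 * 150 * 150 * 4


def pickle_round_trip_seconds(n_bytes=N_BYTES):
    """Time one serialize + deserialize of a raw buffer of n_bytes."""
    payload = bytes(n_bytes)  # zero-filled stand-in for the voxel data
    start = time.perf_counter()
    restored = pickle.loads(pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL))
    elapsed = time.perf_counter() - start
    assert len(restored) == n_bytes  # sanity check: nothing was lost
    return elapsed


if __name__ == "__main__":
    seconds = pickle_round_trip_seconds()
    print(f"{N_BYTES / 1e6:.1f} MB round trip: {seconds:.4f} s")
```

If the round trip takes far less than the observed 20 to 40 seconds, serialization alone probably does not explain the slowdown, and other causes, such as several workers reading from the same hard drive at once, would be worth checking.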

Is there any way to load huge data with multiple workers without such a slowdown?


Hi @TheoEst, you might want to take a look at the ImagesDataset in TorchIO.