Num_workers > 1 mixes samples and labels (h5py dataset)

rumo · May 25, 2020, 3:05pm

Hi everyone,
I have a question about how num_workers works that I couldn’t answer myself.

I have a data set that contains image data and the corresponding masks (segmentation task). The file format is HDF5.
I wrote a custom Dataset class that loads the data, and I use a DataLoader to load it in batches.
To visualize the data, I wrote a function that grids the image data and the corresponding masks and overlays the two, so that I can see the masks on the image. For that, I directly pass a batch coming from the dataloader to my “gridding function”.

As long as I use num_workers = 0 or 1, everything works as expected.
For num_workers > 1, it seems like my training samples and the corresponding masks are shuffled around (some images appear in several batches but with different masks, and sometimes there’s no image at all, just the mask). I uploaded an example where you can see that the first two samples in the batch are identical but their masks (blueish parts) are different (and both wrong). This never happens if I set num_workers to 0 or 1.

I read that there are some problems with HDF5 datasets and multiprocessing, but I’m not sure if my problem relates to this or if there is something else going on that I’m unaware of.

Tech specs:

Ubuntu 18.04
i7-4720HQ CPU
16 GB RAM
no CUDA, just CPU (for now)

Thank you for your help!

Kushaj · May 25, 2020, 5:29pm

If you are using a single dataloader that loads both the image and the mask, num_workers won’t matter. When multiple workers are used, each worker calls __getitem__ for different indices.

rumo · May 26, 2020, 9:09am

Hello Kushaj,
thank you for your reply! I took another look and could actually solve my problem by following piojanu’s workaround. When I open the HDF5 file within __getitem__, and not in __init__ as I did before, everything works as expected. I had not realized that my problem could be caused by that.
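
For anyone landing here later, the workaround described above could look roughly like the sketch below. It is not the code from the original project: the class name and the dataset keys "images" and "masks" are made up for illustration, and it assumes the file stores images and masks as array datasets indexed along the first axis. The idea is only that __init__ stores the path, while the h5py file handle is created on the first __getitem__ call, so every worker process ends up with its own handle.

```python
import h5py
import torch
from torch.utils.data import Dataset


class LazyH5SegmentationDataset(Dataset):
    """Map-style dataset that opens the HDF5 file on first access, not in __init__."""

    def __init__(self, h5_path, image_key="images", mask_key="masks"):
        self.h5_path = h5_path
        self.image_key = image_key  # hypothetical dataset name inside the file
        self.mask_key = mask_key    # hypothetical dataset name inside the file
        self._file = None           # no open handle is kept at construction time
        # Open the file briefly to read the number of samples, then close it.
        with h5py.File(h5_path, "r") as f:
            self._length = f[image_key].shape[0]

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Each DataLoader worker is a separate process; the first call in a
        # given process opens a handle that belongs to that process only.
        if self._file is None:
            self._file = h5py.File(self.h5_path, "r")
        image = torch.from_numpy(self._file[self.image_key][index])
        mask = torch.from_numpy(self._file[self.mask_key][index])
        return image, mask
```

Such a dataset would be passed to the DataLoader as usual, e.g. DataLoader(dataset, batch_size=8, num_workers=4). One caveat with this sketch: if a sample is read in the main process before the workers are started, the handle is already open at that point and may again be shared with the workers, so it is safer to create a fresh dataset object for the DataLoader.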