Non-thread-safe dataset with DataLoader

I have a really big dataset containing images of cells in an unusual format. I am using the bio-formats library (which runs in Java) to read the images, but it is not thread-safe. Its documentation advises creating a new image reader per instance.

For now, if I don't do anything specific, nothing happens when num_workers > 0 (it seems to be stuck in an infinite loop).

I have torch 1.4.0, and I tried torch.multiprocessing.set_start_method with both "forkserver" and "spawn", but to no avail. It gives the following error:

... in _javabridge.JB_Object.__reduce_cython__()

TypeError: no default __reduce__ due to non-trivial __cinit__

Is there any way of having separate instances in the dataloader?

Hi,

Note that when you use num_workers > 0, we use multiprocessing, not multithreading, to load the different objects from the Dataset.
So you want to make sure that your Java loader works fine in the child process.
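
One way to get a separate reader in each worker is to create it lazily inside the worker instead of in __init__. Below is a minimal sketch of that idea, not a tested bio-formats integration. The names LazyReaderDataset, reader_factory and DummyReader are hypothetical and only there for illustration. In real code, reader_factory would be where you start the JVM and build the bio-formats reader. The __getstate__ part is there because, with "spawn" or "forkserver", the dataset is pickled to be sent to the workers, and a Java-backed object stored on it would likely cause the __reduce__ error shown above.

```python
import os


class DummyReader:
    """Stand-in for the real Java-backed reader (hypothetical, for illustration)."""

    def read(self, path):
        return "pixels:" + path


class LazyReaderDataset:
    """Map-style dataset that opens its reader lazily, once per process."""

    def __init__(self, paths, reader_factory):
        self.paths = list(paths)
        # Assumption: reader_factory is a picklable callable (e.g. a
        # module-level function or class) that returns a fresh reader.
        self.reader_factory = reader_factory
        self._reader = None
        self._reader_pid = None

    def _get_reader(self):
        # Create a new reader if this process does not have one yet.
        # Comparing PIDs also covers "fork", where the child would
        # otherwise inherit the parent's reader object.
        pid = os.getpid()
        if self._reader is None or self._reader_pid != pid:
            self._reader = self.reader_factory()
            self._reader_pid = pid
        return self._reader

    def __getstate__(self):
        # Never pickle the reader itself: drop it from the copied state
        # so each worker has to build its own.
        state = self.__dict__.copy()
        state["_reader"] = None
        state["_reader_pid"] = None
        return state

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        return self._get_reader().read(self.paths[index])


dataset = LazyReaderDataset(["a.tif", "b.tif"], DummyReader)
first_item = dataset[0]  # the reader is created here, on first access
```

You would then pass the dataset to DataLoader(dataset, num_workers=...) as usual. In a real project you would probably subclass torch.utils.data.Dataset, and the important point is to avoid touching the reader (or starting the JVM) in the main process before the workers are created.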