Multiple parallel DataLoaders cause exception

Noam_Gat · August 26, 2020, 8:48pm

Hello,

I’m training one model on two different sources of data.
My training loop is essentially:

for …:
    train one batch from source A
    train one batch from source B
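Below is a minimal runnable sketch of that loop, assuming one shared model and two independent DataLoaders stepped in lockstep with zip. The datasets, the nn.Linear model and the two different loss functions are hypothetical stand-ins, used only to show that the per-source training code can differ while the model is the same.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_loaders(num_workers: int = 0):
    # Hypothetical stand-ins for source A and source B.
    ds_a = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
    ds_b = TensorDataset(torch.randn(32, 8), torch.randn(32, 1))
    loader_a = DataLoader(ds_a, batch_size=16, shuffle=True, num_workers=num_workers)
    loader_b = DataLoader(ds_b, batch_size=8, shuffle=True, num_workers=num_workers)
    return loader_a, loader_b


def train(num_workers: int = 0) -> int:
    model = nn.Linear(8, 1)  # the single shared model (hypothetical)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loader_a, loader_b = make_loaders(num_workers)
    steps = 0
    # zip stops at the shorter loader; here both yield 4 batches.
    for (xa, ya), (xb, yb) in zip(loader_a, loader_b):
        # One step on source A (hypothetical MSE objective).
        opt.zero_grad()
        nn.functional.mse_loss(model(xa), ya).backward()
        opt.step()
        # One step on source B (hypothetical L1 objective: the code may differ).
        opt.zero_grad()
        nn.functional.l1_loss(model(xb), yb).backward()
        opt.step()
        steps += 2
    return steps


if __name__ == "__main__":
    print(train(num_workers=2))
```

With CPU-only datasets like these, num_workers > 0 works under the default fork start method on Linux; the trouble described next only appears once a dataset touches CUDA inside a worker.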

Note that the training code for the two sources isn't exactly the same, but the model is shared.

When I initialize both DataLoaders with num_workers > 0, I get the following exception:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Using the spawn start method causes other side effects, so I would prefer to stay with fork.
Is there a way to get this to work?

Noam_Gat · August 26, 2020, 8:57pm

The problem was actually different: one of the data loaders uses CUDA internally (its data goes through a cleaning network before being returned). Is there a way to avoid the 'spawn' method in this case?
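A sketch of the kind of dataset being described, assuming the cleaning network is an nn.Module applied inside __getitem__. CleanedDataset, cleaner and the lazy move to the device are hypothetical illustrations, not the poster's code. The cleaner is passed in on the CPU and moved to the device on first use, so with device="cuda" and num_workers > 0 the first CUDA call happens inside the worker process, which is exactly what fails in a forked worker once the parent has already initialized CUDA.

```python
import torch
from torch import nn
from torch.utils.data import Dataset


class CleanedDataset(Dataset):
    """Hypothetical dataset: each raw sample goes through a small
    'cleaning' network before it is returned."""

    def __init__(self, raw: torch.Tensor, cleaner: nn.Module, device: str = "cpu"):
        self.raw = raw
        self.cleaner = cleaner.eval()  # expected on CPU; moved on first use
        self.device = torch.device(device)
        self._on_device = False

    def __len__(self) -> int:
        return len(self.raw)

    def __getitem__(self, idx: int) -> torch.Tensor:
        if not self._on_device:
            # With device="cuda" and num_workers > 0 this is the first CUDA call
            # in the worker. In a forked worker whose parent already initialized
            # CUDA, this raises "Cannot re-initialize CUDA in forked subprocess".
            self.cleaner.to(self.device)
            self._on_device = True
        x = self.raw[idx].to(self.device)
        with torch.no_grad():
            cleaned = self.cleaner(x)
        return cleaned.cpu()  # hand a CPU tensor back to the DataLoader
```

For example, CleanedDataset(torch.randn(10, 4), nn.Linear(4, 4)) runs on the CPU; passing device="cuda" reproduces the setup from the post on a GPU machine.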

SimonW · August 26, 2020, 11:14pm

You can't use CUDA with fork. But you can set the start method for just that one data loader by passing multiprocessing_context='spawn' when creating it.
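A minimal sketch of that suggestion: only the loader whose dataset uses CUDA gets multiprocessing_context="spawn", while the other keeps the default start method (fork on Linux). The TensorDatasets in the main block are hypothetical stand-ins; in the real setup ds_b would be the CUDA-using dataset. DataLoader only accepts multiprocessing_context when num_workers > 0.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def build_loaders(ds_a, ds_b, num_workers: int = 2):
    # Source A: plain CPU loading, keeps the default start method (fork on Linux).
    loader_a = DataLoader(ds_a, batch_size=16, num_workers=num_workers)
    # Source B: its dataset touches CUDA in __getitem__, so only this loader
    # starts its workers with spawn. Requires num_workers > 0.
    loader_b = DataLoader(ds_b, batch_size=16, num_workers=num_workers,
                          multiprocessing_context="spawn")
    return loader_a, loader_b


if __name__ == "__main__":
    # Spawned workers re-import this module, so the guard above is required.
    ds_a = TensorDataset(torch.randn(64, 8))
    ds_b = TensorDataset(torch.randn(64, 8))  # stand-in for the CUDA-using dataset
    loader_a, loader_b = build_loaders(ds_a, ds_b)
    for (xa,), (xb,) in zip(loader_a, loader_b):
        pass  # train one batch from source A, then one from source B
```

The trade-off is local to loader_b: its dataset must be picklable, its workers start more slowly, and each worker that uses CUDA creates its own CUDA context, which costs extra GPU memory. loader_a keeps the fork behavior the poster wanted to preserve.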

Noam_Gat · August 27, 2020, 6:22am

Thanks! Didn’t know about the per-loader context flag. Much cleaner solution.