CUDA_ERROR_NOT_INITIALIZED when setting num_workers > 0 in DataLoader running with CUDA

I’m getting an issue where calling the torch_geometric DataLoader (which inherits from the standard PyTorch DataLoader) with num_workers > 0 while running on CUDA (e.g. CUDA_VISIBLE_DEVICES=0 python …) fails with:

`F tensorflow/stream_executor/cuda/] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error`

After that, the script fails with `DataLoader worker (pid(s) 31890) exited unexpectedly`. Any explanation/solution? I read that the most common workaround is to use num_workers = 0, but I want to parallelise the data loading, since reading the dataset is the current bottleneck (my dataset is saved in TFRecords). Thanks!

The error seems to be raised by TensorFlow, and I would guess that your use case tries to re-initialize a CUDA context in the forked worker processes and thus might break. Try using the spawn start method and it might work.

Hey @ptrblck, thanks for replying!
Do you mean calling torch.multiprocessing.set_start_method('spawn') before calling the function?
If I do something like this:

```python
if __name__ == "__main__":
    torch.multiprocessing.set_start_method('spawn')
    # ... rest of the script (dataset setup, training loop) ...
```
It would throw an error:
I tensorflow/stream_executor/cuda/] failed to allocate 2.2K (2304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

The initial error seems to be solved and TF fails to allocate device memory now.
I guess you might want to change TF’s behavior to allocate memory on demand instead of grabbing all of the device memory up front, but the TF discussion board might be a better place to ask, as you would find the experts there. :wink:
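For reference, a sketch of how TF’s allocator behavior can be changed (assuming TensorFlow 2.x; this must run before TF initializes any GPU):

```python
import tensorflow as tf

# Option 1: let TF allocate GPU memory on demand instead of reserving
# the whole device at startup.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# Option 2 (if TF is only used to read TFRecords): hide the GPUs from TF
# entirely so it never touches device memory at all.
# tf.config.set_visible_devices([], 'GPU')
```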

Ok, I’m trying to sort out that issue now :slight_smile:. In the case that the GPU memory is fully occupied because the model takes a lot of memory, can I still use multiprocessing in the DataLoader? Is there an equivalent way where the fetching and processing happen on the CPU?

The DataLoader uses `Dataset.__getitem__` to fetch each sample, which runs on the CPU by default unless you are explicitly using the GPU. PyTorch will not use your GPU(s) by default.