Unable to fix RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Dave1 · August 27, 2024, 11:21am

While it seems like this issue is brought up quite frequently, I was unable to find an answer in online forums. Currently I am trying to load a model from a .pth file and fine-tune it using a dataset from Hugging Face. However, this error keeps occurring once the model is loaded.

I’ve tried adding torch.multiprocessing.set_start_method('spawn') to the code, but then a different error appears saying that the context has already been set. After some debugging, it seems that the context is set when the load_dataset function from the datasets library is called. I am using the accelerate library for training.
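
For reference, the "context has already been set" message is the RuntimeError that Python's multiprocessing module raises when set_start_method is called after a start method was already chosen. Below is a minimal sketch of one way around it, using the standard-library multiprocessing module (torch.multiprocessing re-exports the same function, to my understanding). The helper name ensure_spawn is hypothetical, and this only illustrates the force=True argument; it is not a confirmed fix for this thread.

```python
import multiprocessing as mp


def ensure_spawn():
    """Select the 'spawn' start method, even if another library set one earlier.

    ensure_spawn is a hypothetical helper name used for illustration only.
    """
    # allow_none=True returns None instead of fixing a default start method.
    current = mp.get_start_method(allow_none=True)
    if current != "spawn":
        # force=True overrides a previously set context instead of raising
        # RuntimeError("context has already been set").
        mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    # Call this as early as possible, before worker processes are created.
    print(ensure_spawn())
```

Note that forcing the start method only affects processes created afterwards; anything a library has already created with its own context is probably unaffected.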

ptrblck · August 28, 2024, 2:06pm

Check whether the dataset moves the samples to the GPU internally and, if so, keep them on the CPU. Alternatively, set num_workers=0 and use the main process to move the data to the GPU if you really want to move it to the GPU inside the dataset.
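
A minimal sketch of the first suggestion, assuming a toy dataset: __getitem__ returns CPU tensors only, and the batch is moved to the device in the training loop of the main process. CpuOnlyDataset and iterate_batches are hypothetical names, and the random tensors stand in for the real Hugging Face data.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CpuOnlyDataset(Dataset):
    """Hypothetical dataset whose samples never touch the GPU."""

    def __init__(self, n_samples=8, n_features=4):
        # Data is created and kept on the CPU; no .cuda() or .to("cuda") here.
        self.x = torch.randn(n_samples, n_features)
        self.y = torch.randint(0, 2, (n_samples,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Runs inside worker processes when num_workers > 0, so stay on CPU.
        return self.x[idx], self.y[idx]


def iterate_batches(num_workers=0, batch_size=4):
    """Load batches and move each one to the device in the main process."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    loader = DataLoader(
        CpuOnlyDataset(), batch_size=batch_size, num_workers=num_workers
    )
    shapes = []
    for x, y in loader:
        # The transfer happens here, after the batch has been collated.
        x, y = x.to(device), y.to(device)
        shapes.append((tuple(x.shape), tuple(y.shape)))
    return shapes
```

With num_workers=0 the loading itself also runs in the main process, which matches the second suggestion; since this dataset never touches CUDA in __getitem__, raising num_workers should not by itself trigger the forked-subprocess error.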