Unable to fix RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

While this issue seems to come up quite frequently, I was unable to find an answer in online forums. I am trying to load a model from a .pth file and fine-tune it using a dataset from Hugging Face. However, after the model is loaded, this error keeps occurring.

I’ve tried putting torch.multiprocessing.set_start_method('spawn') in the code, but then a different error appears saying that the context has already been set. After some debugging, it seems that the context is set when the load_dataset function from the datasets library is called. I am currently using the accelerate library for training.

Check whether the dataset is moving the samples to the GPU internally and, if so, keep them on the CPU. Alternatively, set num_workers=0 so the data is loaded in the main process, if you really want to move it to the GPU inside the dataset.
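A minimal sketch of the first suggestion, assuming a plain PyTorch Dataset and DataLoader. CpuOnlyDataset and run_epoch are hypothetical names used only for illustration, and the random tensors stand in for your real data. The dataset returns CPU tensors only, so the worker processes never touch CUDA, and each batch is moved to the device in the main process:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CpuOnlyDataset(Dataset):
    """Hypothetical dataset that only ever returns CPU tensors."""

    def __init__(self, n_samples: int = 8, n_features: int = 4):
        # Created on the CPU; there is no .cuda() or .to("cuda") in the dataset.
        self.x = torch.randn(n_samples, n_features)
        self.y = torch.randint(0, 2, (n_samples,))

    def __len__(self) -> int:
        return len(self.x)

    def __getitem__(self, idx):
        # Runs inside a worker process when num_workers > 0, so keep it CUDA-free.
        return self.x[idx], self.y[idx]


def run_epoch(num_workers: int = 2, batch_size: int = 4):
    """Iterate once over the data, moving each batch to the device in the main process."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    loader = DataLoader(
        CpuOnlyDataset(),
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),  # only useful when a GPU is present
    )
    shapes = []
    for x, y in loader:
        # The host-to-device copy happens here, outside the worker processes.
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        shapes.append((tuple(x.shape), tuple(y.shape)))
    return shapes


if __name__ == "__main__":
    print(run_epoch())
```

Calling run_epoch(num_workers=0) corresponds to the second suggestion, where everything is loaded in the main process. Since you are using accelerate, a DataLoader passed through accelerator.prepare should normally have its batches placed on the device for you, in which case the explicit .to(device) calls would not be needed. If the dataset genuinely has to use CUDA inside the workers, passing multiprocessing_context="spawn" to the DataLoader may be an option instead of changing the global start method, though I have not tested that with your setup.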