Using DataLoader with serialized CUDA tensors

Constructing feature vectors for my current dataset of Portable Executable files is rather slow, so my current workaround is to create and save (using pickle) the feature vector for each file, then load them with a custom PyTorch Dataset. I noticed that a significant bottleneck was moving these vectors from CPU to GPU each time they were loaded (via a PyTorch DataLoader).
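For context, the setup described might look roughly like this (a minimal sketch; the `PickledFeatureDataset` name, the one-pickle-per-sample layout, and the `(features, label)` pickle format are assumptions for illustration, not the actual code from the post):

```python
import pickle
from pathlib import Path

import torch
from torch.utils.data import Dataset


class PickledFeatureDataset(Dataset):
    """Loads pre-computed feature vectors, one pickle file per sample."""

    def __init__(self, feature_dir):
        self.paths = sorted(Path(feature_dir).glob("*.pkl"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        with open(self.paths[idx], "rb") as f:
            features, label = pickle.load(f)
        # Tensors are created on the CPU here; moving them to the GPU in
        # the training loop is the transfer the post identifies as slow.
        return torch.as_tensor(features), torch.as_tensor(label)
```

With this shape, each DataLoader worker just unpickles CPU data; the error below appears when the pickled objects are CUDA tensors instead.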

I tried saving and loading the cuda tensors instead but got the error:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I did some searching and found that this line should be used to handle CUDA multiprocessing:

torch.multiprocessing.set_start_method('spawn')
However, if I insert this into my code or the Dataloader code, I always get the following error:

RuntimeError: context has already been set

So my questions are:

  1. Is there a way to make PyTorch DataLoaders work with serialized CUDA tensors?
  2. Am I fundamentally misunderstanding something?

see warning at

It seems like the DataLoader code already imports torch.multiprocessing as multiprocessing. Even after reading the article, it's not clear to me what needs to be modified in the DataLoader code for this to work.

hmm did you read the warning I linked above? You need to put that under `if __name__ == '__main__'`
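Concretely, the advice above amounts to something like this (a minimal sketch; `main` is a stand-in for whatever builds the Dataset/DataLoader and runs training):

```python
import torch.multiprocessing as multiprocessing


def main():
    # Build the Dataset / DataLoader and train here, after the start
    # method has been set in the entry-point guard below.
    ...


if __name__ == "__main__":
    # Must run once, in the entry script, before any CUDA work happens
    # and before any DataLoader workers are created. Raises
    # "RuntimeError: context has already been set" if something has
    # already fixed the start method.
    multiprocessing.set_start_method("spawn")
    main()
```

The guard matters because spawned workers re-import the entry module; without it, each worker would try to set the start method again.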

I understand, but regardless of where I insert it, it complains:

RuntimeError: context has already been set

I’m assuming it’s an issue with the import structure of my codebase where the context gets set during an import.
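One thing that may help in that situation: the standard library's set_start_method accepts force=True, which replaces a context that some import has already set. A possible workaround, not a guaranteed fix:

```python
import torch.multiprocessing as multiprocessing

if __name__ == "__main__":
    # force=True overrides a context that was already set during imports,
    # avoiding "RuntimeError: context has already been set".
    multiprocessing.set_start_method("spawn", force=True)
```

That said, a common way to sidestep the whole issue is to keep the pickled tensors on the CPU and speed up the transfer instead, e.g. with pin_memory=True on the DataLoader and .to(device, non_blocking=True) in the training loop.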