CUDA tensors in forked subprocess

Hi all,

My code works fine in PyTorch v1.0.0. However, in PyTorch v1.6.0, the line torch.set_default_tensor_type(torch.cuda.DoubleTensor) causes a "Cannot re-initialize CUDA in forked subprocess" error. I noticed that this is due to the fork call made when using num_workers > 0 in the PyTorch DataLoader. My guess is that it's because the subprocesses load data directly onto the GPU. One possible fix I found on the forums is to use the spawn or forkserver start methods instead, but these are quite slow compared to fork. Is there a way to benefit from the speedup of num_workers > 0 while also keeping CUDA tensors as the default tensor type?

Code that can reproduce the error:

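import torch
import torchvision
import torchvision.transforms as transforms

# The line from the question: setting a CUDA default tensor type.
torch.set_default_tensor_type(torch.cuda.DoubleTensor)

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

# Worker processes are only started once the loader is iterated.
for images, labels in trainloader:
    break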


I’m afraid this is not possible. Setting a CUDA default tensor type causes CUDA to be initialized in the parent process, and the restriction on using CUDA after a fork comes from the CUDA driver itself, not from PyTorch. So there is little we can do on our side.
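
To make the limitation above concrete, here is a minimal sketch of checking whether CUDA is already initialized in the parent before the workers start, and requesting the spawn start method only in that case. It assumes torch.cuda.is_initialized() and the DataLoader multiprocessing_context argument are available in your version; choose_worker_context is a made-up helper name, and the tiny TensorDataset is only a stand-in for real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def choose_worker_context():
    # Hypothetical helper: if the parent process has already initialized CUDA,
    # forked workers that touch CUDA will fail, so request "spawn" instead.
    # Returning None lets DataLoader use the platform's default start method.
    return "spawn" if torch.cuda.is_initialized() else None


if __name__ == "__main__":
    # Small synthetic dataset so the sketch runs without any download.
    dataset = TensorDataset(torch.randn(8, 3), torch.randint(0, 10, (8,)))
    loader = DataLoader(dataset, batch_size=4, num_workers=2,
                        multiprocessing_context=choose_worker_context())
    for features, targets in loader:
        print(features.shape, targets.shape)
```

This does not remove the startup cost of spawn; it only avoids paying it when CUDA has not been initialized yet.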

Note that in general, I would advise against setting the default tensor type to a CUDA type, as it might make simple ops slower than they should be (in particular when default Tensors are created).
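
A common alternative to a CUDA default tensor type is to leave the workers on CPU tensors and move each batch to the GPU in the main process. Below is a minimal sketch of that pattern, assuming a synthetic TensorDataset in place of CIFAR10 so it runs without a download; make_loader and first_batch_on_device are illustrative names, not PyTorch APIs.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers=2, batch_size=4):
    # Synthetic stand-in for CIFAR10: 16 CPU images of shape 3x32x32.
    images = torch.randn(16, 3, 32, 32)
    labels = torch.randint(0, 10, (16,))
    dataset = TensorDataset(images, labels)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        # Pinned host memory helps host-to-GPU copies; skip it without CUDA.
        pin_memory=torch.cuda.is_available(),
    )


def first_batch_on_device(loader, device):
    # Workers only ever produce CPU tensors, so fork stays usable.
    images, labels = next(iter(loader))
    # The move to the device (and to double precision) happens in the parent.
    images = images.to(device, dtype=torch.float64, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    return images, labels


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    images, labels = first_batch_on_device(make_loader(), device)
    print(images.device, images.dtype, tuple(images.shape))
```

With this layout the default tensor type stays on the CPU, and only the tensors that need to be on the GPU are created or moved there explicitly.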
