RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

Hello,

I’m getting a “RuntimeError: Expected a ‘cuda’ device type for generator but found ‘cpu’” error when I try to iterate over my DataLoader, created as follows:

transform = transforms.Compose([
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize(0.5, 0.5),
])
dataset = dset.MNIST(root=dataroot, train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         shuffle=True, num_workers=workers)

The problem goes away if I set shuffle to False, but I would like to keep it True. The solutions I’ve found involve changing some PyTorch source code, which I would like to avoid.

Thanks!


Hi,
Can you please see if this works:

torch.utils.data.DataLoader(
    ...,
    generator=torch.Generator(device='cuda'),
)
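
Applied to the snippet from your question, that would look like this (keeping your dataset and variable names):

dataloader = torch.utils.data.DataLoader(
    dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=workers,
    generator=torch.Generator(device='cuda'),
)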

It works!

Thank you very much! 🙂

I don’t understand what the issue in the posted code snippet was, as it’s working for me locally and isn’t even using the GPU. Which part of the code raised the error?

@srishti-git1110, have you seen this error being raised before in a CPU-only DataLoader?

I’m sorry, I didn’t explain myself properly. The problem only arose when using the GPU as the device; everything worked fine on the CPU.

Hi @ptrblck ,
No, I’ve never run into such an error while using the CPU.
And yes, the posted code doesn’t seem to produce any error for me either.

I just assumed the OP was facing the error when using the GPU, hence posted that as the solution. Sorry for not being clear - I should’ve mentioned it there.

Not at all. My post wasn’t meant as criticism; you guessed it perfectly right, and @Jorge_Garcia clarified that the GPU was indeed used.

I was just concerned that this might be a known issue of CUDA errors being raised from a CPU-only DataLoader, but it turns out the posted code was missing some parts. 😉
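
For future readers, here is a minimal repro sketch of the likely missing part. This is an assumption about the setup, not the OP’s confirmed code: making CUDA the global default device causes the sampler’s torch.randperm(...) to allocate on the GPU while the DataLoader’s default generator stays on the CPU.

import torch

# Assumed missing piece: CUDA set as the global default device
torch.set_default_device('cuda')  # or, on older versions: torch.set_default_tensor_type(torch.cuda.FloatTensor)

g = torch.Generator()  # defaults to a CPU generator
# randperm now allocates on CUDA via the default device, while the
# generator lives on the CPU, reproducing the reported mismatch:
idx = torch.randperm(10, generator=g)
# RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'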


Hello. I’m facing the same issue, but it looks like the suggestion above will not work:

/local_disk0/.ephemeral_nfs/envs/pythonEnv-4aa41058-b8da-44ab-8b2d-453f51f9b1ec/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __init__(self, loader)
    575             shared_rng = torch.Generator()
    576             shared_rng.manual_seed(self._shared_seed)
--> 577             self._dataset = torch.utils.data.graph_settings.apply_random_seed(self._dataset, shared_rng)
    578         self._dataset_kind = loader._dataset_kind
    579         self._IterableDataset_len_called = loader._IterableDataset_len_called

/local_disk0/.ephemeral_nfs/envs/pythonEnv-4aa41058-b8da-44ab-8b2d-453f51f9b1ec/lib/python3.8/site-packages/torch/utils/data/graph_settings.py in apply_random_seed(datapipe, rng)
    151 
    152     for pipe in random_datapipes:
--> 153         random_seed = int(torch.empty((), dtype=torch.int64).random_(generator=rng).item())
    154         pipe.set_seed(random_seed)
    155 

/local_disk0/.ephemeral_nfs/envs/pythonEnv-4aa41058-b8da-44ab-8b2d-453f51f9b1ec/lib/python3.8/site-packages/torch/utils/_device.py in __torch_function__(self, func, types, args, kwargs)
     60         if func in _device_constructors() and kwargs.get('device') is None:
     61             kwargs['device'] = self.device
---> 62         return func(*args, **kwargs)
     63 
     64 # NB: This is directly called from C++ in torch/csrc/Device.cpp

RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

Basically, here you can see that the generator is initialized inside the function right before the seeds are generated:

453f51f9b1ec/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __init__(self, loader)
    575             shared_rng = torch.Generator()
    576             shared_rng.manual_seed(self._shared_seed)
--> 577             self._dataset = torch.utils.data.graph_settings.apply_random_seed(self._dataset, shared_rng)

In my case it looks more like a bug in the backward compatibility between torchdata’s datapipes and the DataLoader.
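
The torch/utils/_device.py frame in the traceback suggests a global CUDA default device is active (e.g. via torch.set_default_device('cuda')). Assuming that is the cause, and on PyTorch >= 2.0, where torch.device also works as a context manager for the default device, a workaround sketch is to scope data loading back to the CPU (reusing the dataloader from the failing code):

import torch

# Inside this block the DataLoader's internal torch.empty(...) calls stay
# on the CPU, matching its internally created CPU torch.Generator().
with torch.device('cpu'):
    for batch in dataloader:
        ...  # move the batch to the GPU here if needed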

Same problem; it seems the generator will always be initialized on the CPU.

Alright.
If you define the loader like this:

dataloader = DataLoader(
    image_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    pin_memory=True,
    shuffle=True,
    generator=torch.Generator(device='cuda:0'),
)

and try to use next(iter(dataloader)), you get this error:

/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py in __iter__(self)
    163         else:
    164             for _ in range(self.num_samples // n):
--> 165                 yield from map(int, torch.randperm(n, generator=generator).numpy())
    166             yield from map(int, torch.randperm(n, generator=generator)[:self.num_samples % n].numpy())
    167 

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

It seems like sampler.py should call .cpu() there before .numpy().
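
For instance, a custom sampler along those lines (a sketch with a made-up class name, not a patch to the stock sampler.py) could draw the permutation with a CUDA generator and copy it back to the host before yielding indices:

import torch
from torch.utils.data import Sampler

class CudaRandomSampler(Sampler[int]):
    # Hypothetical sampler: shuffles on the GPU, then moves the indices
    # to the CPU so they can be converted to Python ints.
    def __init__(self, data_source, generator=None):
        self.data_source = data_source
        self.generator = generator if generator is not None else torch.Generator(device='cuda')

    def __iter__(self):
        n = len(self.data_source)
        perm = torch.randperm(n, generator=self.generator, device='cuda')
        yield from map(int, perm.cpu().numpy())

    def __len__(self):
        return len(self.data_source)

# Usage: pass it via sampler= and leave shuffle unset (the two are mutually exclusive):
# dataloader = DataLoader(image_dataset, batch_size=batch_size, sampler=CudaRandomSampler(image_dataset))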

Otherwise, if we don’t specify a generator, or we use the ‘cpu’ device, it produces the error that is the topic of this discussion.

So, what should I do to solve the problem?


I’m having the same error. Did you manage to find a solution?
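
In case anyone is still stuck: a workaround sketch, assuming a standard map-style dataset and that the root cause is a globally set CUDA default device, is to avoid the global default device entirely, keep the DataLoader (and its generator) on the CPU, and move each batch to the GPU explicitly:

import torch
from torch.utils.data import DataLoader

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# No global default device and no CUDA generator: the sampler shuffles on
# the CPU (so its .numpy() call works) and only the batches go to the GPU.
dataloader = DataLoader(
    image_dataset,  # the dataset variable from the snippet above
    batch_size=batch_size,
    num_workers=num_workers,
    pin_memory=True,
    shuffle=True,
)

for images, labels in dataloader:
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... training step ...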

Thank you very much!!!