RuntimeError: CUDA error: initialization error on DataLoader

For those having the following error:

RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

when looping through the dataset (or when `data = self._next_data()` is called internally), e.g.:

for i, (x, y) in enumerate(data_loader):
     ...

it might be the case that some of the tensors inside the dataset are already on the GPU, which causes this error.

As a way to debug this, you can print the device of each tensor inside your dataset class:

print(x.get_device())
print(y.get_device())

to see which device your tensors are on (`get_device()` returns -1 for CPU tensors and the GPU index otherwise).
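As a sketch of where that debug print would go, here is a minimal hypothetical dataset (`MyDataset` and its fields are illustrative, not from the original post) that prints the device of each sample as it is fetched:

```python
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    """Hypothetical minimal dataset used to illustrate the debug print."""

    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        x, y = self.xs[idx], self.ys[idx]
        # get_device() returns -1 for CPU tensors, the GPU index otherwise.
        # If anything other than -1 is printed here, a GPU tensor has
        # leaked into the dataset.
        print(x.get_device(), y.get_device())
        return x, y
```

If the print shows a GPU index inside the dataset, that is the tensor to move back to the CPU before the DataLoader workers touch it.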

One thing to keep in mind is that when you save a tensor with torch.save(), its device is saved as well, so when the tensor is loaded again, it is placed directly on the device it was saved from.

So in the end you should have something like this:

# inside the loop, batches should arrive on the CPU
for i, (x, y) in enumerate(data_loader):
    x = x.to(device)  # where device is cuda
    y = y.to(device)  # where device is cuda
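Putting it all together, a self-contained runnable sketch (TensorDataset and the fallback to CPU when no GPU is available are illustrative choices, not from the original post):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# All dataset tensors live on the CPU; only the training loop moves
# batches to the GPU, assuming one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

xs = torch.randn(8, 3)             # CPU tensors
ys = torch.randint(0, 2, (8,))
data_loader = DataLoader(TensorDataset(xs, ys), batch_size=4)

for i, (x, y) in enumerate(data_loader):
    x = x.to(device)               # move the batch, not the dataset
    y = y.to(device)
    # ... forward/backward pass ...
```

Keeping the dataset on the CPU also matters when `num_workers > 0`, since DataLoader worker processes cannot reuse the parent's CUDA context.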