Dataset location (RuntimeError: Caught RuntimeError in DataLoader worker process 0.)

I have a custom dataset as shown below:

class MyDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]


training_latent_dataset = MyDataset(
    torch.cat(training_latent), torch.cat(training_labels)
)

train_latent_loader = torch.utils.data.DataLoader(
    training_latent_dataset, batch_size=batch_size, shuffle=True, **kwargs
)

If the tensors are already on the GPU and I try to iterate over the DataLoader (which uses worker processes), I get this error:

RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/muammar/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/muammar/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/muammar/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/tmp/ipykernel_356729/3661499099.py", line 11, in __getitem__
    return self.X[idx], self.y[idx]
RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

If I move the tensors from the GPU to the CPU when instantiating the MyDataset class, everything works well:

training_latent_dataset = MyDataset(
    torch.cat(training_latent).cpu(), torch.cat(training_labels).cpu()
)

train_latent_loader = torch.utils.data.DataLoader(
    training_latent_dataset, batch_size=batch_size, shuffle=True, **kwargs
)

Is that the expected behavior for the tensors to be in the CPU instead of the GPU? I could not find anything about that in the documentation.

Yes, usually you would load the data on the CPU and push it to the GPU inside the training loop, so that the data loading overlaps with the actual model training.
However, if you want to load CUDA tensors directly, check the CUDA-in-multiprocessing docs: a forked DataLoader worker cannot re-initialize the CUDA context, which is what raises your error.
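A minimal sketch of that pattern (the model, optimizer, and data below are placeholders, and the transfer is a no-op on a CPU-only machine):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder CPU-resident data standing in for the latents and labels.
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 10, (100,)))

# pin_memory=True keeps batches in page-locked host memory, so the
# host-to-device copies below can run asynchronously.
loader = DataLoader(dataset, batch_size=16, shuffle=True,
                    pin_memory=torch.cuda.is_available())

model = torch.nn.Linear(8, 10).to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for X_batch, y_batch in loader:
    # non_blocking=True lets the copy overlap with GPU compute
    # when the source memory is pinned.
    X_batch = X_batch.to(device, non_blocking=True)
    y_batch = y_batch.to(device, non_blocking=True)

    optimizer.zero_grad()
    loss = criterion(model(X_batch), y_batch)
    loss.backward()
    optimizer.step()
```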

Hi @ptrblck, thanks for your fast reply and the link. That makes sense. I was very confused when I faced this.