RuntimeError: cuda runtime error (3) : initialization error at /opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THC/THCCachingAllocator.cpp:507

When I use a DataLoader, I get the error shown in the title.
My code is as follows:

    def __getitem__(self, index):
        record = self.video_list[index]
        if self.is_train:
            data = self.feature_list[index]
            data = self.feature_list[]
        print('sample the video size is {}'.format(data.size()))
        # Here is the error
        video_feature = data.mean(dim=0, keepdim=True).view(-1)
        return video_feature

self.feature_list is a list of tensors that holds my training data.
Here is the error output:

sample the video size is torch.Size([10, 2048])
    train(model, train_loader, criterion, optimizer, ep)
  File "", line 110, in train
    for i, (input, target) in enumerate(train_loader):
  File "/media/data/kmy/envs/lib/python3.6/site-packages/torch/utils/data/", line 336, in __next__
    return self._process_next_batch(batch)
  File "/media/data/kmy/envs/lib/python3.6/site-packages/torch/utils/data/", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/media/data/kmy/envs/lib/python3.6/site-packages/torch/utils/data/", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/media/data/kmy/envs/lib/python3.6/site-packages/torch/utils/data/", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/media/data/kmy/MBA/", line 54, in __getitem__
    video_feature = data.mean(dim=0, keepdim=True).view(-1)
RuntimeError: CUDA error (3): initialization error

Does anyone have any idea? Thanks a lot!

Is your Dataset working without the DataLoader?
Try to call dataset[0] and see if you’ll get the same error.
Also, is data stored on the GPU?
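
A rough sketch of how both checks could look. The `feature_list` below is only a hypothetical stand-in for your `self.feature_list`, so swap in your real dataset object:

```python
import torch

# Hypothetical stand-in for self.feature_list from the post above.
feature_list = [torch.randn(10, 2048) for _ in range(3)]

# Index directly, without a DataLoader, to see whether the error
# also appears in the main process.
sample = feature_list[0]
print(sample.size(), sample.device)

# True for any tensor that already lives on the GPU.
on_gpu = [t.is_cuda for t in feature_list]
print(on_gpu)
```

If any entry is `True`, the worker processes of the DataLoader would have to touch CUDA tensors, which could explain the initialization error.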

Could you check whether CUDA works at all:

    import torch
    x = torch.randn(10, 10, device='cuda')

Haha, I see you again. Thank you very much!
I found the reason later. You are right, I had stored the data on the GPU.

I’m glad it’s working.
However, I would like to make sure the issue is understood properly.
Usually you would lazily load the data onto the CPU in your Dataset using a DataLoader.
Then in the training loop you would push the data and target to the GPU.
Using this approach, you will use a minimal amount of your (limited) GPU memory to store the data.

Otherwise, if you load all data onto the GPU, you are wasting GPU memory, which might limit your model size etc.
This approach might still be alright, if your model is quite small, but I would generally not recommend it.
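
A minimal sketch of the CPU-first approach described above. `FeatureDataset`, the random features, and the labels are made up for illustration and are not the code from this thread:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class FeatureDataset(Dataset):
    """Hypothetical dataset that keeps all features on the CPU."""

    def __init__(self, num_videos=8, num_frames=10, feat_dim=2048):
        # CPU tensors only; nothing here touches CUDA.
        self.feature_list = [torch.randn(num_frames, feat_dim)
                             for _ in range(num_videos)]
        self.labels = torch.randint(0, 4, (num_videos,))

    def __len__(self):
        return len(self.feature_list)

    def __getitem__(self, index):
        data = self.feature_list[index]
        # Average over frames: [num_frames, feat_dim] -> [feat_dim]
        return data.mean(dim=0), self.labels[index]


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# num_workers=0 keeps the sketch runnable anywhere; raise it for real training.
loader = DataLoader(FeatureDataset(), batch_size=4, num_workers=0)

shapes = []
for features, target in loader:
    # Push each batch to the GPU inside the training loop.
    features = features.to(device, non_blocking=True)
    target = target.to(device, non_blocking=True)
    shapes.append(tuple(features.shape))

print(shapes)
```

Because `__getitem__` only handles CPU tensors, the worker processes never need to initialize CUDA, and only one batch at a time occupies GPU memory.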


I noticed the problem and have modified my code.
Now my code is as below:

    model = nn.DataParallel(model).cuda()
    for i, (input, target, num_frames) in enumerate(train_loader):
        # `async` is a reserved keyword since Python 3.7; PyTorch 0.4+ uses `non_blocking`
        target = target.cuda(non_blocking=True)
        # do some processing on the input
        output = model(input.cuda(), seg_num_list)

Is that right? Actually, I noticed that passing the input to the model without calling cuda() also works when using DataParallel. I guess DataParallel moves the input to the GPUs automatically?

The code looks good.
Have a look at @Thomas_Wolf’s blog post about DataParallel to get some more information about what’s actually going on under the hood.
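
As a small illustration of the behavior you observed: as far as I understand, nn.DataParallel splits the input along the batch dimension and copies each chunk to one of its devices, so a CPU batch is accepted. The linear layer and the shapes below are placeholders, not your model:

```python
import torch
import torch.nn as nn

# Placeholder model; the real model in this thread is different.
model = nn.Linear(2048, 4)
if torch.cuda.is_available():
    # The wrapped module itself has to live on the first GPU.
    model = model.cuda()
model = nn.DataParallel(model)

# The batch is created on the CPU and passed in without .cuda().
batch = torch.randn(8, 2048)
output = model(batch)
print(output.shape, output.device)
```

On a machine without a GPU, the wrapper simply calls the underlying module, so the sketch still runs there.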

OK. Thank you very much ! :)

Hi @ptrblck_de,
I am new to Python, and to PyTorch specifically…
I have the same error, but it seems that CUDA doesn’t work at all, because I get the same issue when running the commands above. Do you have any idea how to solve this?