RuntimeError: enforce fail at context_gpu

Vimos · June 22, 2019, 4:29am

When I use a DataLoader over a VisionDataset together with Detectron pretrained models, the order of operations matters: if I load the model first, the DataLoader fails with the error below; otherwise it runs fine.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-0923c73e3ef2> in <module>
      1 
----> 2 for batch in train_loader:
      3     display(to_img(batch[0][0]))
      4     im_name = batch[2][0]
      5     premise, hypothesis, annotator_labels, gold_label = batch[1][0]

~/anaconda3/envs/rcqa/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    580                 self.reorder_dict[idx] = batch
    581                 continue
--> 582             return self._process_next_batch(batch)
    583 
    584     next = __next__  # Python 2 compatibility

~/anaconda3/envs/rcqa/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    606                 raise Exception("KeyError:" + batch.exc_msg)
    607             else:
--> 608                 raise batch.exc_type(batch.exc_msg)
    609         return batch
    610 

RuntimeError: Traceback (most recent call last):
  File "/home/vimos/anaconda3/envs/rcqa/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/vimos/anaconda3/envs/rcqa/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "<ipython-input-2-49e9c213df04>", line 88, in __getitem__
    img = self.transform(img)
  File "/home/vimos/anaconda3/envs/rcqa/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
    img = t(img)
  File "/home/vimos/anaconda3/envs/rcqa/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 92, in __call__
    return F.to_tensor(pic)
  File "/home/vimos/anaconda3/envs/rcqa/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 79, in to_tensor
    img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
RuntimeError: [enforce fail at context_gpu.cu:322] error == cudaSuccess. 3 vs 0. Error at: /opt/conda/conda-bld/pytorch_1556653114079/work/caffe2/core/context_gpu.cu:322: initialization error

Any thoughts?

ptrblck · June 22, 2019, 6:48pm

Are you preloading the data (and target) tensors onto the GPU before passing them to the Dataset?
This might cause multiple initializations of the CUDA context, which might yield this error.
You could try setting num_workers=0 in your DataLoader, using CPU tensors in the Dataset, or setting the multiprocessing start method to spawn.
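
For reference, here is a minimal sketch of the first two options combined (CPU tensors in the Dataset, num_workers=0 in the DataLoader). ToyDataset and make_loader are hypothetical names standing in for the dataset from the question, and the data is made up; treat it as an illustration under those assumptions rather than a drop-in fix.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for the image dataset in the question."""

    def __init__(self, n=8):
        # Keep samples on the CPU; nothing in the dataset touches CUDA.
        self.data = torch.arange(n, dtype=torch.float32).unsqueeze(1)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


def make_loader(dataset, batch_size=4):
    # num_workers=0 loads batches in the main process, so no worker
    # process is started after the model has initialized CUDA.
    return DataLoader(dataset, batch_size=batch_size, num_workers=0)


loader = make_loader(ToyDataset())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batches = []
for batch in loader:
    # Move each batch to the target device here, in the main process.
    batches.append(batch.to(device))
```

If you want to keep worker processes, the other route is to call torch.multiprocessing.set_start_method("spawn") once at program start, before creating any DataLoader; the sketch above does not exercise that path.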

Vimos · June 24, 2019, 9:40am

Thanks, setting num_workers=0 works well.