Hey all,

I’m running a couple of models on a multi-GPU system. When I attempt to save from a GPU other than device 0 while another model is running on device 0, I get the error below; saving works fine for all models running on GPU 0. I’ve read the documentation on serialization semantics and I appear to be following the recommended practices, and the default pickle settings also look fine for this use case. Does anyone have any insight into this problem?


  File "/home/adamvest/", line 156, in save_model, "%s/weights.pth" % self.args.out_folder)
  File "/home/adamvest/lib/python/torch/", line 120, in save
    return _save(obj, f, pickle_module, pickle_protocol)
  File "/home/adamvest/lib/python/torch/", line 192, in _save
RuntimeError: cuda runtime error (46) : all CUDA-capable devices are busy or unavailable at /b/wheel/pytorch-src/torch/csrc/generic/serialization.cpp:38

Have you tried switching the current device with:

with torch.cuda.device(1):
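For example, a minimal sketch of saving inside the context manager (the `model` and output filename here are hypothetical, and the snippet falls back to a plain save on machines without two GPUs):

```python
import torch
import torch.nn as nn

# Hypothetical model standing in for the original; any nn.Module works.
model = nn.Linear(4, 2)

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = model.cuda(1)
    # Make device 1 the current device for the duration of the save,
    # so CUDA calls issued during serialization target the right GPU.
    with torch.cuda.device(1):
        torch.save(model.state_dict(), "weights.pth")
else:
    # CPU fallback so the sketch also runs on single-GPU or CPU-only boxes.
    torch.save(model.state_dict(), "weights.pth")

# Reload onto the CPU to confirm the file round-trips.
state = torch.load("weights.pth", map_location="cpu")
```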

Yes, I was able to work around the issue using this, or by moving the model to the CPU before saving. Still not sure of the root cause, though.
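The CPU-before-save workaround can be sketched like this (again with a hypothetical model; a state dict saved from host memory is device-agnostic and loads cleanly anywhere):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical model for illustration
if torch.cuda.is_available():
    model = model.cuda()

# Move parameters to host memory before serializing, so no CUDA
# context is touched during the save itself.
torch.save(model.cpu().state_dict(), "cpu_weights.pth")

state = torch.load("cpu_weights.pth")
# All saved tensors now live on the CPU.
assert all(not v.is_cuda for v in state.values())
```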