I get a strange error when running on our server with PyTorch 1.1.0 and cudatoolkit 9.0:
  File "main.py", line 61, in main
    model = Model(args)
  File "train.py", line 63, in __init__
    norm_layer=None).to(self.device)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 386, in to
    return self._apply(convert)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 199, in _apply
    param.data = fn(param.data)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 384, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: unknown error
I would really like to know the source of this error. Any help is appreciated.
I’m not sure what the error might be pointing to.
Do you necessarily need to use PyTorch 1.1.0 or could you upgrade?
The error might have been fixed in the latest release, in which case debugging it might not be worth the effort.
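If upgrading is not an option, a quick sanity check can at least show whether the failure is specific to your model or whether any CUDA allocation fails. The following is only a minimal sketch: `cuda_report` is a hypothetical helper name (not part of PyTorch), and it relies only on standard `torch.cuda` calls.

```python
def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup without raising."""
    report = {
        "torch": None,             # installed PyTorch version, if any
        "built_with_cuda": None,   # CUDA version the PyTorch binary was built with
        "cuda_available": False,   # whether PyTorch can see a usable GPU
        "device_count": 0,         # number of visible GPUs
        "allocation_error": None,  # error message from a tiny test allocation
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; return the defaults.
        return report

    report["torch"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda
    try:
        report["cuda_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()
        if report["cuda_available"]:
            # Smallest possible allocation: if this fails, the problem is in
            # the driver/runtime setup rather than in the model code.
            torch.zeros(1, device="cuda")
    except RuntimeError as err:
        report["allocation_error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `allocation_error` is set even for this one-element tensor, the model is probably not the culprit and the driver or CUDA runtime setup is the more likely place to look.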
Our server still runs Nvidia driver version 384.130, so CUDA 9 is the highest version I can use. I downgraded from PyTorch 1.1.0 to PyTorch 1.0 with CUDA 9. Thank you.
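For anyone else checking the same constraint, the reasoning "driver 384.130 means CUDA 9 at most" can be sketched as a small lookup. The minimum driver versions below are what I recall from the Linux table in NVIDIA's CUDA release notes, so treat them as assumptions and verify them against the official table; the function names are hypothetical.

```python
# Minimum Linux driver version per CUDA toolkit release.
# Assumption: values recalled from NVIDIA's CUDA release notes; please verify.
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "9.1": (390, 46),
    "9.2": (396, 26),
    "10.0": (410, 48),
    "10.1": (418, 39),
}


def parse_version(text):
    """Turn '384.130' into (384, 130) so components compare as integers."""
    return tuple(int(part) for part in text.split("."))


def supported_cuda_versions(driver_version):
    """Return every listed CUDA version whose minimum driver is satisfied."""
    driver = parse_version(driver_version)
    return [cuda for cuda, minimum in MIN_LINUX_DRIVER.items() if driver >= minimum]


def max_supported_cuda(driver_version):
    """Return the newest listed CUDA version the driver supports, or None."""
    supported = supported_cuda_versions(driver_version)
    if not supported:
        return None
    return max(supported, key=parse_version)


if __name__ == "__main__":
    print(max_supported_cuda("384.130"))
```

The versions are compared as integer tuples on purpose: read as a decimal number, 384.130 would look smaller than 384.81, although it is the newer driver.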
On a different note, what is the safest way to update the Nvidia driver on an Ubuntu 16.04 server with around 20 active users at any given time, without disturbing any running processes? Whenever I try to update the Nvidia driver on my Ubuntu 16.04 workstation, something or other breaks.