Strange CUDA error with PyTorch 1.1.0 and CUDA 9

alwynmathew · November 9, 2019, 10:22am

I get a strange error when running on our server with PyTorch 1.1.0 and cudatoolkit 9.0.

  File "main.py", line 61, in main
    model = Model(args)
  File "train.py", line 63, in __init__
    norm_layer=None).to(self.device)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 386, in to
    return self._apply(convert)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 199, in _apply
    param.data = fn(param.data)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 384, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: unknown error

I would really like to know the source of this error. Any help is appreciated.

ptrblck · November 10, 2019, 3:22am

Which GPU are you using?
Does your code work with the latest PyTorch release (1.3.1) or with CUDA 10.1?
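
For anyone answering the same questions on their own machine, here is a minimal sketch that collects the relevant version details. The helper name `collect_env_info` is made up for illustration. It only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`, and it falls back gracefully if PyTorch is not installed.

```python
import platform


def collect_env_info():
    """Gather the Python, PyTorch, CUDA and GPU details usually asked for."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Pasting this output into a post, together with the driver version reported by `nvidia-smi`, usually saves a round of follow-up questions.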

alwynmathew · November 10, 2019, 5:53am

It worked on PyTorch 1.1.0 with CUDA 10 and on PyTorch 1.0.0 with CUDA 9, but not on PyTorch 1.1.0 with CUDA 9. I wonder what exactly the issue is.

GPU: GTX 1080Ti

ptrblck · November 10, 2019, 6:11am

I’m not sure what the error might be pointing to.
Do you necessarily need to use PyTorch 1.1.0 or could you upgrade?
The error might have been fixed in the latest release, so that debugging it might not be worth the effort.

alwynmathew · November 10, 2019, 6:23am

Our server still runs Nvidia driver version 384.130, so I can go up to CUDA 9 at most. I downgraded from PyTorch 1.1.0 to PyTorch 1.0 with CUDA 9. Thank you.
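
The driver constraint mentioned above can be sketched as a small lookup. This is only an illustration: the minimum Linux driver versions in the table are the values I recall from NVIDIA's CUDA release notes, so please verify them against the official compatibility table, and the names `MIN_LINUX_DRIVER` and `newest_supported_cuda` are made up for this example.

```python
# Minimum Linux driver version per CUDA toolkit release.
# Assumption: values recalled from NVIDIA's CUDA release notes; double-check
# them against the official compatibility table before relying on them.
MIN_LINUX_DRIVER = [
    ("9.0", (384, 81)),
    ("9.1", (390, 46)),
    ("9.2", (396, 26)),
    ("10.0", (410, 48)),
    ("10.1", (418, 39)),
]


def parse_version(text):
    """Turn a string such as '384.130' into a comparable tuple (384, 130)."""
    return tuple(int(part) for part in text.strip().split("."))


def newest_supported_cuda(driver_version):
    """Return the newest CUDA release in the table that the driver can run."""
    driver = parse_version(driver_version)
    supported = [cuda for cuda, minimum in MIN_LINUX_DRIVER if driver >= minimum]
    return supported[-1] if supported else None


if __name__ == "__main__":
    print(newest_supported_cuda("384.130"))
```

Under these assumed table values, driver 384.130 resolves to CUDA 9.0, which matches the limit described in this post.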

On a different note, what is the safest way to update the Nvidia driver on an Ubuntu 16.04 server with around 20 active users at any given time, without disturbing any running processes? Whenever I try to update the Nvidia driver on my Ubuntu 16.04 workstation, something or other breaks.