CUDA driver error while running PyTorch code

mhusseinsh · August 30, 2018, 3:57pm

Hello,

I have some code implemented in PyTorch.
Whenever I run it, I get the following error:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=74 error=35 : CUDA driver version is insufficient for CUDA runtime version
Traceback (most recent call last):
  File "main.py", line 510, in <module>
    main(parser.parse_args())
  File "main.py", line 410, in main
    model = torch.nn.DataParallel(model).cuda()
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 191, in _apply
    param.data = fn(param.data)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/aten/src/THC/THCGeneral.cpp:74

My CUDA version is

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

ptrblck · August 30, 2018, 9:48pm

What GPU and NVIDIA drivers do you have?
Apparently the driver is too old.
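
For anyone collecting the information asked for here, the sketch below gathers it in one go: the driver version reported by `nvidia-smi` and the CUDA version the installed PyTorch binary was built with (`torch.version.cuda`), which may differ from what `nvcc` prints. The helper names are made up for illustration, and the sketch assumes Python 3.7+ and that `nvidia-smi` prints a `Driver Version:` field as in the output posted in this thread.

```python
import re
import shutil
import subprocess


def parse_driver_version(smi_text):
    """Extract the 'Driver Version' field from plain nvidia-smi output."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)*)", smi_text)
    return match.group(1) if match else None


def installed_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_driver_version(result.stdout)


def pytorch_cuda_runtime():
    """Return the CUDA version PyTorch was built with, or None."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    return torch.version.cuda  # None for CPU-only builds


if __name__ == "__main__":
    print("NVIDIA driver:", installed_driver_version())
    print("CUDA runtime used by PyTorch:", pytorch_cuda_runtime())
```

Posting both values together usually makes it easier to tell whether the driver or the PyTorch build is the odd one out.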

mhusseinsh · August 31, 2018, 5:50am


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.88                 Driver Version: 375.88                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 0000:08:00.0     Off |                    0 |
| N/A   50C    P0   106W / 250W |  15645MiB / 16276MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 0000:0B:00.0     Off |                    0 |
| N/A   51C    P0   127W / 250W |  15645MiB / 16276MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  On   | 0000:0E:00.0     Off |                    0 |
| N/A   70C    P0   159W / 250W |  15649MiB / 16276MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  On   | 0000:11:00.0     Off |                    0 |
| N/A   55C    P0    61W / 250W |  15599MiB / 16276MiB |     74%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla P100-PCIE...  On   | 0000:16:00.0     Off |                    0 |
| N/A   27C    P0    36W / 250W |  15599MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla P100-PCIE...  On   | 0000:19:00.0     Off |                    0 |
| N/A   54C    P0   124W / 250W |  15645MiB / 16276MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla P100-PCIE...  On   | 0000:1C:00.0     Off |                    0 |
| N/A   33C    P0    31W / 250W |  15543MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla P100-PCIE...  On   | 0000:22:00.0     Off |                    0 |
| N/A   57C    P0    42W / 250W |  15599MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

ptrblck · August 31, 2018, 9:22am

You could try driver 384.66 for CUDA 8.0, or a newer one for CUDA 9.x.
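
To make the "is my driver new enough?" check concrete, here is a rough sketch that compares a driver version against a small lookup table. The minimum versions in the table are an assumption: they are Linux figures as I recall them from NVIDIA's CUDA Toolkit release notes, not values taken from this thread, so please confirm them against the official table before relying on them. The function names are hypothetical.

```python
# ASSUMPTION: minimum Linux driver versions per CUDA toolkit release, recalled
# from NVIDIA's CUDA Toolkit release notes. Verify against the official table.
MIN_LINUX_DRIVER = {
    "8.0": "375.26",
    "9.0": "384.81",
    "9.2": "396.26",
    "10.0": "410.48",
}


def as_tuple(version):
    """Turn '375.88' into (375, 88) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_is_sufficient(driver_version, cuda_runtime):
    """Return True/False per the table, or None if the runtime is not listed."""
    minimum = MIN_LINUX_DRIVER.get(cuda_runtime)
    if minimum is None:
        return None
    return as_tuple(driver_version) >= as_tuple(minimum)


if __name__ == "__main__":
    for runtime in ("8.0", "9.0"):
        print("375.88 vs CUDA", runtime, "->", driver_is_sufficient("375.88", runtime))
```

The runtime to check is the one the PyTorch binary was built with (`torch.version.cuda`), which is not necessarily the toolkit version that `nvcc` reports.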

thierry007 · October 31, 2018, 12:32pm

Hello, I have exactly the same issue when using pytorch 0.4.1 with:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

NVIDIA-SMI 375.26 Driver Version: 375.26

I read the last remark about the driver being too old.
Does anyone on this forum have experience upgrading the NVIDIA driver and then using CUDA 8 with PyTorch 0.4.1?

Thanks!

Xiaoyu_Song · November 5, 2018, 8:25am

Hello,
I have the same problem. I’m using
pytorch 0.4.1
Python version 3.6.7

NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_Central_Daylight_Time_2018
Cuda compilation tools, release 9.2, V9.2.148

Does anyone know how to solve the problem?
Thanks

ki2rin · November 25, 2018, 2:25pm

Same issue here with macOS 10.13.6, CUDA 9.2 and Python 3.6.

RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /path/to/pytorch/aten/src/THC/THCGeneral.cpp:74

albanD · November 27, 2018, 9:06am

Hi,

You will need to upgrade your nvidia driver. Yours is too old for the cuda version you have.

ki2rin · November 27, 2018, 2:57pm

Are you 100% sure? I’ve just updated my NVIDIA driver to 410.130 through the System Preferences pane. The version I used previously was 396.148. Both versions give me the same error message.

albanD · November 28, 2018, 10:10am

Can you run CUDA samples properly?
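
If the CUDA samples are not installed, a similar smoke test can be run from Python. The sketch below is only a rough stand-in for the `vectorAdd` sample, not a replacement for it: it adds two vectors on the GPU through PyTorch and compares the result with the CPU. The function name and the default vector length are my own choices.

```python
def cuda_smoke_test(n=50000):
    """Add two vectors on the GPU and compare with the CPU; return (ok, message)."""
    try:
        import torch
    except ImportError:
        return False, "PyTorch is not installed"
    if not torch.cuda.is_available():
        return False, "torch.cuda.is_available() returned False"
    try:
        a = torch.rand(n, device="cuda")
        b = torch.rand(n, device="cuda")
        ok = bool(torch.allclose((a + b).cpu(), a.cpu() + b.cpu()))
    except RuntimeError as exc:
        # Driver/runtime mismatches typically surface here as a RuntimeError.
        return False, "CUDA call failed: {}".format(exc)
    if ok:
        return True, "vector add on the GPU matched the CPU result"
    return False, "GPU result differed from the CPU result"


if __name__ == "__main__":
    print(cuda_smoke_test())
```

A failure here only shows that PyTorch cannot use the GPU. Running the native CUDA samples is still the better way to separate a PyTorch problem from a driver or toolkit problem.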

ki2rin · December 2, 2018, 6:46am

Nope… Thanks for pointing me to a simple diagnostic.
When I run the following

~ $ cd ~/NVIDIA_CUDA-9.2_Samples/0_Simple/vectorAdd
~/NVIDIA_CUDA-9.2_Samples/0_Simple/vectorAdd $ make
~/NVIDIA_CUDA-9.2_Samples/0_Simple/vectorAdd $ ./vectorAdd

I get

[Vector addition of 50000 elements]
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!

What could I do then?
Thanks in advance.

EDIT
I just realized there is another source for the driver, the Web Driver. (link is here)
The Web Driver installer adds a pane to System Preferences (named ‘NVIDIA Driver Manager’).
But the version of the Web Driver I installed is 387.10.10.10.40.105, which is lower than the one I had been using.
As expected, I again failed to run the CUDA samples properly.

In the meantime, I also tried CUDA 9.1, but with no luck.
It seems that CUDA 9.0 and earlier are not supported on macOS 10.13.
I now feel that setting up a development environment for CUDA-related things on macOS is quite tough.

I hope somebody can help me with this issue.

ki2rin · December 2, 2018, 11:11am

The issue has been solved.

Actually, the Web Driver has updated my GPU Driver Version.

Looking back, I think the error message was very confusing.

error code CUDA driver version is insufficient for CUDA runtime version

The “CUDA driver version” in the error message seems to refer to the GPU Driver Version shown in the CUDA Preferences pane (in the image above).

Still, the term “CUDA runtime version” is quite confusing.

Thank you @albanD.
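
On the confusing wording: as far as I understand it, the “CUDA driver version” is the highest CUDA version the installed GPU driver supports, while the “CUDA runtime version” is the version of the CUDA runtime library the program (here, the PyTorch binary or the compiled sample) was built against. Error 35 is raised when the first is lower than the second. The sketch below asks the driver library directly via `cuDriverGetVersion`. The library names and the macOS path are assumptions about a typical install, and the helper names are hypothetical.

```python
import ctypes
import sys


def decode_cuda_version(value):
    """CUDA encodes versions as 1000 * major + 10 * minor, e.g. 9020 -> (9, 2)."""
    return value // 1000, (value % 1000) // 10


def driver_supported_cuda():
    """Return (major, minor) of the CUDA version the driver supports, or None."""
    if sys.platform == "darwin":
        # ASSUMPTION: typical macOS locations of the CUDA driver library.
        names = ["libcuda.dylib", "/usr/local/cuda/lib/libcuda.dylib"]
    else:
        # ASSUMPTION: typical Linux names of the CUDA driver library.
        names = ["libcuda.so.1", "libcuda.so"]
    for name in names:
        try:
            lib = ctypes.CDLL(name)
            version = ctypes.c_int(0)
            status = lib.cuDriverGetVersion(ctypes.byref(version))
        except (OSError, AttributeError):
            continue  # library missing or symbol not exported
        if status == 0:  # 0 is CUDA_SUCCESS
            return decode_cuda_version(version.value)
    return None


if __name__ == "__main__":
    print("CUDA version supported by the driver:", driver_supported_cuda())
```

If the value printed here is lower than the runtime the program was built with (for PyTorch, `torch.version.cuda`), the error in this thread is the expected outcome.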