Hi guys,
I'm having trouble with the CUDA version used by PyTorch. I recently moved my working environment from a Titan Xp to an RTX 2080 Ti, and the same code now fails with this error:
File "train.py", line 183, in train
outputs = model(inputs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
raise output
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
output = module(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/data/lane-detection/lib/network_zoo/DANet.py", line 112, in forward
x = self.head(c4)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/data/lane-detection/lib/network_zoo/DANet.py", line 153, in forward
sa_feat = self.sa(feat1)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/data/lane-detection/lib/network_zoo/DANet.py", line 46, in forward
energy = torch.bmm(proj_query, proj_key)
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:450
^CException ignored in: <module 'threading' from '/usr/lib/python3.5/threading.py'>
I'm using 8 GPUs with the DataParallel module. After looking into this problem for a while, I figured out that the RTX 2080 Ti requires CUDA 10.
However, I did install CUDA 10, and I verified this with nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
But in an IPython kernel, torch.version.cuda shows:
In [2]: torch.version.cuda
Out[2]: '9.0.176'
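As far as I understand, nvcc reports the toolkit installed on the system, while torch.version.cuda reports the CUDA version the installed PyTorch binary was built with, so the two can differ. To make the mismatch easier to see, here is a rough sketch of a check. The helper names (parse_cuda_version, build_supports, report) and the small capability table are my own for illustration, not part of PyTorch, and the table only covers the cards mentioned here plus Volta.

```python
# Minimum CUDA release needed to target a given compute capability.
# Hand-written table for illustration; extend it for other GPUs.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 1): (8, 0),   # Pascal, e.g. Titan Xp
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
}


def parse_cuda_version(text):
    """Turn a string such as '9.0.176' into a (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def build_supports(cuda_version, capability):
    """Return True if a build against cuda_version can target capability."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        raise ValueError("capability not in table: %r" % (capability,))
    return parse_cuda_version(cuda_version) >= required


def report():
    """Print, for each visible GPU, whether this PyTorch build can target it."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return
    built = torch.version.cuda  # None for CPU-only builds
    print("torch", torch.__version__, "built with CUDA", built)
    if built is None or not torch.cuda.is_available():
        return
    for i in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        if cap in MIN_CUDA_FOR_CAPABILITY:
            status = "ok" if build_supports(built, cap) else "build too old"
        else:
            status = "not in table"
        print(i, name, cap, status)


if __name__ == "__main__":
    report()
```

With the values from my setup, build_supports("9.0.176", (7, 5)) is False, while build_supports("10.0.130", (7, 5)) is True, which matches the idea that the PyTorch binary, not the system toolkit, is what is out of date.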
How can I fix this?
Thanks in advance.