Error while using GPU

My network: (cuda = 9, pytorch = 1.1.0)

input tensor -> GRU (works) -> fully connected layer (this is where PyTorch crashes) --x--> output

Traceback (most recent call last):

  File "main.py", line 42, in <module>
    mod_train.train(network, device, dataloader, optimizer, loss_fn, i)
  File "/root/code/GPU/mod_train.py", line 20, in train
    outputs = model(inputs)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/code/GPU/mod_network.py", line 47, in forward
    input = self.fc_0(input[:, -1, :].cuda())
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 92, in forward
    return F.linear(input, self.weight, self.bias)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1406, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/THCBlas.cu:259

(Is there any doc explaining special requirements for GPU training?)

How can I solve this problem?

Hi,

Could you give more details about the size of the input Tensor for your linear layer?
Also can you make a small code sample that reproduces the issue please?

nn.Linear always crashes:

import torch

m = torch.nn.Linear(2, 2)
m.cuda()                    # move the layer's parameters to the GPU
input = torch.randn(10, 2)
input = input.cuda()        # move the input to the GPU
m(input)                    # crashes here

Hi,

Are you using Google Colab by any chance? If so, this is a known issue, tracked here: https://github.com/pytorch/pytorch/issues/29096
If not, how did you install PyTorch and CUDA on your machine?

Thanks.


In the end I found out that I had made a stupid mistake.
The server had both CUDA 10 and CUDA 9 installed, and I had forgotten to switch to the correct version.
Pointing the /usr/local/cuda symlink back at CUDA 9 fixed it:


ln -snf /usr/local/cuda-9.0/ /usr/local/cuda
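
For anyone hitting the same error, a quick way to spot this kind of mismatch is to compare the CUDA version PyTorch reports with the toolkit the symlink points to. The snippet below is only a rough sketch: the helper names are made up for illustration, and it assumes the toolkit directories follow the /usr/local/cuda-X.Y naming used above. Whether the symlink matters at all may depend on how PyTorch was installed.

```python
import os
import re


def toolkit_version_from_path(path):
    """Extract a version such as '9.0' from a path like /usr/local/cuda-9.0.

    Hypothetical helper; assumes the 'cuda-X.Y' directory naming convention.
    """
    match = re.search(r"cuda-(\d+\.\d+)", path)
    return match.group(1) if match else None


def resolve_cuda_symlink(link="/usr/local/cuda"):
    """Return the real directory that the symlink currently points to."""
    return os.path.realpath(link)


if __name__ == "__main__":
    target = resolve_cuda_symlink()
    print("symlink target:", target)
    print("toolkit version from path:", toolkit_version_from_path(target))
    try:
        import torch
        # torch.version.cuda is the CUDA version PyTorch was built against
        print("PyTorch built with CUDA:", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("PyTorch is not installed in this environment")
```

If the two versions disagree (for example, PyTorch reports 9.0 while the symlink resolves to a cuda-10.x directory), the symlink is a likely suspect.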
