Cublas runtime error : library not initialized at /data/users/soumith/builder/wheel/pytorch-src/torch/lib/THC/THCGeneral.c:383

net = Net()
net = net.cuda()

input = Variable(torch.randn(1, 1, 32, 32))
input = input.cuda()
output = net(input)

Traceback (most recent call last):
  File "/home/shijinzhu/!work_python/python_pytorch/demo001 pytorch_test/test.py", line 73, in <module>
    output = net(input)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/shijinzhu/!work_python/python_pytorch/demo001 pytorch_test/test.py", line 52, in forward
    x = F.relu(self.fc1(x))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear()(input, self.weight, self.bias)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functions/linear.py", line 10, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: cublas runtime error : library not initialized at /data/users/soumith/builder/wheel/pytorch-src/torch/lib/THC/THCGeneral.c:383

Thank you

I am also getting the same error when I run my code on multiple GPUs, but the error is not consistent: sometimes I get it, sometimes not.

Is there any workaround to get rid of this problem? @smth


I am facing the same error. So is this related to machines that have multiple GPUs?

I think I've found a workaround. When we call .cuda(), we can specify the GPU device onto which we want to load the data or model, to make sure they end up on the same GPU. For example,

net = Net()
net = net.cuda(0)

input = Variable(torch.randn(1, 1, 32, 32))
input = input.cuda(0)
output = net(input)
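
Equivalently, a minimal sketch (assuming the same Net and imports as the snippet above, with device 0 picked as an example): keeping the device index in one variable makes it harder for the model and its inputs to drift onto different GPUs.

device_id = 0
net = Net().cuda(device_id)
input = Variable(torch.randn(1, 1, 32, 32)).cuda(device_id)
output = net(input)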

I had also faced this issue, even on a single GPU.
I noticed that the cuBLAS samples required sudo permission to initialize.
To avoid needing root permission, I removed the cache files in the ~/.nv directory instead.
Hope this solution helps.

I've faced the same problem, but on my server it was caused by there not being enough memory on the GPU devices the program was using. You may point your program at another GPU by using torch.cuda.set_device(id_of_idle_device).
Hope this can help you.
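
A minimal sketch of that workaround (the index 1 is just an example; use whichever GPU is idle on your machine):

import torch

torch.cuda.set_device(1)      # make the idle GPU the default CUDA device
x = torch.randn(4, 4).cuda()  # .cuda() with no argument now allocates there
print(x.get_device())         # prints 1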


I'm also facing the same issue. Although I removed the cache files in the ~/.nv directory, the same error is still raised when running my code.

@Rohith_AP, @ShawnGuo

"sudo rm -r ~/.nv" works on my 4-GPU machine to remove the error below:
RuntimeError: cublas runtime error : library not initialized at /py/conda-bld/pytorch_1493681908901/work/torch/lib/THC/THCGeneral.c:394

FYI, it also fixes the error below:
File "/opt/anaconda/lib/python3.6/site-packages/torch/nn/functional.py", line 40, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR

Thanks!


Hello,
I am having the same issue and the fix above does not work.
I think it is related to this: https://github.com/torch/cutorch/issues/677

When trying with CUDA_VISIBLE_DEVICES=1, not calling set_device throws the runtime error, and then trying torch.cuda.set_device(2) throws an ordinal error; I thought torch was counting from 1.

any help ?

CUDA_VISIBLE_DEVICES is 0-indexed. PyTorch is also 0-indexed.
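
To illustrate the remapping, a minimal sketch (assuming the process is launched with CUDA_VISIBLE_DEVICES=1, so only physical GPU 1 is exposed):

# launched as: CUDA_VISIBLE_DEVICES=1 python your_script.py
import torch

print(torch.cuda.device_count())  # 1: only one GPU is visible to the process
torch.cuda.set_device(0)          # valid: the visible GPU is ordinal 0
# torch.cuda.set_device(2)        # raises "invalid device ordinal"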

Yes, I also found out that we actually need to use torch.cuda.device(x) with x 0-indexed,
and not set_device, which seems not to work properly.
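
For reference, torch.cuda.device is a context manager, so a minimal sketch of using it (device 0 assumed) looks like:

import torch

with torch.cuda.device(0):
    x = torch.randn(2, 2).cuda()  # allocated on device 0 inside the block
print(x.get_device())             # prints 0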

@Rohith_AP, @ShawnGuo

"sudo rm -rf ~/.nv" works for me. This error had troubled me for a long time.
Thanks a lot!

Steven

I'm fine-tuning vgg19_bn on my own dataset, and I faced the same problem too.

Following the instructions above, I removed the ~/.nv directory with the command
sudo rm -rf ~/.nv

However, when I ran the GPU version of the CNN, the error showed up again, and I found that the ~/.nv directory had reappeared. I then changed the batch_size from 64 to 32 and the code ran well, the same solution as @ShawnGuo above. Thanks a lot. By the way, when there is not enough memory for the code, should it raise this error? Can it give a more accurate error message? @smth

Thanks! It worked for me.

Thanks, it works for me.
BTW, can you explain in detail why it works?

When I use PyTorch 0.3, it works, but when I use 0.4 compiled from master, my code throws this error, and removing ~/.nv doesn't work.


@tiantong, could I ask how you fixed this problem? It also happens to me after upgrading to PyTorch v0.4. Thanks!


What's the source of this problem?

Is there not a way to set these indices globally, once, for everything?

I think:

export CUDA_VISIBLE_DEVICES=$i

is what I'm looking for.
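
If you prefer to set it from Python instead of the shell, a minimal sketch (this must run before anything initializes CUDA, i.e. before the first .cuda() call):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # set before CUDA is initialized

import torch
print(torch.cuda.device_count())  # 1: only GPU 0 is visible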