It works correctly when I run it for the first time. However, when I try to run it a second time I get the following error:
Traceback (most recent call last):
File "crnn_main.py", line 200, in <module>
cost = trainBatch(crnn, criterion, optimizer)
File "crnn_main.py", line 183, in trainBatch
preds = crnn(image)
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/ahmed/crnn/models/crnn.py", line 78, in forward
conv = utils.data_parallel(self.cnn, input, self.ngpu)
File "/home/ahmed/crnn/models/utils.py", line 12, in data_parallel
output = model(input)
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/torch/nn/modules/container.py", line 64, in forward
input = module(input)
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 237, in forward
self.padding, self.dilation, self.groups)
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/torch/nn/functional.py", line 40, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR
Hi!!!
Did you figure out how to solve this? I’m having a similar issue.
The first time (or first couple of times) I run a script it works fine, but afterwards it will throw either the same error as yours (CUDNN_STATUS_INTERNAL_ERROR), a CUDA runtime error (4): unspecified launch failure, or a segmentation fault (core dumped).
I suppose there's something wrong with my CUDA installation, but I've tried reinstalling it several times. I've even tried switching back to kernel 10.3 instead of 13.1 and reinstalling, but so far I cannot get rid of this weird behavior.
I've also tried building PyTorch from source and switching to PyTorch 0.1.12.
I suppose you used different versions than mine, but any pointers would be appreciated. I'm running out of ideas.
Btw, I'm using Nvidia driver 384.69 and CUDA 8.0.61, both installed with the runfiles, on Ubuntu 16.04.3 with kernel 13.1, on a system with a single GTX 1080 and a Ryzen 1600 processor.
How did you know it didn't free the memory? Only using nvidia-smi, or is there something else? Checking with nvidia-smi, the memory seems to be free after the script terminates.
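In case it helps, here's a minimal sketch for checking used GPU memory from a script instead of eyeballing the nvidia-smi table. It assumes nvidia-smi is on PATH and uses its `--query-gpu=memory.used --format=csv` output; the `parse_used_mib` helper name is my own, not from any library:

```python
import subprocess

def parse_used_mib(csv_text):
    # Parse the CSV output of:
    #   nvidia-smi --query-gpu=memory.used --format=csv
    # First line is the header ("memory.used [MiB]"),
    # then one "<n> MiB" line per GPU.
    lines = csv_text.strip().splitlines()[1:]
    return [int(line.split()[0]) for line in lines]

def gpu_memory_used_mib():
    # Query nvidia-smi; returns used memory per GPU in MiB.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv"]
    ).decode("utf-8")
    return parse_used_mib(out)

if __name__ == "__main__":
    # Parsing can be sanity-checked without a GPU:
    sample = "memory.used [MiB]\n123 MiB\n0 MiB\n"
    print(parse_used_mib(sample))
```

Running `gpu_memory_used_mib()` before and after the script terminates would show whether memory is actually released, rather than relying on a one-off glance at the table.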
In my case it was a hardware issue.
My PC had a Ryzen processor affected by a production bug; I haven't experienced these problems since replacing the CPU.