Hi,
I ran into the error CUDA error: unspecified launch failure when running the model a second time in a Python script. The model lives in the function RunmyModel(), and I need to call that function twice in the script. However, when the program calls the function the second time, it raises the CUDA error. Specifically, the code stopped at:
File ~/integration/testmodel.py:160, in testNet.to(self)
    158 for name, param in self.dict.items():
    159     if name not in ['k']:
--> 160         self.dict[name] = param.cuda()
RuntimeError: CUDA error: unspecified launch failure
The error seems to occur in .to(), when the model is transferred to the GPU. Also, I noticed that the GPU memory was not freed after the first run.
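One caveat about the traceback: CUDA errors are reported asynchronously, so the line shown (the .cuda() call) may not be the operation that actually failed. A minimal sketch for narrowing this down, assuming the script can set an environment variable before CUDA is initialised; run_twice is a hypothetical helper, not part of my script, and RunmyModel is the function described above:

```python
import os

# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an error is
# raised at the call that caused it. It has to be set before CUDA is
# initialised; setting it before importing torch is the safe option.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def run_twice(run_fn):
    """Call run_fn twice and report which call raised (hypothetical helper)."""
    for attempt in (1, 2):
        try:
            run_fn()
        except RuntimeError as exc:
            print(f"call {attempt} failed: {exc}")
            raise


# Usage in the real script (assumes torch is installed and RunmyModel exists):
# import torch
# run_twice(RunmyModel)
```

With synchronous launches the script runs slower, so this is only meant for debugging.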
I tried torch.cuda.empty_cache(), but it didn't help. However, when I ran the same script on another machine with the same versions of torch and CUDA, the error did not occur, even though the GPU memory was still not freed after the first run. How can I solve this problem?
PS. I checked dmesg; here is the output:
[ 5.474440] [drm] [nvidia-drm] [GPU ID 0x00000b00] Loading driver
[4201635.872634] NVRM: GPU at PCI:0000:0b:00: GPU-cab56c2d-811d-98fb-d3de-52e2bb36782d
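To make the kernel log easier to scan, here is a small sketch that pulls the NVIDIA-related lines out of dmesg text together with their timestamps. The helper name nvidia_events and the filter rule (any message containing "NVRM" or "nvidia") are my own assumptions, and the sample is just the two lines above:

```python
import re

# Matches the "[  seconds.micros] message" layout of a dmesg line.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def nvidia_events(dmesg_text):
    """Return (timestamp_seconds, message) pairs for NVIDIA-related lines."""
    events = []
    for line in dmesg_text.splitlines():
        match = DMESG_LINE.match(line.strip())
        if not match:
            continue  # skip lines without a dmesg timestamp
        timestamp = float(match.group(1))
        message = match.group(2)
        if "NVRM" in message or "nvidia" in message.lower():
            events.append((timestamp, message))
    return events


sample = """[ 5.474440] [drm] [nvidia-drm] [GPU ID 0x00000b00] Loading driver
[4201635.872634] NVRM: GPU at PCI:0000:0b:00: GPU-cab56c2d-811d-98fb-d3de-52e2bb36782d"""

for ts, msg in nvidia_events(sample):
    print(f"{ts:>16.6f}s  {msg}")
```

The second entry was logged long after the driver was loaded, so any further NVRM lines around that timestamp in the full log are probably the ones worth looking at.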