Cuda runtime error (999)

Hi ptrblck: If you’re thinking that heat may be why I’m having these problems, it’s not. I’m very slowly stepping through PyTorch’s transformer implementation using PyCharm, not running code, at least on my work computer which has had the problem I’ve described. (I worry about heat on the system because I ran a HLTA analysis that took 3 days and the sensors reported momentary temperatures above 90C, where 100C is a critical temperature for my chips according to lmsensors. The run was not on the GPU and encountered no errors.)

There’s also another reason to think the problem I’ve reported is not due to hardware. My home computer, which is entirely different hardware (different manufacturer, chipset, GPU), is encountering the same problem I reported here. So, two different hardware setups, the same software setup, the same error–which suggests a software issue. I’m going to do some experimenting with suspend because it seems that I run into this problem after waking from suspend.