CUDA runtime error (999)

Is there any way to fix this without the process crashing in the first place? I run some PyTorch scripts and suspend the laptop when I have to move to another place, but after resume the process crashes with the same error and I have to re-run the entire script.

This worked! Thank you so much

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
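
Note that rmmod refuses to unload a module that is still in use, so the reload above may fail while a GPU process is still alive. A minimal sketch of a guarded reload, assuming the usual /proc/modules layout (name, size, use count, ...); `module_use_count` and `reload_uvm` are hypothetical helper names, not part of any NVIDIA tool:

```shell
#!/bin/sh
# Hypothetical helper: print the use count of module $1, reading
# /proc/modules-style lines (name size use_count ...) from stdin.
# Prints "absent" if the module is not listed.
module_use_count() {
  awk -v mod="$1" '$1 == mod { print $3; found = 1 }
                   END { if (!found) print "absent" }'
}

# Hypothetical wrapper: reload nvidia_uvm only when nothing is using it.
reload_uvm() {
  count=$(module_use_count nvidia_uvm < /proc/modules)
  case "$count" in
    absent) sudo modprobe nvidia_uvm ;;
    0)      sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm ;;
    *)      echo "nvidia_uvm is in use (count: $count); stop GPU processes first" >&2
            return 1 ;;
  esac
}
```

Calling `reload_uvm` after resume performs the same two commands, but only when the use count is zero.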

For me, the two steps above alone do not work; I have to do more, as detailed below:


#!/bin/bash
# Run as root: loads the NVIDIA modules and creates their device nodes.

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  # Create one device node per controller (major number 195).
  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255
else
  # modprobe nvidia failed
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

  mknod -m 666 /dev/nvidia-uvm c $D 0
else
  # modprobe nvidia-uvm failed
  exit 1
fi
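
After running the script, one quick way to confirm that the device nodes were created is to test each path for a character device. A minimal sketch; `missing_nvidia_nodes` is a hypothetical helper, and its default node list assumes a single GPU (nvidia0):

```shell
#!/bin/sh
# Hypothetical helper: print every expected node that is NOT a character device.
# $1 is the device directory (defaults to /dev); any further arguments are
# node names to check (defaults assume one GPU).
missing_nvidia_nodes() {
  dir="${1:-/dev}"
  if [ "$#" -gt 0 ]; then
    shift
  fi
  if [ "$#" -eq 0 ]; then
    set -- nvidia0 nvidiactl nvidia-uvm
  fi
  for name in "$@"; do
    if [ ! -c "$dir/$name" ]; then
      echo "$dir/$name"
    fi
  done
}
```

Running `missing_nvidia_nodes` with no arguments prints nothing when all three default nodes exist; any path it prints still needs to be created.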