Zombie process after CUDA assert

When using PyTorch on CUDA, the Python process sometimes gets stuck in kernel mode after a CUDA assert (for example, one triggered by a size mismatch). htop shows a full red bar for it, and it can’t be killed with pkill -9 or with any other signal. Running the kill with sudo doesn’t help either; only rebooting the compute node got rid of the process.
This happened with multiple PyTorch versions (1.13 and 2) built against CUDA 11.7 and 11.6, installed from both conda and pip. The OS is CentOS 7 and the GPUs are 1080 Ti. Note that the same code on a different compute cluster, with 2080 Ti and A100 GPUs, does not get stuck: the CUDA assert still fires, but the process exits afterwards.
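In case it helps anyone hitting the same thing, here is a minimal sketch for checking what state the stuck process is in, assuming a Linux node where /proc is readable. The helper names (`parse_state`, `process_state`) are mine and not part of PyTorch or CUDA. A state of `D` means uninterruptible sleep, where signals (including SIGKILL) are not acted on until the kernel call returns; `R` would mean it is running, which together with the red bar in htop would suggest it is spinning inside kernel or driver code.

```python
from pathlib import Path


def parse_state(status_text):
    """Return the one-letter state code from the contents of /proc/<pid>/status.

    Returns None if no "State:" line is present.
    """
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # A typical line looks like "State:\tD (disk sleep)"; the
            # second whitespace-separated field is the state letter.
            return line.split()[1]
    return None


def process_state(pid):
    """Read the current state letter of a live process (Linux only)."""
    return parse_state(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    import os

    # Demonstrated on the current process; replace os.getpid() with the
    # PID of the stuck Python process when diagnosing the hang.
    print(process_state(os.getpid()))
```

Including the reported state letter in a bug report should make it easier to tell whether the process is blocked or busy inside the driver.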
I’m not sure this is the right place to report it, so please let me know if there is a better one.