RuntimeError: CUDA error: device-side assert triggered

I am trying to run my project on Google Colab’s L4 GPU. However, I get the error:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

It’s weird because every time I run the project, the error pops up in a different place.

I also checked the thread “RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect”. I also use nn.CrossEntropyLoss(), but in my case the error occurs before that line.

I’m not sure what the problem is. Could someone help me?

Hi! Do you use pin_memory and/or non_blocking data transfers?

Yeah, I used pin_memory.

I actually figured it out: there was a place in my code with an index-out-of-bounds error. I fixed it and the error no longer appears.
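For anyone hitting the same thing, a check along these lines can flag bad indices before they reach the GPU. This is only a minimal sketch, not the code from my project: find_out_of_range, labels and size are hypothetical names, and it works on plain Python integers (for a tensor you could pass something like targets.tolist()).

```python
def find_out_of_range(indices, size):
    """Return (position, value) pairs for indices outside [0, size)."""
    return [(pos, idx) for pos, idx in enumerate(indices) if not 0 <= idx < size]


# Hypothetical example: 3 classes, so the valid labels are 0, 1 and 2.
labels = [0, 2, 3, 1, -1]
bad = find_out_of_range(labels, size=3)
print(bad)  # [(2, 3), (4, -1)]
```

Note that this treats negative values as invalid, which fits class targets and embedding lookups but not Python-style negative indexing. nn.CrossEntropyLoss also skips targets equal to its ignore_index (-100 by default), so those would need to be excluded from the check.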

Great! Thanks for the follow-up. Device-side assert issues are usually caused by invalid indexing.
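Since the error showed up in different places on each run, the hint in the error message is worth following for future readers: with CUDA_LAUNCH_BLOCKING=1, kernel launches run synchronously, so the stack trace should point at the call that actually failed. A minimal sketch, assuming the variable is set before CUDA is initialized (in Colab that most likely means the first cell, after a runtime restart):

```python
import os

# Set this before the first CUDA call; safest is before importing torch.
# It makes kernel launches synchronous, so the Python stack trace should
# line up with the operation that triggered the device-side assert.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])  # 1
```

Running the failing step on the CPU is another option, since an out-of-range index there typically raises an ordinary Python exception that names the offending index. Synchronous launches slow training down, so the variable should be removed once the bug is found.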