Does CUDA_LAUNCH_BLOCKING=1 reduce training speed?

Hi, I was trying to debug the error

"RuntimeError: cuda runtime error (710) : device-side assert triggered"

by setting CUDA_LAUNCH_BLOCKING=1.

Setting this environment variable to 1 seems to slow down training for the whole program.

Should I set CUDA_LAUNCH_BLOCKING=1 only while debugging and leave it unset for normal training?
Does it really reduce the speed?

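Yes, CUDA_LAUNCH_BLOCKING=1 is a debug environment variable that makes kernel launches synchronous, so the stack trace points to the operation that actually triggered the assert. Because the host then waits for each kernel to finish before launching the next one, it also slows execution down. You should use it only during debugging, not in production.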
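One way to keep the variable out of normal training runs is to set it only for a dedicated debug launch. The sketch below is just an illustration of that idea using the Python standard library; `debug_env`, `run_debug`, and the script name `train.py` are hypothetical names, not part of PyTorch or CUDA.

```python
import os
import subprocess
import sys


def debug_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 added.

    The current process environment is left untouched, so ordinary
    training runs started elsewhere keep asynchronous kernel launches.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_debug(script_args):
    """Run a Python script in a child process with blocking launches enabled.

    `script_args` is a list such as ["train.py", "--epochs", "1"]
    (hypothetical script and flag, for illustration only).
    """
    return subprocess.run([sys.executable, *script_args], env=debug_env())
```

From a shell the equivalent is to prefix a single command, e.g. `CUDA_LAUNCH_BLOCKING=1 python train.py`, rather than exporting the variable for the whole session. The variable should be in place before the process initializes CUDA, which is why the sketch sets it for a freshly started child process.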

What does CUDA_LAUNCH_BLOCKING=1 do exactly? Can someone please elaborate?

Kernel launches become blocking, i.e. the host waits for each kernel to finish instead of launching it asynchronously, as already explained. From CUDA’s programming guide:

Programmers can globally disable asynchronicity of kernel launches for all CUDA applications running on a system by setting the CUDA_LAUNCH_BLOCKING environment variable to 1. This feature is provided for debugging purposes only and should not be used as a way to make production software run reliably.
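To see the slowdown for yourself, you can time a training step with and without the variable set. Because launches are normally asynchronous, the clock should only be stopped after the device has finished its queued work. The helper below is a minimal sketch of that measurement; `time_step` and its parameters are hypothetical names, and it takes the synchronization function as an argument so it does not depend on any particular library.

```python
import time


def time_step(step, synchronize=None, repeats=10):
    """Return the average wall-clock seconds per call of `step`.

    `synchronize` is an optional zero-argument callable that waits for
    outstanding device work to finish.
    """
    if synchronize is not None:
        synchronize()  # drain earlier work so it is not billed to this run
    start = time.perf_counter()
    for _ in range(repeats):
        step()
    if synchronize is not None:
        synchronize()  # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / repeats
```

With PyTorch you could call it as `time_step(train_step, torch.cuda.synchronize)`, where `train_step` stands for your own function that runs one iteration, once in a normal run and once in a run started with CUDA_LAUNCH_BLOCKING=1. How large the difference is depends on the model and on how many small kernels it launches.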