Does CUDA_LAUNCH_BLOCKING=1 reduce training speed?

Yangmin · September 7, 2022, 4:49am

Hi, I was trying to debug the error

"RuntimeError: cuda runtime error (710) : device-side assert triggered"

by setting CUDA_LAUNCH_BLOCKING=1.

It seems that setting this environment variable to 1 slows down training for the whole script.

Should I set CUDA_LAUNCH_BLOCKING=1 only while debugging and not for regular training?
Does it reduce the speed?

ptrblck · September 7, 2022, 4:53am

Yes, CUDA_LAUNCH_BLOCKING=1 is a debug env variable that makes kernel launches blocking, so that the stack trace points to the failing operation once an assert is triggered. You should not use it in production, but only during debugging.
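
One way to keep the variable confined to debugging runs is to set it only in the environment of the process being debugged. Below is a minimal sketch of that idea using only the Python standard library. The helper names blocking_env and run_debug are hypothetical, and the child command is a stand-in for a real training script (e.g. [sys.executable, "train.py"]). Passing the variable through the launching environment should also ensure it is in place before CUDA is initialized in the child.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 added.

    The calling process's own environment is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"  # the name is case-sensitive
    return env


def run_debug(cmd):
    """Run cmd in a child process with blocking kernel launches enabled."""
    return subprocess.run(cmd, env=blocking_env(), capture_output=True, text=True)


# Stand-in for a training script: the child only prints the value it sees.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('CUDA_LAUNCH_BLOCKING'))",
]
result = run_debug(child)
print(result.stdout.strip())
```

Regular training runs started without this helper are not affected, since the variable is never exported in the parent shell or process.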

Amey_Naik · January 20, 2024, 5:27am

What does CUDA_LAUNCH_BLOCKING=1 do exactly? Can someone please elaborate?

ptrblck · January 20, 2024, 1:24pm

Kernel launches will be blocked, i.e. they won’t be executed asynchronously anymore as already explained. From CUDA’s programming guide:

"Programmers can globally disable asynchronicity of kernel launches for all CUDA applications running on a system by setting the CUDA_LAUNCH_BLOCKING environment variable to 1. This feature is provided for debugging purposes only and should not be used as a way to make production software run reliably."
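
To see the asynchronous behavior described above, a rough sketch like the following can help. It assumes PyTorch and a CUDA device are available and does nothing otherwise. The function name time_launch_vs_complete and the matrix size are arbitrary choices for illustration. By default the first timing (when control returns to Python) would typically be much smaller than the second (when the kernel has finished). With CUDA_LAUNCH_BLOCKING=1 set before the script starts, the two should be roughly equal, which is where the slowdown comes from.

```python
import time

try:
    import torch
except ImportError:  # PyTorch not installed; the sketch then does nothing
    torch = None


def time_launch_vs_complete(n=4096):
    """Return (launch_seconds, total_seconds) for one matmul, or None without a GPU."""
    if torch is None or not torch.cuda.is_available():
        return None
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    a @ b  # warm-up so one-time initialization is not measured
    torch.cuda.synchronize()

    start = time.perf_counter()
    c = a @ b  # asynchronous by default: returns once the kernel is queued
    launched = time.perf_counter() - start
    torch.cuda.synchronize()  # wait until the kernel has actually finished
    total = time.perf_counter() - start
    return launched, total


timings = time_launch_vs_complete()
if timings is None:
    print("No CUDA device available; nothing to measure.")
else:
    launched, total = timings
    print(f"returned to Python after {launched * 1e3:.3f} ms, "
          f"kernel finished after {total * 1e3:.3f} ms")
```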