I am trying to run my model on the GPU, but I get RuntimeError: CUDA error: device-side assert triggered.
My model runs fine on CPU.
I have set os.environ['CUDA_LAUNCH_BLOCKING'] = "1", and the error appears when I load the model onto CUDA, on the following line -
Did you make sure to set the env variable before importing PyTorch (or at least before running anything CUDA-related)?
Is anything other than the assert being printed?
Changing os.environ does actually change an environment variable.
A safer way to do it is to launch your program with CUDA_LAUNCH_BLOCKING=1 python your_script.py.
Is anything printed before the traceback that does not come from your own printing?
The thing is that this environment variable only affects the initialization of the CUDA driver. This means that if you set it in your notebook after the driver has been initialized, it won’t have any effect.
It needs to be set either when you launch the notebook itself, or as the first thing a clean Python kernel runs, before calling anything CUDA-related.
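As a rough sketch of the "first thing a clean kernel runs" option, something like the following could go in the very first cell. The helper name enable_launch_blocking is made up for illustration, and the check on sys.modules is only a conservative assumption: importing torch does not necessarily initialize the driver, but if torch is not loaded yet, the driver certainly is not either.

```python
import os
import sys


def enable_launch_blocking() -> bool:
    """Set CUDA_LAUNCH_BLOCKING=1 for this process.

    Returns True if torch had not been imported yet, which is used here
    as a conservative sign that the variable was set early enough.
    (Hypothetical helper, not part of PyTorch.)
    """
    torch_already_imported = "torch" in sys.modules
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return not torch_already_imported


if not enable_launch_blocking():
    print("torch was imported before the variable was set; "
          "restart the kernel and run this cell first")

# Only after this point import torch and move the model to the GPU, e.g.:
# import torch
```

If the warning is printed, restarting the kernel and running this cell before anything else is probably the simplest fix.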
You can run your command as: CUDA_LAUNCH_BLOCKING=1 python your_script.py. This will define the environment variable only for your command. You can also enable it in a given shell by running export CUDA_LAUNCH_BLOCKING=1 and all subsequent commands in this shell will have it.
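If you would rather start the script from Python than from a shell, a minimal sketch of the same "only for this command" idea is to pass a modified copy of the environment to a child process. The function name run_with_launch_blocking is an assumption for illustration, and the child command here just prints the variable; in practice you would pass something like [sys.executable, "your_script.py"].

```python
import os
import subprocess
import sys


def run_with_launch_blocking(args):
    """Run a command with CUDA_LAUNCH_BLOCKING=1 set only for that child process."""
    # Copy the current environment and add the variable to the copy,
    # so the parent process environment is left untouched.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(args, env=env, capture_output=True, text=True)


# Demonstration: the child process sees the variable.
result = run_with_launch_blocking(
    [sys.executable, "-c",
     "import os; print(os.environ.get('CUDA_LAUNCH_BLOCKING'))"]
)
print(result.stdout.strip())  # prints 1
```

Because the variable is in place before the child interpreter starts, it is set before any CUDA initialization can happen in that process.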