Unable to move model to cuda

I am trying to run my model on the GPU, but I get:
RuntimeError: CUDA error: device-side assert triggered

My model runs fine on the CPU.
I have set os.environ['CUDA_LAUNCH_BLOCKING'] = "1", and the error is raised when I move the model to CUDA, on the following lines:

if train_on_gpu:
    unet = unet.to('cuda')

Hi,

Did you make sure to set the environment variable before loading PyTorch (or at least before any CUDA-related code runs)?
Is anything other than the assert being printed?

Hi,
No, I didn't change any environment variables. Could you please tell me how to do it?

The whole error traceback is printed; I am posting it here:

RuntimeError                              Traceback (most recent call last)
<ipython-input-80-b3e46229c2c3> in <module>
----> 1 unet = unet.to('cuda')
      2 print(unet.device)

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in to(self, *args, **kwargs)
    430             return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    431 
--> 432         return self._apply(convert)
    433 
    434     def register_backward_hook(self, hook):

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    206     def _apply(self, fn):
    207         for module in self.children():
--> 208             module._apply(fn)
    209 
    210         def compute_should_use_set_data(tensor, tensor_applied):

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    206     def _apply(self, fn):
    207         for module in self.children():
--> 208             module._apply(fn)
    209 
    210         def compute_should_use_set_data(tensor, tensor_applied):

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    228                 # `with torch.no_grad():`
    229                 with torch.no_grad():
--> 230                     param_applied = fn(param)
    231                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    232                 if should_use_set_data:

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in convert(t)
    428 
    429         def convert(t):
--> 430             return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    431 
    432         return self._apply(convert)

RuntimeError: CUDA error: device-side assert triggered

Changing os.environ does actually change an environment variable.
A safer way to do it is to launch your program with CUDA_LAUNCH_BLOCKING=1 python your_script.py.

Is anything printed before the traceback that does not come from your own print statements?

I have included os.environ['CUDA_LAUNCH_BLOCKING'] = "1" in my Jupyter notebook.
No, nothing is printed before the traceback.

The thing is that this environment variable only affects the initialization of the CUDA driver. This means that if you set it in your notebook after the driver has been initialized, it won't have any effect.

It needs to be set either when you launch the notebook itself, or as the first thing a clean Python kernel runs, before calling any CUDA-related code.
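
As a rough sketch of the "first thing a clean kernel runs" approach, the cell below sets the variable and warns if PyTorch was already imported in this process. The helper name enable_cuda_launch_blocking is made up for illustration, and the sys.modules check is a conservative heuristic (an assumption, not an official check): importing torch does not necessarily mean the CUDA driver is already initialized, but restarting the kernel is the safe option when the warning fires.

```python
import os
import sys
import warnings


def enable_cuda_launch_blocking():
    """Set CUDA_LAUNCH_BLOCKING=1 (hypothetical helper, for illustration only).

    Returns True if the variable was set before torch was imported in this
    process, and False if torch was already imported (the setting may be
    too late to take effect in that case).
    """
    torch_already_imported = "torch" in sys.modules
    if torch_already_imported:
        warnings.warn(
            "torch is already imported; restart the kernel and run this "
            "first so the setting is in place before CUDA is initialized."
        )
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return not torch_already_imported


set_early = enable_cuda_launch_blocking()

# Only after this point:
# import torch
# unet = unet.to('cuda')
```

Run this as the very first cell after a kernel restart, before any cell that imports torch.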

Thanks a lot. It worked. If I am running a script, how do I do it?

You can run your command as: CUDA_LAUNCH_BLOCKING=1 python your_script.py. This defines the environment variable only for that command. You can also enable it for a given shell by running export CUDA_LAUNCH_BLOCKING=1; all subsequent commands in that shell will then have it.
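
Note that the inline VAR=value form and export are POSIX shell syntax and will not work as written in the Windows command prompt. A portable alternative, sketched here under the assumption that you are happy to launch the script from a small Python wrapper, is to pass a modified environment to the child process. The child command below is a stand-in for your_script.py: it only prints the value it sees.

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable for the child process only.
child_env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")

# Stand-in for `python your_script.py`: the child prints the variable it sees.
child_code = "import os; print(os.environ.get('CUDA_LAUNCH_BLOCKING'))"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)

child_value = result.stdout.strip()
print(child_value)
```

To use it for a real run, replace the ["-c", child_code] arguments with the path to your own script. The variable is then set before the child interpreter starts, so it is in place before any CUDA initialization in that process.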
