Unable to move model to CUDA

Rishav_Sapahia · September 19, 2019, 6:47pm

I am trying to run my model on the GPU, but I am getting
RuntimeError: CUDA error: device-side assert triggered

My model runs fine on the CPU.
I have set os.environ['CUDA_LAUNCH_BLOCKING'] = "1", and the error is raised when I move the model to CUDA on the following line:

if train_on_gpu:
    unet = unet.to('cuda')

albanD · September 19, 2019, 7:13pm

Hi,

Did you make sure to set the environment variable before loading PyTorch (or at least before anything CUDA-related)?
Is anything other than the assert getting printed?

Rishav_Sapahia · September 19, 2019, 7:38pm

Hi,
No, I didn’t change any environment variables. Could you please tell me how to do it?

The whole error traceback is printed; here it is:

RuntimeError                              Traceback (most recent call last)
<ipython-input-80-b3e46229c2c3> in <module>
----> 1 unet = unet.to('cuda')
      2 print(unet.device)

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in to(self, *args, **kwargs)
    430             return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    431 
--> 432         return self._apply(convert)
    433 
    434     def register_backward_hook(self, hook):

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    206     def _apply(self, fn):
    207         for module in self.children():
--> 208             module._apply(fn)
    209 
    210         def compute_should_use_set_data(tensor, tensor_applied):

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    206     def _apply(self, fn):
    207         for module in self.children():
--> 208             module._apply(fn)
    209 
    210         def compute_should_use_set_data(tensor, tensor_applied):

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    228                 # `with torch.no_grad():`
    229                 with torch.no_grad():
--> 230                     param_applied = fn(param)
    231                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    232                 if should_use_set_data:

~\.conda\envs\opt\lib\site-packages\torch\nn\modules\module.py in convert(t)
    428 
    429         def convert(t):
--> 430             return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    431 
    432         return self._apply(convert)

RuntimeError: CUDA error: device-side assert triggered

albanD · September 19, 2019, 7:48pm

Changing os.environ is actually changing an environment variable.
A safer way to do it is to launch your program with CUDA_LAUNCH_BLOCKING=1 python your_script.py.

Is anything printed before the traceback that does not come from your own printing?

Rishav_Sapahia · September 19, 2019, 7:55pm

I have included os.environ['CUDA_LAUNCH_BLOCKING'] = "1" in my Jupyter notebook.
No, nothing is printed before the traceback.

albanD · September 19, 2019, 7:57pm

The thing is that this environment variable only affects the initialization of the CUDA driver. This means that if you set it in your notebook after the driver has been initialized, it won’t have any effect.

It needs to be set either when you launch the notebook itself, or as the first thing a clean Python kernel runs, before calling anything CUDA-related.
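For the notebook case, a minimal sketch of that first cell might look like the following. The helper name `enable_cuda_launch_blocking` is hypothetical (it is not a PyTorch API). It assumes that PyTorch initializes CUDA lazily, so what matters is whether `torch.cuda.is_initialized()` is already true, not merely whether `torch` has been imported. It only inspects `torch` if some earlier cell already imported it.

```python
import os
import sys
import warnings


def enable_cuda_launch_blocking() -> bool:
    """Hypothetical helper: set CUDA_LAUNCH_BLOCKING=1 for this process.

    Returns True if the setting can still take effect (PyTorch has not
    initialized CUDA yet). Otherwise it warns and returns False.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    # Look up torch only if an earlier cell already imported it; never import it here.
    torch = sys.modules.get("torch")
    if torch is not None and torch.cuda.is_initialized():
        warnings.warn(
            "CUDA is already initialized, so CUDA_LAUNCH_BLOCKING will be ignored; "
            "restart the kernel and run this cell first."
        )
        return False
    return True


# First cell of a fresh kernel, before `import torch`:
enable_cuda_launch_blocking()
```

After this cell, `import torch` and the rest of the notebook can run as usual. If it warns, restart the kernel instead of re-running the cell, because the variable cannot be applied to a CUDA context that already exists.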

Rishav_Sapahia · September 19, 2019, 8:07pm

Thanks a lot, it worked. How do I do it if I am running a script?

albanD · September 19, 2019, 8:25pm

You can run your command as: CUDA_LAUNCH_BLOCKING=1 python your_script.py. This will define the environment variable only for your command. You can also enable it in a given shell by running export CUDA_LAUNCH_BLOCKING=1 and all subsequent commands in this shell will have it.
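If you would rather launch the script from Python (for example, from another script), a rough equivalent of `CUDA_LAUNCH_BLOCKING=1 python your_script.py` is to pass a modified copy of the environment to the child process. This is a sketch: the helper name `run_with_launch_blocking` is hypothetical, and `your_script.py` is the placeholder name from the thread. As with the shell form, the variable is set only for the child process.

```python
import os
import subprocess
import sys
from typing import Sequence


def run_with_launch_blocking(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Hypothetical helper: run `python <args>` with CUDA_LAUNCH_BLOCKING=1 set
    only in the child's environment; the current process's os.environ is untouched."""
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")  # copy of the parent env plus the flag
    return subprocess.run(
        [sys.executable, *args],  # same interpreter as the caller
        env=env,
        capture_output=True,
        text=True,
    )
```

Usage: `run_with_launch_blocking(["your_script.py"])`, then inspect `.returncode` and `.stderr` of the result for the (now synchronous) CUDA error.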

JuyiLin · April 1, 2023, 8:01pm

I am also running into this problem. How did you solve it?