Weird device-side assert triggered

Hello.
I'd like to interpolate a bool tensor (mask) to fit various target tensors, but I got:
"upsample_nearest2d" not implemented for 'Bool'.

Consider this process:
mask (BoolTensor) -> FloatTensor containing only 1.0 and 0.0 :boom: -> interpolate (size changed) -> back to BoolTensor via .to(torch.bool)

Is this the best way to interpolate a bool tensor?

I can do the above process manually on the CPU, but I got stuck in a multi-GPU environment.

I implemented the :boom: step above like this:

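import torch

# mask: a bool tensor on whichever device this replica runs on
assert mask.dtype == torch.bool

# helper tensors with the same shape and device as mask
zeros = torch.zeros(size=mask.shape, dtype=torch.float, device=mask.device)
ones = torch.ones(size=mask.shape, dtype=torch.float, device=mask.device)

# copy 1.0 into every position where mask is True, leaving 0.0 elsewhere
mask_in_float = zeros.masked_scatter(mask, ones)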
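For a bool mask, the :boom: step can most likely be done with a plain dtype cast instead of building helper tensors and calling masked_scatter, since casting bool to float maps True to 1.0 and False to 0.0. A minimal sketch (the random mask is just example data, not from the original post) that checks both routes give the same float tensor:

```python
import torch

# Example data: a random bool mask; any shape/device behaves the same way.
mask = torch.rand(2, 3, 4, 4) > 0.5

# Original approach: scatter ones into a zero tensor wherever mask is True.
zeros = torch.zeros(mask.shape, dtype=torch.float, device=mask.device)
ones = torch.ones(mask.shape, dtype=torch.float, device=mask.device)
via_scatter = zeros.masked_scatter(mask, ones)

# Simpler alternative: a direct dtype cast (True -> 1.0, False -> 0.0).
via_cast = mask.to(torch.float)

print(torch.equal(via_scatter, via_cast))  # True
```

The cast also avoids allocating two extra full-size tensors per call.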

and I got an unexpected error:

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/THCReduceAll.cuh:327

Moreover, this error occurs in the middle of training, not the first time this code snippet runs. I don't know why it happens…

So, my questions are:

  1. Is there a better process or function for interpolating a bool tensor?
  2. If my approach is the best one, how do I avoid the error (:boom:) in a multi-GPU environment?

Any suggestion will be appreciated.

Add os.environ['CUDA_LAUNCH_BLOCKING'] = "1" at the very beginning of your training script; then you will get the exact line of code that causes the error.
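A minimal sketch of where the setting goes. The variable only takes effect if it is set before PyTorch initializes CUDA, so the safest spot is the top of the script, before importing torch. It makes every kernel launch synchronous, so training gets slower; it is meant for debugging only.

```python
import os

# Must be set before CUDA is initialized; doing it before importing torch
# is the safest place.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  (imported after the env var on purpose)

# ... rest of the training script ...
print(os.environ["CUDA_LAUNCH_BLOCKING"])  # 1
```

Equivalently, the variable can be set in the shell that launches the script, e.g. CUDA_LAUNCH_BLOCKING=1 python train.py (train.py being a placeholder name).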

Most of the time, device asserts are triggered when something is off with your indexing. @Naruto-Sasuke-san's hint about launch blocking is gold for seeing exactly where.
But I didn't quite understand why simply casting the mask to float before interpolation, i.e. interpolate(mask.to(torch.float), ...).to(torch.bool), which should work on both GPU and CPU, wouldn't work for you.
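The cast-interpolate-cast idea can be wrapped in a small helper. This is a sketch; the name interpolate_bool is hypothetical and not part of PyTorch. It keeps everything on the mask's own device, so it should behave the same on CPU and on each GPU replica.

```python
import torch
import torch.nn.functional as F


def interpolate_bool(mask: torch.Tensor, size) -> torch.Tensor:
    """Resize a bool mask of shape (N, C, H, W) to size = (H_out, W_out).

    Nearest-neighbour mode copies every output value from some input value,
    so the float tensor holds only 0.0/1.0 before being cast back to bool.
    """
    if mask.dtype != torch.bool:
        raise TypeError(f"expected a bool tensor, got {mask.dtype}")
    resized = F.interpolate(mask.to(torch.float32), size=size, mode="nearest")
    return resized.to(torch.bool)


mask = torch.rand(8, 3, 24, 24) > 0.5  # example data
out = interpolate_bool(mask, (32, 32))
print(out.shape, out.dtype)  # torch.Size([8, 3, 32, 32]) torch.bool
```

The explicit mode="nearest" matters: with a smoothing mode such as bilinear, the float result would contain fractional values, and .to(torch.bool) turns every nonzero value into True, which would grow the mask.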

If you have an integral scaling factor, you could also use reshape+expand to scale up or indexing with step>1 to scale down.
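For integer factors, a sketch of both tricks on a bool (N, C, H, W) mask, with no dtype conversion at all. The helper names upsample_bool and downsample_bool are hypothetical. The last lines compare the results against float nearest-neighbour interpolation, which they should match for integer factors.

```python
import torch
import torch.nn.functional as F


def upsample_bool(mask: torch.Tensor, factor: int) -> torch.Tensor:
    """Scale a bool (N, C, H, W) mask up by an integer factor via reshape + expand."""
    n, c, h, w = mask.shape
    # Insert singleton axes after H and W, broadcast them to `factor`,
    # then merge (H, factor) -> H*factor and (W, factor) -> W*factor.
    expanded = mask[:, :, :, None, :, None].expand(n, c, h, factor, w, factor)
    return expanded.reshape(n, c, h * factor, w * factor)


def downsample_bool(mask: torch.Tensor, factor: int) -> torch.Tensor:
    """Scale a bool (N, C, H, W) mask down by an integer factor via strided indexing."""
    return mask[:, :, ::factor, ::factor]


mask = torch.rand(2, 1, 6, 6) > 0.5  # example data
up = upsample_bool(mask, 2)          # shape (2, 1, 12, 12)
down = downsample_bool(mask, 3)      # shape (2, 1, 2, 2)

# Reference: nearest-neighbour interpolation through float.
ref_up = F.interpolate(mask.float(), scale_factor=2, mode="nearest").bool()
ref_down = F.interpolate(mask.float(), size=(2, 2), mode="nearest").bool()
print(torch.equal(up, ref_up), torch.equal(down, ref_down))  # True True
```

Note that the downscaled result is a strided view of the original mask; call .clone() if it must not share memory.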

Best regards

Thomas

Thanks for the replies.

Before I reply, I should mention that I'm using not only plain PyTorch but also ignite (https://pytorch.org/ignite/) and brevitas (https://github.com/Xilinx/brevitas) for my research.

Also, I'm monitoring my GPU utilization with the Linux 'watch' command every 0.5 seconds.

@Naruto-Sasuke
When I inserted os.environ['CUDA_LAUNCH_BLOCKING'] = "1" at the start of my script and ran it, my GPUs froze completely a few seconds after starting.
But I can get the 'exact line of code' (maybe at the same granularity you meant by 'exact') even without os.environ['CUDA_LAUNCH_BLOCKING'] = "1", like this:

Engine run is terminating due to exception: Caught RuntimeError in replica 0 on device 3.
Original Traceback (most recent call last):
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "<ipython-input-8-4952e4c16b8e>", line 207, in forward
    return self._forward_impl(x)
  File "<ipython-input-8-4952e4c16b8e>", line 190, in _forward_impl
    x = self.relu(x)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/nn/quant_layer.py", line 164, in forward
    out = self.act_quant(quant_input)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/proxy/runtime_quant.py", line 123, in forward
    y = self.fused_activation_quant_proxy(y)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/proxy/runtime_quant.py", line 79, in forward
    x, output_scale, output_bit_width = self.tensor_quant(x)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/core/quant.py", line 672, in forward
    y = self.int_quant(scale, int_scale, msb_clamp_bit_width, x)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/core/quant.py", line 440, in forward
    y_int = self.to_int(scale, int_scale, msb_clamp_bit_width, x)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/core/quant.py", line 418, in to_int
    y = self.tensor_clamp_impl(y, min_val=min_int_val, max_val=max_int_val)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/core/function_wrapper.py", line 135, in forward
    return tensor_clamp(x, min_val=min_val, max_val=max_val)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/function/ops.py", line 65, in tensor_clamp
    out = torch.where(out < min_val, min_val, out)
  File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/tensor.py", line 28, in wrapped
    return f(*args, **kwargs)
RuntimeError: CUDA error: device-side assert triggered

The weirder thing is that when I disable the deterministic trainer, the error is reported at a different line of code! This seems really strange, but it makes me think the problem is not caused by any one specific snippet of my code…
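One likely reason the reported line moves around: CUDA kernels run asynchronously, so a device-side assert is usually surfaced at a later operation that synchronizes with the GPU, not at the kernel that actually failed (this is what CUDA_LAUNCH_BLOCKING=1 is meant to fix). Since indexing is a common culprit, a debugging helper like the sketch below can validate index tensors on the CPU before they are used, turning the device assert into a readable Python exception. check_indices is a hypothetical name, not a PyTorch or brevitas function, and it forces a GPU-to-CPU sync, so it only belongs in debugging runs.

```python
import torch


def check_indices(index: torch.Tensor, dim_size: int, name: str = "index") -> None:
    """Raise IndexError if `index` has values outside [-dim_size, dim_size).

    Hypothetical debugging helper: copies the indices to the CPU (forcing a
    GPU sync), so it is slow and meant for debugging only.
    """
    idx = index.detach().cpu()
    if idx.numel() == 0:
        return
    lo, hi = int(idx.min()), int(idx.max())
    if lo < -dim_size or hi >= dim_size:
        raise IndexError(
            f"{name} out of range for dimension of size {dim_size}: min={lo}, max={hi}"
        )


x = torch.arange(10)
good = torch.tensor([0, 3, 9])
bad = torch.tensor([2, 10])

check_indices(good, x.size(0))  # passes silently
print(x[good])                  # tensor([0, 3, 9])
try:
    check_indices(bad, x.size(0), name="bad")
except IndexError as err:
    print(err)  # bad out of range for dimension of size 10: min=2, max=10
```

Some ops (e.g. gather) do not accept negative indices at all; for those, tighten the lower bound to 0.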

@tom
I found that your suggestion works well:

# mask interpolation
import torch
import torch.nn.functional as F

# example mask; torch.BoolTensor(size=...) would leave the values uninitialized
mask = torch.rand(8, 3, 24, 24) > 0.5

mask_in_float = mask.to(torch.float32)
mask_interpolated_in_float = F.interpolate(mask_in_float, size=(32, 32))  # default mode is 'nearest'
mask_interpolated = mask_interpolated_in_float.to(torch.bool)

print(mask[0][0][0])
print(mask_interpolated[0][0][0])

Example output:
tensor([ True,  True,  True,  True,  True,  True, False, False, False, False,
        False, False, False, False, False, False,  True,  True,  True,  True,
         True,  True, False, False])
tensor([ True,  True,  True,  True,  True,  True,  True,  True, False, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False,  True,  True,  True,  True,  True,  True,  True,  True,
        False, False])

So now I'm applying your suggestion to my project. I'll report back on whether it works…