Thanks for the replies.
Before I reply: note that I'm not using pure PyTorch alone; my research also relies on Ignite (https://pytorch.org/ignite/) and Brevitas (https://github.com/Xilinx/brevitas).
I'm also monitoring GPU utilization with the Linux 'watch' command at a 0.5-second interval.
@Naruto-Sasuke
When I inserted os.environ['CUDA_LAUNCH_BLOCKING'] = "1" at the top of my script and ran it, my GPUs froze completely a few seconds after startup.
Without os.environ['CUDA_LAUNCH_BLOCKING'] = "1", however, I can get an 'exact line of code' (probably in the same sense you meant by 'exact'), like this:
Engine run is terminating due to exception: Caught RuntimeError in replica 0 on device 3.
Original Traceback (most recent call last):
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "<ipython-input-8-4952e4c16b8e>", line 207, in forward
return self._forward_impl(x)
File "<ipython-input-8-4952e4c16b8e>", line 190, in _forward_impl
x = self.relu(x)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/nn/quant_layer.py", line 164, in forward
out = self.act_quant(quant_input)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/proxy/runtime_quant.py", line 123, in forward
y = self.fused_activation_quant_proxy(y)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/proxy/runtime_quant.py", line 79, in forward
x, output_scale, output_bit_width = self.tensor_quant(x)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/core/quant.py", line 672, in forward
y = self.int_quant(scale, int_scale, msb_clamp_bit_width, x)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/core/quant.py", line 440, in forward
y_int = self.to_int(scale, int_scale, msb_clamp_bit_width, x)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/core/quant.py", line 418, in to_int
y = self.tensor_clamp_impl(y, min_val=min_int_val, max_val=max_int_val)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/core/function_wrapper.py", line 135, in forward
return tensor_clamp(x, min_val=min_val, max_val=max_val)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/brevitas/function/ops.py", line 65, in tensor_clamp
out = torch.where(out < min_val, min_val, out)
File "/home/ootzk/anaconda3/envs/DeepLabV3plus/lib/python3.6/site-packages/torch/tensor.py", line 28, in wrapped
return f(*args, **kwargs)
RuntimeError: CUDA error: device-side assert triggered
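One caveat worth keeping in mind (my understanding, not something stated in the traceback): CUDA kernel launches are asynchronous by default, so a device-side assert is often reported at a later, unrelated Python line. That is exactly what CUDA_LAUNCH_BLOCKING is meant to work around, but the variable only takes effect if it is set before the CUDA context is created, for example:

```python
import os

# Kernel launches are asynchronous by default, so a device-side assert
# may surface at a later, unrelated line. Forcing synchronous launches
# makes the reported traceback point at the failing kernel, at the cost
# of speed. This must be set before the first CUDA call (safest: before
# importing torch), otherwise it is silently ignored.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```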
The weirder thing is that when I disable the deterministic trainer, the error is reported at a different 'exact line of code'! This seems really strange, but from it I assume that no single snippet of my code is the actual problem…
@tom
I found that your suggestion works well:
# mask interpolation
import torch

mask = torch.BoolTensor(size=(8, 3, 24, 24))  # uninitialized bool tensor, just for the shape
mask_in_float = mask.to(torch.float32)
mask_interpolated_in_float = torch.nn.functional.interpolate(mask_in_float, size=(32, 32))
mask_interpolated = mask_interpolated_in_float.to(torch.bool)
print(mask[0][0][0])
print(mask_interpolated[0][0][0])
tensor([ True, True, True, True, True, True, False, False, False, False,
False, False, False, False, False, False, True, True, True, True,
True, True, False, False])
tensor([ True, True, True, True, True, True, True, True, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False, True, True, True, True, True, True, True, True,
False, False])
So now I'm applying your suggestion to my project. I'll report back whether this idea works…
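For reference, the float round-trip can be wrapped in a small self-contained helper (a sketch; the name interpolate_mask is mine, not from the thread). It works because interpolate's default mode="nearest" only copies existing values, so the floats stay exactly 0.0 or 1.0 and convert back to bool losslessly:

```python
import torch
import torch.nn.functional as F

def interpolate_mask(mask: torch.Tensor, size) -> torch.Tensor:
    # F.interpolate has no bool kernel, so round-trip through float32.
    # The default mode="nearest" never blends values, so the result is
    # exactly 0.0 or 1.0 and the cast back to bool is lossless.
    return F.interpolate(mask.to(torch.float32), size=size).to(torch.bool)

mask = torch.zeros(1, 1, 4, 4, dtype=torch.bool)
mask[..., :2] = True  # left half of each row is True
up = interpolate_mask(mask, (8, 8))
print(up.shape)   # torch.Size([1, 1, 8, 8])
print(up.dtype)   # torch.bool
```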