krishneel (Krishneel)
The following code snippet is from the documentation, except that I changed the target to be out of bounds.
import torch
import torch.nn as nn

device = 'cpu'  # swap to 'cuda:0' (and move input/target there) to compare
loss = nn.CrossEntropyLoss().to(device)
input = torch.randn(3, 2, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(15)  # only 2 classes, so any target >= 2 is out of bounds
output = loss(input, target)
On device = 'cpu' it raises the correct error:

IndexError: Target 2 is out of bounds.
However, the same code snippet (with the target out of bounds) does not raise this error on device = 'cuda:0'; it instead returns loss = 0.
I couldn't figure out why it behaves this way, or whether I missed something.
PyTorch version = 1.5.0
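For anyone hitting this, here is a minimal sketch of a bounds check that catches bad targets on either device before the loss is computed. The helper name `safe_cross_entropy` is my own, not a PyTorch API:

```python
import torch
import torch.nn as nn

def safe_cross_entropy(logits, target):
    # Valid class indices for CrossEntropyLoss are [0, num_classes - 1],
    # where num_classes is the second dimension of the logits.
    num_classes = logits.size(1)
    if target.min() < 0 or target.max() >= num_classes:
        raise IndexError(
            f"Target values must be in [0, {num_classes - 1}], "
            f"got min={target.min().item()}, max={target.max().item()}"
        )
    return nn.functional.cross_entropy(logits, target)

logits = torch.randn(3, 2, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(15)  # very likely out of bounds for 2 classes
# safe_cross_entropy(logits, target)  # raises IndexError on any device if a target >= 2
```

Running the check on the CPU-side tensors before moving them to the GPU avoids relying on the CUDA kernel's device-side assert entirely.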
Triggers the expected assertion error for me with torch 1.3.
Technically, you are changing the target to be random, so you have 0.2% chance of generating a valid target…
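For completeness, the arithmetic behind that figure (assuming all three sampled targets must land in the 2 valid classes out of the 15 possible values that random_(15) can draw):

```python
# random_(15) draws each target uniformly from {0, ..., 14}; with 2 classes
# only {0, 1} are valid, so each element is valid with probability 2/15.
p_valid_per_element = 2 / 15
p_all_three_valid = p_valid_per_element ** 3  # three independent draws
print(f"{p_all_three_valid:.2%}")  # ~0.24%, i.e. roughly the 0.2% quoted above
```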
I did mention in the question that I checked that the target was out of bounds.
It triggers this error on my system:
device = 'cuda'
loss = nn.CrossEntropyLoss().to(device)
input = torch.randn(3, 2, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(15)
output = loss(input, target)
> IndexError: Target 13 is out of bounds.
@ptrblck actually I forgot to mention that I was using the PyTorch Docker image when I got the behavior described in the question.
I just tested on a host machine and, sure enough, it does raise the error.
The problem seems to occur only in the Docker image.
Hi,
I got the same unexpected behaviour with PyTorch 1.5.0 using a GPU:
import torch

loss = torch.nn.CrossEntropyLoss()
weights = torch.randn(10, 5)  # 5 classes
labels = torch.arange(10)     # labels 5-9 are out of bounds
loss(weights.cuda(), labels.cuda())
Out[13]: tensor(1.7329, device='cuda:0')
The error is triggered if I use the CPU though.
Could you update to the nightly binaries and recheck it, please?
An issue that silenced the assert statements in the CUDA code was recently fixed.
Hi,
Sorry it took me some time to answer!
I now get the following error, which seems appropriate:
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/leo/Venv/test/lib/python3.7/site-packages/torch/tensor.py", line 154, in __repr__
return torch._tensor_str._str(self)
File "/home/leo/Venv/test/lib/python3.7/site-packages/torch/_tensor_str.py", line 333, in _str
tensor_str = _tensor_str(self, indent)
File "/home/leo/Venv/test/lib/python3.7/site-packages/torch/_tensor_str.py", line 229, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/home/leo/Venv/test/lib/python3.7/site-packages/torch/_tensor_str.py", line 101, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered
Thanks!
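A note on why the traceback above points at tensor printing rather than the loss call: CUDA kernels are launched asynchronously, so a device-side assert often surfaces at a later, unrelated synchronization point (here, inside `__repr__`). A sketch of forcing synchronous launches to localize the failure during debugging (the variable must be set before CUDA is initialized):

```python
import os

# Must be set before torch initializes CUDA (i.e. before the first CUDA
# call in the process), otherwise it has no effect on the running process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
# ... then `import torch` and rerun the failing snippet: the kernel that
# trips the device-side assert now raises at its own call site.
```

Equivalently, export CUDA_LAUNCH_BLOCKING=1 in the shell before starting Python. This slows everything down, so it is a debugging aid only.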