(Pdb) out.data.max(1)[1].cpu().numpy().max()
18295877782208576
(Pdb) out.data.max(dim=1)[1].cpu().numpy().max()
22518346029203522
(Pdb) out.data.max(dim=0)[1].cpu().numpy().max()
22518346030252096
(Pdb) out.data.max(dim=2)[1].cpu().numpy().max()
799
(Pdb) out.data.max(dim=3)[1].cpu().numpy().max()
799
(Pdb) out.data.max(dim=1)[1].cpu().numpy().max()
18014746402881610
(Pdb) out.data.max(dim=0)[1].cpu().numpy().max()
22518346030252096
(Pdb) out.size()
torch.Size([1, 35, 800, 800])
I would expect the result of out.data.max(dim=1)[1].cpu().numpy().max() to be less than 35, since dim=1 has size 35. Can anyone explain this?
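For reference, here is a minimal CPU sketch (not from the thread) showing that the argmax indices along dim=1 are always bounded by that dimension's size, so huge values like 1.8e+16 indicate something went wrong on the GPU before this call:

```python
import torch

# Argmax indices along dim=1 can never exceed that dimension's size.
out = torch.randn(1, 35, 800, 800)   # same shape as the tensor in the thread
idx = out.max(dim=1)[1]              # per-pixel argmax over the 35 channels
print(idx.max().item())              # always in [0, 34] on CPU
```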
albanD
(Alban D)
July 20, 2018, 9:19am
2
Hi,
It looks suspicious indeed.
Does out.data.max(1)[1].max()
return the same value? (without sending it to numpy)
What version of pytorch are you using?
How is out
obtained? what does out.is_contiguous()
and out.sparse
return?
Could you provide us with a small code sample to reproduce this?
out.data.max(1)[1].max() returns tensor(1.8296e+16, device='cuda:0').
The PyTorch version is 0.4.0.
out is the output of DeepLabv3+ trained on the GTA5 dataset.
out.is_contiguous() returns True.
out.sparse raises AttributeError: 'Tensor' object has no attribute 'sparse'.
Another problem: even when out.data.max(1)[1].max() is less than 35, this error occurs:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/generic/THCTensorCopy.c line=70 error=59 : device-side assert triggered
*** RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/generic/THCTensorCopy.c:70
The error occurs after loss = CrossEntropyLoss2d(out, targets).
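A common trigger for this device-side assert (error 59) in a cross-entropy loss is a target label outside [0, num_classes). The sketch below is illustrative, not code from the thread; the helper name is made up:

```python
import torch

# Hypothetical sanity check: flag target labels outside [0, num_classes)
# before the loss call, where a bad label would trigger the CUDA assert.
def has_invalid_targets(targets, num_classes):
    bad = (targets < 0) | (targets >= num_classes)
    return bool(bad.any())

num_classes = 35
targets = torch.zeros(1, 4, 4, dtype=torch.long)
targets[0, 0, 0] = 40                             # invalid label for 35 classes
print(has_invalid_targets(targets, num_classes))  # True -> would crash on CUDA
```

Running this check on the CPU before the loss makes the failure explicit instead of an asynchronous CUDA error.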
albanD
(Alban D)
July 20, 2018, 10:17am
4
Ok,
I meant out.is_sparse, but I guess it's not really important here.
If your code is crashing, then you should:
Run it on CPU. Make sure it runs without error and check whether it shows the same behaviour.
If the CPU version does not crash and returns proper values, then run the CUDA version with CUDA_LAUNCH_BLOCKING=1 python your_script.py. This will make CUDA do a bit more error checking, notably stopping execution before returning garbage values. With this enabled, you should not see any garbage values anymore, and you should get a proper error message.
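The "run it on CPU" step can be sketched like this (the Conv2d layer is just a stand-in for the actual DeepLabv3+ model):

```python
import torch

# Everything stays on the CPU by default, so an out-of-range index or a
# bad target fails with a readable Python error instead of an
# asynchronous device-side assert.
model = torch.nn.Conv2d(3, 35, kernel_size=1)  # stand-in for DeepLabv3+
inputs = torch.randn(1, 3, 8, 8)
out = model(inputs)                            # shape (1, 35, 8, 8)
idx = out.max(dim=1)[1]
print(idx.max().item() < 35)                   # True: CPU argmax is always valid
```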