RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/generic/THCStorage.c:36
The detailed output is:
/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [886,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [892,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [893,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [894,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [895,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [804,0,0] Assertion `t >= 0 && t < n_classes` failed.
    main()
  File "train.py", line 135, in main
    train_FCN(opt)
  File "train.py", line 82, in train_FCN
    trainer.train()
  File "/media/sjtu/831bebd9-c866-4ece-b878-5dbd68e5ca50/sjtu/CH/seg_transfer/models/trainer.py", line 240, in train
    self.train_epoch()
  File "/media/sjtu/831bebd9-c866-4ece-b878-5dbd68e5ca50/sjtu/CH/seg_transfer/models/trainer.py", line 169, in train_epoch
    self.validate()
  File "/media/sjtu/831bebd9-c866-4ece-b878-5dbd68e5ca50/sjtu/CH/seg_transfer/models/trainer.py", line 80, in validate
    if np.isnan(float(loss.data.item())):
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/generic/THCStorage.c:36
The error actually occurs while computing the loss with CrossEntropyLoss2d for a segmentation task.
I checked the label images: there is no -1 in them, and the number of output channels is correct.
Can anyone see where the problem is?
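
The failed assertion `t >= 0 && t < n_classes` says that at least one target value lies outside [0, n_classes), so besides -1 it may be worth checking for values that are too large (for example 255, which some datasets use for unlabeled pixels). Below is a minimal sketch of such a check, run on the CPU with NumPy before the loss is computed. The names `find_invalid_labels`, `n_classes`, and `ignore_index` are hypothetical and not taken from trainer.py, and the sample label map is made up for illustration.

```python
import numpy as np


def find_invalid_labels(target, n_classes, ignore_index=None):
    """Return the sorted unique label values outside [0, n_classes).

    ignore_index is an assumption: a label value the loss is configured
    to skip, which is therefore not reported as invalid.
    """
    values = np.unique(np.asarray(target))
    invalid = values[(values < 0) | (values >= n_classes)]
    if ignore_index is not None:
        invalid = invalid[invalid != ignore_index]
    return invalid.tolist()


# Hypothetical 2x3 label map for a 3-class problem; 255 marks "void" pixels.
label = np.array([[0, 1, 2], [2, 255, 1]], dtype=np.int64)

print(find_invalid_labels(label, n_classes=3))                    # [255]
print(find_invalid_labels(label, n_classes=3, ignore_index=255))  # []
```

If the labels are already a PyTorch tensor, `target.cpu().numpy()` can be passed in. A non-empty result would point at the label values rather than at the number of output channels.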