RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generated/../THCReduceAll.cuh:317 during evaluation with InceptionV3

outputs = model(inputs)

# print(outputs)

# for nets that have multiple outputs such as inception
if isinstance(outputs, tuple):
    loss = sum(criterion(o, labels) for o in outputs)
else:
    loss = criterion(outputs, labels)

# valid_losses.append(loss.item())

# backward + optimize only if in training phase
if phase == 'train':
    _, preds = torch.max(outputs[0].data, 1)
    loss.backward()
    optimizer.step()
else:
    _, preds = torch.max(outputs.data, 1)

I’m using InceptionV3 from PyTorch and everything works fine until the evaluation phase. Training finishes, but when the evaluation phase runs I get this error on the line _, preds = torch.max(outputs.data, 1). Printing outputs.data just before that line fails with the same error:

Traceback (most recent call last):
  File "inception_test.py", line 212, in <module>
    main(args)
  File "inception_test.py", line 199, in main
    num_epochs=args.epochs)
  File "inception_test.py", line 146, in train_model
    print(outputs.data)
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torch/tensor.py", line 66, in __repr__
    return torch._tensor_str._str(self)
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torch/_tensor_str.py", line 277, in _str
    tensor_str = _tensor_str(self, indent)
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torch/_tensor_str.py", line 195, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torch/_tensor_str.py", line 84, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generated/../THCTensorMathCompare.cuh:82

Any idea what’s wrong?

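Often this means an indexing problem on the CUDA side, such as an index that is out of range. Because CUDA kernels are launched asynchronously, the line shown in the traceback is not necessarily the operation that actually failed. Can you run again with CUDA_LAUNCH_BLOCKING=1 and report the error you get then?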
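
In case it helps, here is a minimal sketch of both steps: setting CUDA_LAUNCH_BLOCKING from inside the script, and checking the targets on the CPU before they reach the loss. The label check is an assumption on my part. Out-of-range class indices are a common cause of this assert, but nothing in the thread confirms that is what happens here. find_bad_labels is a hypothetical helper, not a PyTorch function, and it takes plain Python ints so it can be run on labels.cpu().tolist().

```python
import os

# Must be set before CUDA is initialised (simplest: before importing torch),
# so kernels run synchronously and the traceback points at the failing op.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def find_bad_labels(labels, num_classes):
    """Return (position, value) pairs for labels outside [0, num_classes).

    `labels` is any iterable of ints, e.g. labels.cpu().tolist() for a
    1-D tensor of class indices. Hypothetical helper, not a PyTorch API.
    """
    return [
        (i, int(y))
        for i, y in enumerate(labels)
        if not 0 <= int(y) < num_classes
    ]


# Example: a 3-class problem where one label is 3 (e.g. labels numbered 1..3).
bad = find_bad_labels([0, 2, 3, 1], num_classes=3)
print(bad)  # [(2, 3)]
```

If the list comes back non-empty for a validation batch, the targets do not match the number of outputs of the final layer. Setting the variable on the command line when launching inception_test.py has the same effect as the os.environ line.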