[SOLVED] Multi-label dice loss

elistevens · October 3, 2018, 10:35pm

I am trying to implement multi-label dice loss, but am getting the following exception:

  File ".../training.py", line 167, in main
    diceLoss_devtensor.backward()
  File ".../.venv/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File ".../.venv/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

Here’s my code:

    prediction_devtensor = self.model(input_devtensor)

    # Flatten all spatial dims and sum over them: one value per (sample, channel)
    sum2 = lambda t: t.view(t.size(0), t.size(1), -1).sum(dim=2)

    # Soft intersection and per-channel totals, each shaped (N, C)
    diceCorrect_devtensor = sum2(prediction_devtensor * label_devtensor)
    dicePrediction_devtensor = sum2(prediction_devtensor)
    diceLabel_devtensor = sum2(label_devtensor)
    # Smoothing term: keeps the ratio defined (and equal to 1) when a channel is empty in both
    epsilon_devtensor = torch.ones_like(diceCorrect_devtensor) * 0.01

    # Per-(sample, channel) dice loss
    diceLoss_devtensor = 1 - (2 * diceCorrect_devtensor + epsilon_devtensor) / (dicePrediction_devtensor + diceLabel_devtensor + epsilon_devtensor)

    # this gets .backward() called on it in the stack trace above
    return diceLoss_devtensor.mean()
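To rule the loss itself out, the same per-channel soft dice computation can be run standalone on CPU, outside the training loop. This is a minimal sketch assuming inputs shaped (N, C, *spatial) with predictions already squashed into [0, 1]; the function name `soft_dice_loss` is made up, and it swaps `view` for `reshape` and the `ones_like` epsilon for a plain float, neither of which changes the math.

```python
import torch


def soft_dice_loss(prediction, label, epsilon=0.01):
    """Mean over (sample, channel) of 1 - soft dice.

    Assumes prediction and label are both shaped (N, C, *spatial) and that
    prediction values are already in [0, 1] (e.g. after a sigmoid).
    """
    def sum_spatial(t):
        # reshape (unlike view) also accepts non-contiguous inputs,
        # copying only when it has to.
        return t.reshape(t.size(0), t.size(1), -1).sum(dim=2)

    intersection = sum_spatial(prediction * label)
    prediction_sum = sum_spatial(prediction)
    label_sum = sum_spatial(label)

    # A Python float broadcasts against the (N, C) sums.
    dice = (2 * intersection + epsilon) / (prediction_sum + label_sum + epsilon)
    return (1 - dice).mean()


# Example on a small random 3D batch: 2 samples, 3 labels, 4x8x8 volumes.
logits = torch.randn(2, 3, 4, 8, 8, requires_grad=True)
label = (torch.rand(2, 3, 4, 8, 8) > 0.5).float()
loss = soft_dice_loss(torch.sigmoid(logits), label)
loss.backward()
print(loss.item(), logits.grad.shape)
```

If this runs cleanly on CPU with realistic shapes, the loss arithmetic is unlikely to be what cuDNN is complaining about; the error surfaces during `backward()` but can originate in any op recorded in the graph, including the model's convolutions.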

I made sure my Dataset called .contiguous() on the tensors it returned, but that didn’t change anything. The model is a fairly simple implementation of UNet for 3D data.

Any suggestions for how I can troubleshoot? I’ve sprinkled some assert t.is_contiguous() checks around the model, etc., but that hasn’t caught anything (though I wasn’t comprehensive).
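One way to make the contiguity check comprehensive instead of hand-placed is a forward hook on every submodule. This is a rough sketch: `add_contiguity_hooks` and the toy `Transpose` layer are made up for illustration, while `named_modules()` and `register_forward_hook()` are standard `torch.nn.Module` methods. It only inspects forward inputs and outputs, so a non-contiguous tensor that appears only during the backward pass would not be caught.

```python
import torch
import torch.nn as nn


def add_contiguity_hooks(model):
    """Hook every submodule of `model` and record non-contiguous tensors.

    Returns (offenders, handles). offenders is a list of
    (module_name, module_type, "input" or "output", position) tuples that
    fills up as forward passes run; call .remove() on each handle when done.
    """
    offenders = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Forward hooks receive the positional inputs as a tuple; the
            # output may be a single tensor or a tuple/list of tensors.
            outputs = output if isinstance(output, (tuple, list)) else (output,)
            for kind, tensors in (("input", inputs), ("output", outputs)):
                for position, t in enumerate(tensors):
                    if torch.is_tensor(t) and not t.is_contiguous():
                        offenders.append(
                            (name or "<root>", type(module).__name__, kind, position)
                        )
        return hook

    handles = [module.register_forward_hook(make_hook(name))
               for name, module in model.named_modules()]
    return offenders, handles


# Example: a toy model where one layer returns a transposed (non-contiguous) view.
class Transpose(nn.Module):
    def forward(self, x):
        return x.transpose(2, 3)


model = nn.Sequential(nn.Conv3d(1, 2, 3, padding=1), Transpose(), nn.ReLU())
offenders, handles = add_contiguity_hooks(model)
model(torch.rand(1, 1, 4, 5, 6))
for handle in handles:
    handle.remove()
print(offenders)
```

Recording offenders rather than asserting means a single forward pass lists every layer involved, not just the first one.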

elistevens · October 3, 2018, 11:36pm

Looks like I might have run into https://github.com/pytorch/pytorch/issues/4107, since reducing the size of the array makes the problem seem to go away.
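Since shrinking the input makes the error go away, one way to pin down the threshold, and to check whether cuDNN specifically is involved, is to probe a few input sizes with cuDNN on and then off. A rough sketch, assuming the failing op is a 3D convolution like those in the UNet (the traceback doesn't say which layer); `find_failing_sizes` and the layer and volume sizes are hypothetical, while `torch.backends.cudnn.enabled` is PyTorch's real global cuDNN switch.

```python
import torch
import torch.nn as nn


def find_failing_sizes(make_model, sizes, channels=1, device="cpu", use_cudnn=True):
    """Run one forward + backward pass per cubic input size.

    Returns {size: "ok" or the first line of the RuntimeError message}.
    """
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn  # toggle cuDNN for this probe
    results = {}
    try:
        model = make_model().to(device)
        for size in sizes:
            x = torch.rand(1, channels, size, size, size, device=device)
            try:
                model(x).mean().backward()
                results[size] = "ok"
            except RuntimeError as error:
                results[size] = (str(error).splitlines() or [""])[0]
    finally:
        # Restore the global flag so the rest of the program is unaffected.
        torch.backends.cudnn.enabled = previous
    return results


if torch.cuda.is_available():
    # Hypothetical stand-in for the first layer of a 3D UNet.
    make_conv = lambda: nn.Conv3d(1, 8, kernel_size=3, padding=1)
    print(find_failing_sizes(make_conv, [32, 64, 128], device="cuda", use_cudnn=True))
    print(find_failing_sizes(make_conv, [32, 64, 128], device="cuda", use_cudnn=False))
```

If the larger sizes fail only with cuDNN enabled, that points at a cuDNN limitation rather than at the loss or at contiguity. Disabling cuDNN, or keeping volumes below the failing size, would then be a possible workaround, likely at some cost in speed.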