I am trying to implement a multi-label Dice loss, but I get the following exception when calling `backward()`:
```
  File ".../training.py", line 167, in main
    diceLoss_devtensor.backward()
  File ".../.venv/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File ".../.venv/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: CuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
```
Here’s my code:
```python
prediction_devtensor = self.model(input_devtensor)
# Flatten the spatial dims and sum them: one value per (batch, class) pair
sum2 = lambda t: t.view(t.size(0), t.size(1), -1).sum(dim=2)
diceCorrect_devtensor = sum2(prediction_devtensor * label_devtensor)  # intersection
dicePrediction_devtensor = sum2(prediction_devtensor)
diceLabel_devtensor = sum2(label_devtensor)
# Small smoothing term so that empty classes don't divide by zero
epsilon_devtensor = torch.ones_like(diceCorrect_devtensor) * 0.01
diceLoss_devtensor = 1 - (2 * diceCorrect_devtensor + epsilon_devtensor) / (
    dicePrediction_devtensor + diceLabel_devtensor + epsilon_devtensor
)
# this gets .backward() called on it in the stack trace above
return diceLoss_devtensor.mean()
```
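For comparison, here is a minimal standalone sketch of the same multi-label soft Dice computation written as a function. It assumes `prediction` already holds per-class probabilities (for example after a sigmoid) and `label` is a multi-hot tensor of the same `(N, C, D, H, W)` shape; the function name `multilabel_soft_dice_loss` is hypothetical. It uses `reshape()` instead of `view()`, so it also accepts non-contiguous inputs, and it runs on CPU, which can help separate a problem in the loss itself from one in the cuDNN path.

```python
import torch


def multilabel_soft_dice_loss(prediction: torch.Tensor,
                              label: torch.Tensor,
                              epsilon: float = 0.01) -> torch.Tensor:
    """Soft Dice loss averaged over batch items and classes (a sketch).

    Assumes `prediction` holds per-class probabilities and `label` is a
    multi-hot tensor with the same shape (N, C, ...).
    """
    n, c = prediction.shape[:2]
    # reshape() (unlike view()) also works on non-contiguous tensors by copying
    pred_flat = prediction.reshape(n, c, -1)
    label_flat = label.reshape(n, c, -1)

    intersection = (pred_flat * label_flat).sum(dim=2)  # shape (N, C)
    pred_sum = pred_flat.sum(dim=2)
    label_sum = label_flat.sum(dim=2)

    # epsilon smooths the ratio and keeps empty classes from dividing by zero
    dice = (2 * intersection + epsilon) / (pred_sum + label_sum + epsilon)
    return (1 - dice).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(2, 3, 4, 8, 8, requires_grad=True)  # (N, C, D, H, W)
    label = (torch.rand(2, 3, 4, 8, 8) > 0.5).float()
    loss = multilabel_soft_dice_loss(torch.sigmoid(logits), label)
    loss.backward()
    print(loss.item(), tuple(logits.grad.shape))
```

If this version backpropagates cleanly on CPU with tensors shaped like the real ones, the loss arithmetic is probably not the culprit.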
I made sure my Dataset calls `.contiguous()` on the tensors it returns, but that didn't change anything. The model is a fairly simple implementation of UNet for 3D data.
Any suggestions for how I can troubleshoot this? I've sprinkled some `assert t.is_contiguous()` checks around the model, etc., but they haven't caught anything (though I wasn't comprehensive).
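A more systematic version of the `assert t.is_contiguous()` approach could look like the sketch here. Since the error is raised during `backward()`, it checks gradients rather than forward activations: a forward hook on every submodule registers a tensor hook on that module's output, and the tensor hook prints any incoming gradient that is not contiguous. The helper `add_grad_contiguity_checks` and its `make_contiguous` flag are hypothetical; forcing `grad.contiguous()` is only an experiment, not a confirmed fix. Modules that return tuples rather than a single tensor are skipped.

```python
import torch
import torch.nn as nn


def add_grad_contiguity_checks(model: nn.Module, make_contiguous: bool = False):
    """Print modules whose output gradient arrives non-contiguous in backward.

    Hypothetical debugging helper. Returns the hook handles so the caller
    can remove them afterwards.
    """
    handles = []

    def make_forward_hook(name):
        def forward_hook(module, inputs, output):
            # Only single-tensor outputs that take part in autograd are checked
            if not (isinstance(output, torch.Tensor) and output.requires_grad):
                return

            def grad_hook(grad):
                if not grad.is_contiguous():
                    print(f"non-contiguous grad at output of {name}: "
                          f"shape={tuple(grad.shape)} stride={grad.stride()}")
                    if make_contiguous:
                        # Experiment only: replace the gradient with a contiguous copy
                        return grad.contiguous()
                return None  # leave the gradient unchanged

            output.register_hook(grad_hook)
        return forward_hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_forward_hook(name or "<root>")))
    return handles
```

Typical use would be to call it once after building the model, run a single training step, and then call `h.remove()` on each returned handle. Separately, setting `torch.backends.cudnn.enabled = False` for one step is a quick way to check whether the failure is specific to cuDNN.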