Got "no graph nodes that require computing gradients" when use torch.max?

My suggestion -
First, encode your target as a one-hot tensor: 1 at the correct class index, 0 everywhere else. The target will then be of dimension (B x C x H x W), where B is the batch size, C the number of classes, and H, W the height and width.
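For concreteness, here is a minimal sketch of that one-hot encoding, assuming targets starts out as a (B x H x W) LongTensor of class indices (the variable names are placeholders):

import torch

# Assumed: targets is a (B x H x W) LongTensor of class indices, C is the number of classes
one_hot = torch.zeros(targets.size(0), C, targets.size(1), targets.size(2))
one_hot.scatter_(1, targets.unsqueeze(1), 1)  # 1 at the correct class index, 0 elsewhere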

Then apply the softmax function to your outputs tensor, which is also of dimension (B x C x H x W).

Remember to use the negative of this score while minimising, though I am not sure about the convergence of this approach.

# Replace outputs.max with this
import torch.nn.functional as F

outputs = outputs.permute(0, 2, 3, 1).contiguous()
outputs = outputs.view(outputs.numel() // C, C)  # (B x H x W, C)
outputs = F.softmax(outputs, dim=1)              # (B x H x W, C),
# probabilities over the C classes for each pixel

targets = targets.permute(0, 2, 3, 1).contiguous()
targets = targets.view(targets.numel() // C, C)  # (B x H x W, C)

# The remaining part of the code stays the same as yours.
# Note that your code gives each class equal weight, so if you have too much
# background, the loss from the foreground classes might be overshadowed.
# Consider weighting the dice_loss component of each class separately: take
# the sum without flattening, then sum over the classes with a weight attached
# to each (see the sketch below).
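Something along these lines could do the per-class weighting - untested, and it assumes outputs and targets are the (B x H x W, C) tensors from above and weights is a length-C tensor you choose (e.g. a smaller weight for the background class):

# Soft dice score per class, then a weighted sum over classes
intersection = (outputs * targets).sum(0)           # (C,) overlap per class
union = outputs.sum(0) + targets.sum(0)             # (C,)
dice_per_class = 2 * intersection / (union + 1e-7)  # epsilon avoids division by zero
dice_score = (weights * dice_per_class).sum() / weights.sum()
loss = -dice_score  # negative of the score, as noted above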

You should also take a look at NLLLoss2d for segmentation problems - http://pytorch.org/docs/master/nn.html#torch.nn.NLLLoss2d
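The key point there is the shapes it expects - (B x C x H x W) log-probabilities and (B x H x W) integer labels - so no one-hot encoding or reshaping is needed. A minimal usage sketch (outputs and labels are placeholder names):

import torch
import torch.nn.functional as F

criterion = torch.nn.NLLLoss2d()
log_probs = F.log_softmax(outputs, dim=1)  # (B x C x H x W) log-probabilities
loss = criterion(log_probs, labels)        # labels: (B x H x W) LongTensor of class indices
loss.backward()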
