Hi,

I want to implement a Dice loss for multi-class segmentation. My solution requires one-hot encoding the target tensor, because I am working on a multi-label problem. If you have a better solution than this, please feel free to share it.

This loss function needs to be differentiable so that I can backprop through it, and I am not sure how to encode the target while keeping autograd working. I am currently getting this error:

`RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`

Code based on @rogetrullo's work: https://github.com/pytorch/pytorch/issues/1249

```
def dice_loss(output, target):
    """
    output is a torch Variable of size (Batch, nclasses, H, W) representing
    log probabilities for each class.
    target is a LongTensor of size (Batch, H, W) holding the ground-truth
    class index of each pixel; it is one-hot encoded inside the function,
    producing a tensor of the same size as output.
    """
    encoded_target = Variable(output.data.clone())
    encoded_target[...] = 0
    encoded_target.scatter_(
        1,
        target.view(target.size(0), 1, target.size(1), target.size(2)),
        1)

    assert output.size() == encoded_target.size(), "Input sizes must be equal."
    assert output.dim() == 4, "Input must be a 4D Tensor."

    num = output * encoded_target           # b,c,h,w -- p*g
    num = torch.sum(num, dim=3)             # b,c,h
    num = torch.sum(num, dim=2)             # b,c

    den1 = output * output                  # p^2
    den1 = torch.sum(den1, dim=3)           # b,c,h
    den1 = torch.sum(den1, dim=2)           # b,c

    den2 = encoded_target * encoded_target  # g^2
    den2 = torch.sum(den2, dim=3)           # b,c,h
    den2 = torch.sum(den2, dim=2)           # b,c

    dice = 2 * num / (den1 + den2)          # b,c
    dice_total = -1 * torch.sum(dice) / dice.size(0)  # sum over classes, mean over batch
    return dice_total
```
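For reference, here is a minimal sketch of the same loss on current PyTorch (no `Variable`), where the one-hot target is built on a fresh zero tensor that is outside the autograd graph, so the in-place `scatter_` cannot clobber anything needed for gradient computation. The `eps` term and the assumption that `output` holds probabilities (e.g. after a softmax) are mine, not part of the code above:

```python
import torch


def dice_loss(output, target, eps=1e-7):
    # output: (B, C, H, W) per-pixel class probabilities, part of the graph.
    # target: (B, H, W) LongTensor of ground-truth class indices.
    # zeros_like creates a plain tensor with requires_grad=False, so the
    # in-place scatter_ below does not touch any tensor autograd tracks.
    encoded_target = torch.zeros_like(output)
    encoded_target.scatter_(1, target.unsqueeze(1), 1)

    num = (output * encoded_target).sum(dim=(2, 3))        # b,c -- sum p*g
    den1 = (output * output).sum(dim=(2, 3))               # b,c -- sum p^2
    den2 = encoded_target.sum(dim=(2, 3))                  # b,c -- sum g^2 (one-hot)

    dice = 2 * num / (den1 + den2 + eps)                   # b,c
    return -torch.sum(dice) / dice.size(0)                 # sum over classes, mean over batch
```

Because `encoded_target` is created detached from the graph, gradients still flow through `output` and `loss.backward()` runs without the in-place-operation error.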

If you can think of a solution that does not require one-hot encoding to evaluate the Dice similarity for a multi-class problem, I am also interested!

Thanks