Thanks for the explanation.

Do you mean I need to threshold the prediction (i.e. the probabilities after the activation function (sigmoid/softmax)) so that it is one-hot encoded like the target?

Sample code:

```
import torch
import torch.nn as nn

def dice_single_channel(probability, truth, activation, eps=1e-7, squared_pred=False):
    if activation:
        probability = nn.Softmax2d()(probability)
    p = (probability >= 0.5).float()  # threshold to binary, like the one-hot target
    t = truth.float()
    intersection = (p * t).sum()
    if squared_pred:
        p = p ** 2
        t = t ** 2
    return 1 - (2.0 * intersection + eps) / (p.sum() + t.sum() + eps)

batchsize = 2
channel_num = 5
probability = torch.rand(batchsize, channel_num, 10, 10)
truth = torch.randint(low=0, high=2, size=(batchsize, channel_num, 10, 10))
print(probability.sum(), truth.sum())
dice = dice_single_channel(probability, truth, activation=True)
print(dice)
```

Outputs:

```
tensor(0.9302)
```