Hi,

I am trying to set up a UNet3+ network, and I have been using this paper as my baseline. One thing that confuses me is how to combine the losses from the deeply supervised hidden layers with the loss of the output layer. The literature I have found uses some kind of weighted average, but it is not clear to me how the weights are chosen.
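
By "weighted average" I mean something like the following. This is a minimal sketch in plain Python, and it assumes the weights are normalized by their sum, which is one common convention and not necessarily what the UNet3+ paper does. `weighted_average_loss` is a hypothetical helper, and the loss values are made up.

```python
from typing import Sequence


def weighted_average_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-output losses as sum(w_i * L_i) / sum(w_i).

    Dividing by the weight sum makes the result a convex combination of the
    individual losses, so it never leaves the range spanned by them.
    """
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per output")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * l for w, l in zip(weights, losses)) / total_weight


# Hypothetical per-output losses: 6 side outputs + 1 fused output.
side_and_fused = [0.9, 0.88, 0.85, 0.8, 0.78, 0.75, 0.7]
print(weighted_average_loss(side_and_fused, [1.0] * 7))
```

With equal weights this is just the mean of the seven losses; giving the fused output a larger weight would pull the total toward that output's loss.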

To add to my confusion, I found the following paper, which also has a GitHub repository, and this is what they use for the total loss:

```
import torch.nn as nn
import torch.nn.functional as F


# loss function: seven probability map --- 6 scale + 1 fuse
class Loss(nn.Module):
    def __init__(self, weight=[1.0] * 7):
        super(Loss, self).__init__()
        self.weight = weight  # one weight per probability map

    def forward(self, x_list, label):
        # Weighted *sum* (not average) of the BCE of every map against the label
        loss = self.weight[0] * F.binary_cross_entropy(x_list[0], label)
        for i, x in enumerate(x_list[1:]):
            loss += self.weight[i + 1] * F.binary_cross_entropy(x, label)
        return loss
```

This makes no sense to me, especially since I plan to use Dice loss, where each output gives values like 0.9, 0.88, etc. My DiceLoss needs to stay within [0, 1], but with seven terms that all have a weight of 1.0, the total can reach 7.
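
To make the range problem concrete, here is a small sketch in plain Python (so it runs without PyTorch). The soft Dice formula with a smoothing term `eps`, and the seven identical, deliberately poor predictions, are my own assumptions for illustration, not taken from either paper.

```python
from typing import Sequence


def soft_dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Soft Dice loss, 1 - (2*sum(p*g) + eps) / (sum(p) + sum(g) + eps).

    With predictions and labels in [0, 1] the result lies in [0, 1].
    """
    intersection = sum(p * g for p, g in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


# Hypothetical example: one flattened 4-pixel label and seven poor predictions
# (six side outputs + one fused output), all close to the inverse of the label.
target = [1.0, 1.0, 0.0, 0.0]
predictions = [[0.1, 0.05, 0.9, 0.95]] * 7

per_output = [soft_dice_loss(p, target) for p in predictions]
summed = sum(1.0 * loss for loss in per_output)  # unit weights, as in the Loss class above
averaged = summed / len(per_output)              # divide by the sum of the weights

print(f"summed:   {summed:.3f}")    # exceeds 1 once several outputs are poor
print(f"averaged: {averaged:.3f}")  # stays within [0, 1]
```

Each poor prediction here gives a Dice loss of about 0.925, so the unit-weight sum lands around 6.5, while dividing by the weight sum brings it back to about 0.925.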

Can anyone explain how to compute an overall loss value when using deep supervision?

Thanks