Why am I consistently getting an IoU (Intersection over Union) score of 0.0 during training?

I’m doing semantic segmentation of 3D tensors with a standard 3D UNet, trained with DiceLoss()/BCELoss(). To measure the accuracy/error of the network, I’m using a standard IoU (Intersection over Union) metric.
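For reference, here are the two overlap measures side by side, writing A for the set of predicted foreground voxels and B for the set of target foreground voxels. The function below divides the intersection by |A| + |B| and multiplies by 2, which is the Dice coefficient rather than IoU. The last identity shows the two are monotonically related, and both are 0 whenever prediction and target share no foreground voxels.

```latex
\mathrm{IoU}(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},
\qquad
\mathrm{Dice}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|},
\qquad
\mathrm{IoU} = \frac{\mathrm{Dice}}{2 - \mathrm{Dice}}
```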

import torch

eps = 1e-6  # assumption: eps is defined elsewhere in the original script

def dice_error(input, target):
    _, result_ = input.max(1)
    result_ = torch.squeeze(result_)
    if input.is_cuda:
        result = torch.cuda.FloatTensor(result_.size())
        target_ = torch.cuda.FloatTensor(target.size())
    else:
        result = torch.FloatTensor(result_.size())
        target_ = torch.FloatTensor(target.size())
    result.copy_(result_.data)
    target_.copy_(target.data)
    target = torch.squeeze(target_)
    intersect = torch.dot(result.view(-1), target.view(-1))

    result_sum = torch.sum(result)
    target_sum = torch.sum(target)
    union = result_sum + target_sum + 2*eps
    IoU = intersect / union
    return 2*IoU

This function keeps returning 0 when given the network output and the target tensor, both of shape [1,1,32,128,128]. The target tensor has 0 for background and 1 for points of interest, so the function should eventually find some intersection. But even after many thousands of iterations the value returned is still 0, although the loss decreases.

Is there any reason why I am not getting a nonzero value, and why there is still no intersection between the network output and the target even though the loss is decreasing?

Thanks
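One likely cause, assuming the network really outputs a single channel as the [1,1,32,128,128] shape suggests: input.max(1) takes the argmax over dimension 1, and a dimension of size 1 only has index 0. The predicted mask is therefore all zeros, so its dot product with the target is 0 however well training goes. The sketch below uses random logits as a stand-in for the real network output to show this, plus one common alternative, thresholding the sigmoid probability; the 0.5 threshold is an assumption, not something from the original post.

```python
import torch

torch.manual_seed(0)

# Random stand-in for the network output, same shape as in the question: [N, C=1, D, H, W]
logits = torch.randn(1, 1, 32, 128, 128)

# argmax over a channel dimension of size 1 can only ever return index 0
_, argmax_mask = logits.max(1)
print(argmax_mask.sum().item())  # 0 for any input

# Binary prediction by thresholding instead (assumed threshold of 0.5).
# If the network already ends in a sigmoid (as BCELoss expects), threshold its output directly.
probs = torch.sigmoid(logits)
threshold_mask = (probs > 0.5).float()
print(threshold_mask.sum().item() > 0)  # True: some voxels are now predicted as foreground
```

With a multi-class output (C > 1), the argmax over dimension 1 would be meaningful; it is only the single-channel case that collapses to all zeros.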

Here is another type of error calculation and this seems to work fine:

def error(preds, targets):
    assert preds.size() == targets.size()
    bs, d, h, w = preds.size()
    n_pixels = bs*d*h*w
    
    incorrect = preds.ne(targets).cpu().sum()       
    # Divide the number of incorrectly classified pixel by the number of all pixels.
    err = incorrect.numpy()/n_pixels           
    # return err
    return round(err, 5)

This returns an error value that looks reasonable.

Isn't this doing the same thing as the dot product in the IoU function?
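The two measures differ in an important way. The pixel error counts every mismatch, so on a volume where foreground is rare, an all-background prediction still gets a small error. The dot product counts only voxels that are foreground in both tensors, which is 0 for an all-background prediction. The sketch below uses a made-up target with a small foreground cube to show the two numbers side by side.

```python
import torch

# Made-up target of shape [D, H, W] = [32, 128, 128] with a 4x16x16 foreground cube
target = torch.zeros(32, 128, 128)
target[10:14, 50:66, 50:66] = 1.0

# All-background prediction, e.g. what an argmax over a single channel produces
pred = torch.zeros_like(target)

# Fraction of mismatched voxels (what error() measures)
pixel_error = pred.ne(target).float().mean().item()

# Voxels that are foreground in both tensors (what the dot product measures)
intersection = torch.dot(pred.view(-1), target.view(-1)).item()

print(round(pixel_error, 5))  # 0.00195
print(intersection)           # 0.0
```

So a low pixel error and a zero intersection are entirely compatible when the foreground occupies a tiny fraction of the volume.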

Hi,

I ran into the same problem a few days ago in my accuracy function. The model outputs (the logits) and the targets are both tensors, so there are two ways to calculate metrics: in NumPy or directly on the tensors. Converting the tensors to NumPy arrays works, but computing the metrics that way at the end of every iteration took a lot of time, so I tried computing them on the tensors instead. Try printing your union and intersection: if they are not 0 but the IoU is, the cause may be a type error. In my code I computed the number of valid pixels and of correct pixels, converted them with .long(), and then computed the IoU as .long() / .long(), which gave 0. Alternatively, you can use .item() to get the Python scalar corresponding to a tensor.

Hope this helps.
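A minimal sketch of computing the metric on tensors as described above, assuming binary masks of the same shape: keep the per-batch counts as tensors, accumulate them, cast to float before dividing, and call .item() once at the end to get a Python number. The helper name iou_counts, the made-up batches and the eps value are illustrative, not from the thread.

```python
import torch

def iou_counts(pred_mask, target_mask):
    """Hypothetical helper: (intersection, union) voxel counts for binary masks, as tensors."""
    pred_b = pred_mask.bool()
    target_b = target_mask.bool()
    intersection = (pred_b & target_b).sum()  # int64 tensor, stays on the masks' device
    union = (pred_b | target_b).sum()
    return intersection, union

# Made-up binary masks standing in for two batches of (prediction, target)
batches = [
    (torch.tensor([[1, 1, 0, 0]]), torch.tensor([[1, 0, 0, 0]])),
    (torch.tensor([[0, 1, 1, 0]]), torch.tensor([[0, 1, 1, 1]])),
]

# Accumulators; on GPU, create them with device=<the masks' device>
total_inter = torch.zeros((), dtype=torch.long)
total_union = torch.zeros((), dtype=torch.long)
for pred, target in batches:
    inter, union = iou_counts(pred, target)
    total_inter += inter
    total_union += union

eps = 1e-6  # assumed smoothing term to avoid division by zero
# Cast to float before dividing, then call .item() once to get a Python float
iou = (total_inter.float() / (total_union.float() + eps)).item()
print(round(iou, 4))  # 0.6
```

Keeping the counts as integer tensors until the final division avoids per-iteration host transfers, and the explicit .float() casts rule out the .long() / .long() pitfall regardless of PyTorch version.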


No, PyTorch overloads the / operator, so it works the same as div. For example:

In [2]: a = torch.tensor([1., 2., 3.])                                          

In [3]: b = torch.tensor([2., 3., 4.])                                          

In [4]: a / b                                                                   
Out[4]: tensor([0.5000, 0.6667, 0.7500])

When I get unexpected zeros, I usually first check that everything has the correct type, to avoid problems like this:

In [6]: a.long() / b.long()                                                     
Out[6]: tensor([0, 0, 0])
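In recent PyTorch releases the Out[6] result above no longer occurs: / on integer tensors performs true division and returns a floating-point tensor, and truncating or floor division must be requested explicitly, for example through the rounding_mode argument of torch.div. The sketch below, reusing a and b from the session above, shows both; it assumes a recent PyTorch version.

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([2., 3., 4.])
a_int, b_int = a.long(), b.long()

# True division: integer inputs are promoted to the default float dtype
print(a_int / b_int)  # tensor([0.5000, 0.6667, 0.7500])

# Truncating division has to be asked for explicitly; this reproduces the zeros from Out[6]
print(torch.div(a_int, b_int, rounding_mode="trunc"))  # tensor([0, 0, 0])

# Safest for metrics regardless of version: cast counts to float before dividing
print((a_int.float() / b_int.float()).dtype)  # torch.float32
```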

I am not sure whether this is happening in the OP’s code, though. I also wonder what the lines

    result.copy_(result_.data)
    target_.copy_(target.data)

are meant to accomplish.

You could also try cleaning up the following

    if input.is_cuda:
        result = torch.cuda.FloatTensor(result_.size())
        target_ = torch.cuda.FloatTensor(target.size())
    else:
        result = torch.FloatTensor(result_.size())
        target_ = torch.FloatTensor(target.size())
    result.copy_(result_.data)
    target_.copy_(target.data)

by replacing it with a more straightforward implementation:

    result = result_.float().to(input.device)
    target = target.float().to(input.device)
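Putting that suggestion together with the rest of the function, a cleaned-up version might look like the sketch below. It keeps the original Dice-style formula, but two things are assumptions not stated in the reply: the network has a single output channel producing probabilities (BCELoss implies a sigmoid at the end), so the mask is obtained by thresholding at 0.5 instead of an argmax over dimension 1; and eps is a small constant chosen here for illustration.

```python
import torch

eps = 1e-6  # assumed smoothing constant (not given in the thread)

def dice_error(input, target, threshold=0.5):
    """Dice-style overlap score for single-channel probability maps of shape [N, 1, D, H, W]."""
    # Assumption: input holds sigmoid probabilities; threshold them into a binary float mask
    result = (input > threshold).float()
    # Cast the target to float on the same device as the prediction
    target = target.float().to(input.device)

    intersect = torch.dot(result.reshape(-1), target.reshape(-1))
    denom = result.sum() + target.sum() + 2 * eps  # |A| + |B|, the Dice denominator
    return (2 * intersect / denom).item()

# Usage with a tiny made-up example: prediction and target cover the same voxels
probs = torch.zeros(1, 1, 4, 4, 4)
probs[0, 0, :2] = 0.9
target = torch.zeros(1, 1, 4, 4, 4)
target[0, 0, :2] = 1
print(round(dice_error(probs, target), 4))  # 1.0
```

If the network instead emits raw logits, thresholding at 0 (or applying torch.sigmoid first) gives the same mask.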

Oh, thank you. The problem in my code was probably a type error like the one in your example, not the division operator itself. I will edit my answer.

Thanks a lot!