Trying to understand a PyTorch function

Hello dear programmers,

I am still a beginner in PyTorch. I found the following function in a repository. I think it is meant to estimate evaluation metrics such as recall, precision, and F-score. However, when I used it while training my model, it gave me some strange values, such as a recall of 6.5, a precision of 0.08, and an F-score of 0.085.

import numpy as np
import torch


def val(model, train_loader):
    model.train(False)  # switch the model to evaluation mode
    overlap, Y_area, X_area, recall, precision = [], [], [], [], []
    for i, data in enumerate(train_loader):
        x, y_true = data
        if y_true.max() > 0:  # skip samples whose target mask is empty
            if torch.cuda.is_available():
                x, y_true = x.cuda(), y_true.cuda()
            output = model(x)
            if output.max() > 0.5:  # only evaluate batches with a confident positive
                # intersection of the thresholded prediction and the target
                overlap.append(torch.sum(torch.round(output).mul(y_true)).item())
                Y_area.append(torch.sum(y_true).item())               # positive pixels in the target
                X_area.append(torch.sum(torch.round(output)).item())  # positive pixels in the prediction
                recall.append(overlap[-1] / Y_area[-1])
                precision.append(overlap[-1] / X_area[-1])
    print(np.mean(overlap), np.mean(Y_area), np.mean(X_area), np.mean(recall), np.mean(precision))
    model.train(True)  # switch back to training mode
    # combine the mean recall and precision into a single score
    return np.mean(recall) * np.mean(precision) / (np.mean(recall) + np.mean(precision) + 0.00001)

Please, could anyone help with an explanation of this function? I would be very grateful if you could also tell me how to adjust it properly.

Thank you very much for your time and guidance

Based on the posted code it looks like the model is expected to output a probability for a binary classification, i.e. a sigmoid is used as the last non-linearity.
Is this also the case for your model?

Also, could you check the shapes of your output as well as the target tensor y_true to make sure that no unwanted broadcasting is happening?
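
For example (made-up shapes, since I don't know yours yet), unwanted broadcasting together with torch.round on unbounded outputs can inflate these sums:

import torch

# Hypothetical shapes, just to illustrate the failure mode
output = torch.randn(1, 2, 8, 8)                 # extra channel dimension, unbounded values
y_true = torch.randint(0, 2, (1, 8, 8)).float()  # target without that dimension

# (1, 8, 8) silently broadcasts against (1, 2, 8, 8), so every target
# pixel is multiplied with both output channels and double-counted
overlap = torch.sum(torch.round(output).mul(y_true)).item()
Y_area = torch.sum(y_true).item()

# round() on unbounded outputs also yields values outside {0, 1},
# so overlap / Y_area (the "recall") can be negative or far above 1
print(overlap / Y_area)

That would explain values like the recall of 6.5 you are seeing.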

Hello sir

output.shape = (1, 2, 64, 64) and y_true.shape = (1, 64, 64)

Yes sir, I am trying to perform a binary segmentation task.

I have been trying to fix the problem, and for now it sometimes gives negative values. Is this result normal? Could you help me check whether the logic of the code is correct?
Thank you very much

I’m still unsure if your model is returning logits, log probabilities, or something else.
That being said, I think this line of code is not working as expected:

import torch

# Setup using your provided shapes
output = torch.randn(1, 2, 64, 64)
target = torch.randint(0, 2, (1, 64, 64))

# Overlap calculation: target broadcasts across the channel dimension
tmp = torch.round(output).mul(target)
print(tmp.shape)
> torch.Size([1, 2, 64, 64])

I guess you would like to get the predicted class from the output, which would work using torch.argmax(output, 1) instead of torch.round.
However, I might be wrong depending on the actual type of output you are dealing with.
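
In code, that replacement would look roughly like this (a sketch using the shapes you posted):

import torch

output = torch.randn(1, 2, 64, 64)
y_true = torch.randint(0, 2, (1, 64, 64))

pred = torch.argmax(output, 1)             # per-pixel class, shape (1, 64, 64)
overlap = torch.sum(pred * y_true).item()  # true positives
Y_area = torch.sum(y_true).item()          # actual positive pixels
X_area = torch.sum(pred).item()            # predicted positive pixels

# the intersection can never exceed either area,
# so recall and precision now stay within [0, 1]
print(overlap <= Y_area and overlap <= X_area)  # True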

Hello sir, I am very grateful for your time and reply. A sample of the output is as follows:

tensor([[[[-1.9360e-03,  1.6174e-03,  7.0995e-03,  ...,  6.7276e-04,
           -1.2154e-03, -1.4051e-03],
          [-4.9215e-03,  5.7097e-03, -6.7503e-03,  ...,  1.4770e-03,
           -4.8499e-04,  1.8494e-03],
          [-4.8179e-03, -4.1774e-03,  2.4349e-03,  ...,  9.2152e-04,
            1.0596e-03,  7.4716e-04],
          ...,
          [ 1.9952e-03, -6.0699e-04, -2.3684e-03,  ..., -1.3495e-04,
            9.3090e-06, -3.7983e-05],
          [-2.1160e-03, -5.0737e-04,  1.9036e-03,  ..., -1.2619e-04,
            1.1687e-04,  7.8221e-05],
          [ 7.3612e-04,  4.1756e-05, -2.4000e-03,  ..., -5.8748e-05,
           -8.1708e-05,  2.6121e-05]],

         [[ 7.7094e-03,  1.0867e-03,  5.3360e-03,  ...,  1.6651e-03,
           -1.8157e-03, -6.9385e-04],
          [ 6.7968e-03, -2.2930e-04, -4.8593e-03,  ..., -2.5276e-04,
            1.2758e-04, -1.0331e-03],
          [ 4.3360e-03, -2.8905e-03, -1.5076e-03,  ...,  1.3465e-03,
           -8.5237e-04, -2.1074e-03],
          ...,
          [ 6.3477e-04,  8.0362e-04, -1.6985e-03,  ...,  9.2338e-05,
           -1.9669e-05,  2.2638e-05],
          [ 8.7553e-05,  1.1976e-03,  2.3584e-03,  ...,  3.8165e-05,
           -5.9556e-05, -6.1906e-06],
          [-8.0818e-04,  2.3715e-04,  3.0344e-04,  ...,  9.8189e-05,
           -7.9020e-05,  6.2674e-05]]]], grad_fn=<SlowConvTranspose2DBackward>)

As for now, the recall and precision values are no longer negative or greater than 1. However, I am facing a new problem: during training, the recall and precision are NaN (nan) at most epochs. Only a few epochs yield numerical values, and those values are very low. Is this normal, or is there still something wrong in the logic? I have not yet optimized the whole network; I would like to make sure that the model evaluation is right before moving on.
Moreover, in some cases the following error occurs:

 precision.append(overlap[-1]/X_area[-1])
ZeroDivisionError: division by zero

Sincerely appreciate your help and suggestions
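
The NaN values most likely come from np.mean being called on empty lists: recall and precision are only appended when output.max() > 0.5, so any epoch in which no batch passes that check averages an empty list, and np.mean([]) returns nan. The ZeroDivisionError occurs when no pixel is predicted positive, making X_area[-1] zero. A revised version along these lines (a sketch based on the argmax approach above, not a drop-in guarantee) guards against both:

import numpy as np
import torch

def val(model, loader, eps=1e-8):
    model.train(False)
    overlap, Y_area, X_area, recall, precision = [], [], [], [], []
    with torch.no_grad():
        for x, y_true in loader:
            if y_true.max() > 0:
                if torch.cuda.is_available():
                    x, y_true = x.cuda(), y_true.cuda()
                # argmax over the channel dimension gives the per-pixel class,
                # so the output.max() > 0.5 check is no longer needed
                pred = torch.argmax(model(x), 1)
                overlap.append(torch.sum(pred * y_true).item())
                Y_area.append(torch.sum(y_true).item())
                X_area.append(torch.sum(pred).item())
                recall.append(overlap[-1] / (Y_area[-1] + eps))
                precision.append(overlap[-1] / (X_area[-1] + eps))  # eps prevents ZeroDivisionError
    model.train(True)
    if not recall:  # np.mean([]) would be nan, so return explicitly
        return float('nan')
    r, p = np.mean(recall), np.mean(precision)
    return 2 * r * p / (r + p + eps)  # standard F1 score

With this version, a nan return only means that no batch in the loader had a non-empty target mask.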