Output of VGG16 on a classification problem

Hello everyone! I wanted to clarify a doubt I have regarding the VGG16 network. I am currently using the pre-trained VGG16 network for a classification problem with 2 labels. I already have the best weights for this problem, using nn.CrossEntropyLoss as the criterion, and I can get a prediction by doing:

outputs = vgg16(net_img)          # raw logits, shape [batch_size, 2]
_, preds = torch.max(outputs, 1)  # index of the highest logit per sample

However, my goal is not to have a binary prediction (0 or 1), but the probability and also the cross entropy metric for each class. I wanted to check whether what I am doing makes sense.
To get the probability of each class I am doing:

probabilities = torch.sigmoid(outputs)

And to get the cross entropy of each class I am simply using the outputs I already calculated.

Does this make sense? If it helps, I based my work on this tutorial: https://www.kaggle.com/carloalbertobarbano/vgg16-transfer-learning-pytorch

Thank you in advance!

To get the probabilities, you should probably use probs = F.softmax(outputs, dim=1), since you are using nn.CrossEntropyLoss as the criterion, which means your output should have the shape [batch_size, 2] (as used in a multi-class classification).
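
For example, a minimal sketch (the logits here are random stand-ins for your model output):

import torch
import torch.nn.functional as F

outputs = torch.randn(4, 2)        # stand-in logits, shape [batch_size, 2]
probs = F.softmax(outputs, dim=1)  # probabilities over the 2 classes
print(probs.sum(dim=1))            # each row sums to 1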

Thank you for your help! But without the F.softmax layer, the direct output is the cross entropy value, right? Since I am using CrossEntropyLoss as the criterion.

Without the softmax the output would contain logits. I don’t know if that’s what “cross entropy value” would refer to.
Could you explain a bit what you mean by this?
It will not contain the cross entropy (loss) between the output and target, as this will be returned by nn.CrossEntropyLoss itself.
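
To illustrate the distinction, a minimal sketch (the logits and targets are random stand-ins):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
outputs = torch.randn(4, 2)          # logits, what the model returns
targets = torch.randint(0, 2, (4,))  # ground-truth class indices
loss = criterion(outputs, targets)   # the cross entropy is computed here
print(loss)                          # a single scalar, not a per-class value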

Sorry, I didn’t explain myself well. Basically, I wanted to evaluate my model with something similar to this: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html

If I understand the docs correctly, probabilities are expected as the model output, so it seems that applying a softmax should work.

However, this part of the docs:

“the probabilities provided are assumed to be that of the positive class”

sounds as if the loss function is dealing with a multi-label output (each output would give a probability in [0, 1] for a separate class). Can you verify it?
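
In case it helps with the check, here is a small sketch (the logits and labels are made up for illustration) comparing the two ways log_loss accepts binary predictions:

import torch
import torch.nn.functional as F
from sklearn.metrics import log_loss

outputs = torch.tensor([[1.2, -0.4],   # made-up logits for 3 samples
                        [0.1,  2.3],
                        [-1.0, 0.5]])
y_true = [0, 1, 1]                     # made-up ground-truth labels

probs = F.softmax(outputs, dim=1)

# Pass the full [n_samples, n_classes] probability matrix...
print(log_loss(y_true, probs.numpy()))
# ...or only the positive-class probabilities, as the quoted docs describe.
print(log_loss(y_true, probs[:, 1].numpy()))
# For a two-class problem the two calls should agree.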