I did a small experiment with the ShuffleNetV2 model, but I’m not sure about its results.
From the torchvision models, I loaded a ShuffleNetV2 instance pretrained on ImageNet. So I have a classification network across 1000 classes, right?
Then, I replaced the last FC layer:
model.fc = Linear(in_features=model.fc.in_features, out_features=1, bias=(model.fc.bias is not None))
I also called reset_parameters() on all layers.
So now, supposedly, I have a randomly initialized network for binary classification, right? Its output should vary between 0 and 1.
But no. I figured that since the classification network probably has a softmax operator after the FC layer, a single output would constantly be 1.
Then I tested my hypothesis with random inputs, and instead I got negative values!
Out: tensor([[-0.0318]], grad_fn=<AddmmBackward>)
How can it be?
If you set the number of outputs to 1, you should use nn.BCEWithLogitsLoss as your criterion. Your target should then have the same shape as the output ([batch_size, 1]), contain values in [0, 1], and be a FloatTensor.
Alternatively, if you would like to stick to nn.CrossEntropyLoss, you should specify out_features=2, and your target should be a LongTensor containing the class indices in [0, 1] and having the shape [batch_size].
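The two options can be sketched with standalone tensors, random logits standing in for the model outputs:

```python
import torch
import torch.nn as nn

batch_size = 4

# Option 1: single output + BCEWithLogitsLoss.
logits = torch.randn(batch_size, 1)                     # raw model outputs
target = torch.randint(0, 2, (batch_size, 1)).float()   # FloatTensor, shape [batch_size, 1]
loss_bce = nn.BCEWithLogitsLoss()(logits, target)

# Option 2: two outputs + CrossEntropyLoss.
logits2 = torch.randn(batch_size, 2)
target2 = torch.randint(0, 2, (batch_size,))            # LongTensor, shape [batch_size]
loss_ce = nn.CrossEntropyLoss()(logits2, target2)
```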
Thanks. Regardless of the training criterion, I am asking about an untrained network (since I reset the parameters).
Is the FC layer followed by a softmax? (I didn’t find the model’s source code.)
If so, how come it results in a negative number?
No, the model returns the logits (line of code).
I see. Thanks!
So I’ll try adding a Sigmoid layer at the end (to get the probability of belonging to class 1, which I can then compare to my ground-truth class labels with values in [0, 1]), and I’ll train using BCELoss.
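A minimal sketch of that setup, with a toy Linear layer standing in for the network's new head (note that nn.BCEWithLogitsLoss on the raw logits is the numerically more stable equivalent of Sigmoid + nn.BCELoss):

```python
import torch
import torch.nn as nn

# Toy stand-in: a linear head followed by a Sigmoid, so outputs lie in (0, 1).
model = nn.Sequential(
    nn.Linear(8, 1),
    nn.Sigmoid(),
)
criterion = nn.BCELoss()

x = torch.randn(4, 8)                              # a batch of 4 feature vectors
target = torch.randint(0, 2, (4, 1)).float()       # binary ground-truth labels
prob = model(x)                                    # probabilities of class 1
loss = criterion(prob, target)
```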