I am working on a classification problem and right now I’ve coded the network as a classifier which has n (n = number of classes) nodes in the output layer (and then I use softmax and get the label prediction through maximum index). I’m wondering if I can add one more layer in the end with single output and apply softmax to the previous (n nodes) layer and get the label as a model output. Is this possible or is there any other way to get integer labels as model output?
You could do that, but you would have to train the last linear layer to give the correct outputs. Also, the range on the linear layers is infinite, so it could give a number larger than the number of classes, which is why people use softmax.
but you would have to train the last linear layer to give the correct outputs.
You mean the integer output layer that I want to add?
the range on the linear layers is infinite
Yeah, that’s the problem: it either goes to infinity or stays at 0 if I apply a sigmoid activation layer.
This is my current network:
x = self.activation(nn.Linear(16*16, 512)(x))
x = self.activation(nn.Linear(512, 256)(x))
x = self.activation(nn.Linear(256, 128)(x))
x = torch.log_softmax(nn.Linear(128, 65)(x).squeeze(), dim = 1)
# x = nn.Linear(65,1)(x)
_, max_val = torch.max(x, dim = 1)
x = max_val.to(torch.float32)
# x = nn.Linear(1,1)(max_idx.to(torch.float32))
The last layer, the one I’ve applied softmax to, was my original last layer. I have 65 classes and it performs really well. But I am trying to convert that to an integer output from the model. I tried a few things (the commented lines), but none of them works. I just wanted to know whether this can be done (and produce accurate results) at all.
Yes, I mean the integer output layer that you would add; you would have to train it like all the other layers. You could freeze the rest of your model and train just that layer, and it might work, but you would have to train it to find out. One possibility is to apply a sigmoid function to that layer, which puts its output in the range 0 to 1, and then multiply it by the number of labels minus one. That would keep the output between 0 and 64 for your 65 classes, but you would then have to train it with something like an MSE loss to make sure it outputs the correct classes. Overall it is not worth it; softmax is a simpler solution in my opinion.
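The sigmoid-and-scale idea from this reply could be sketched roughly as follows. This is only an illustration under assumptions: `IntegerHead`, the random stand-in tensors, the optimizer settings, and the choice to scale by `num_classes - 1` (so the largest label index is reachable) are hypothetical and not taken from the thread; only the 65 classes are.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 65  # from the thread; everything else below is illustrative


class IntegerHead(nn.Module):
    """Hypothetical extra layer: maps class scores to one value in [0, num_classes - 1]."""

    def __init__(self, num_classes):
        super().__init__()
        self.linear = nn.Linear(num_classes, 1)
        self.scale = float(num_classes - 1)

    def forward(self, scores):
        # sigmoid bounds the output to [0, 1]; scaling stretches it to [0, num_classes - 1]
        return torch.sigmoid(self.linear(scores)).squeeze(1) * self.scale


head = IntegerHead(num_classes)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Random stand-ins for the frozen classifier's output and the true labels.
scores = torch.randn(32, num_classes)
labels = torch.randint(0, num_classes, (32,))

# One training step: the head is trained as a regressor on the label index.
pred = head(scores)                   # continuous values, not integers
loss = loss_fn(pred, labels.float())
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Rounding happens only at inference time, outside the loss.
int_labels = pred.detach().round().long()
```

Note that an MSE loss treats class 3 as "closer" to class 4 than to class 40, which is usually meaningless for categorical labels; that is one reason the softmax classifier tends to be the simpler choice.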
If by “model output” you mean the model predictions that you feed
in to your loss function, then you don’t want to do this. If you do
manage to have your model output values that are exactly integers
(even if they are encoded as floating-point numbers) those outputs
won’t be usefully differentiable so your model won’t train.
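One way to reconcile this with wanting integer labels is to keep the logits as the output that feeds the loss and to expose the labels through a separate inference-only method. The sketch below assumes that arrangement; the `Classifier` class, the `predict` method name, and the use of ReLU for the unspecified `self.activation` are assumptions for illustration, while the layer sizes follow the network posted above.

```python
import torch
import torch.nn as nn


class Classifier(nn.Module):
    def __init__(self, in_features=16 * 16, num_classes=65):
        super().__init__()
        # Layers are created once here so their weights are actually trained.
        self.net = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(),  # ReLU is an assumed activation
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        # Differentiable logits: this is what the loss function should see.
        return self.net(x)

    @torch.no_grad()
    def predict(self, x):
        # Integer class labels, for inference only.
        return self.forward(x).argmax(dim=1)


model = Classifier()
x = torch.randn(4, 16 * 16)          # hypothetical batch of 4 flattened inputs
targets = torch.randint(0, 65, (4,))

logits = model(x)
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()                      # gradients flow through the logits

hard = logits.argmax(dim=1)          # integer tensor, cut off from the graph
labels = model.predict(x)            # same labels via the inference method
```

Here `logits.requires_grad` is true while `hard.requires_grad` is false, which is the practical meaning of "not usefully differentiable": nothing upstream of the `argmax()` can receive a gradient through it.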
As an aside, you don’t need the softmax() before the “maximum
index” (the argmax()) to get the label prediction. softmax() doesn’t
change the order of its arguments, so you can leave it out.
Alright, I’ll try that. I too think it’s not worth it, but this is required in this particular case. Thanks for your time!
That’s exactly what happened when I tried it, I just wanna try a few things and see if I can work around it.
I am doing that to get the index of the maximum value. Maybe I didn’t understand what you said?
Let’s call the output of your model logits. What I mean is:
torch.argmax(torch.nn.functional.softmax(logits, dim=1), dim=1) == torch.argmax(logits, dim=1)
That is, even though the values returned by softmax() are different
from its inputs, the order of the returned values is the same as the
order of the inputs. So leaving out softmax() won’t change the
argmax(). (I’m just noting that while using softmax() is
valid, it’s not necessary.)
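The equality above can be checked quickly on random data. In this small sketch the batch size and seed are arbitrary choices, and log_softmax() is included because that is what the posted network applies:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(8, 65)  # hypothetical batch: 8 samples, 65 classes

# softmax() and log_softmax() are monotonic along each row, so they preserve
# the position of the largest entry.
from_softmax = torch.argmax(torch.softmax(logits, dim=1), dim=1)
from_log_softmax = torch.argmax(torch.log_softmax(logits, dim=1), dim=1)
from_logits = torch.argmax(logits, dim=1)

same = torch.equal(from_softmax, from_logits) and torch.equal(from_log_softmax, from_logits)
```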
Ohhh, I just realized that. Yeah, that makes sense, Thank you for letting me know.