Issue with nn.Softmax() not applying to defined model output

Hi! I recently ran into an issue where, despite defining torch.nn.Softmax() in my model’s __init__, the classifier output comes back looking as if softmax were never applied.

So in this case, when defining the model’s __init__:

class TestNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(TestNet, self).__init__()
        ...
        ## fully connected layer with output size defined by number of classes
        self.fc = nn.Sequential(
            nn.Linear(256, num_classes),
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        ...
        x = self.fc(x)
        return x

The resulting output looks as if softmax isn’t being applied at all.

But if I apply torch.nn.functional.softmax to the output myself, the result is as expected: values between 0 and 1, and each row in the batch sums to 1.
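For reference, this is roughly what I’m doing after the forward pass (output here stands in for the raw classifier output from my loop; dim=1 is the class dimension):

import torch.nn.functional as F

probs = F.softmax(output, dim=1)   # apply softmax manually over the class dimension
print(probs.sum(dim=1))            # each batch row now sums to 1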

PS: Note my criterion is set to CrossEntropyLoss, which internally applies torch.nn.LogSoftmax(). Would that, in this case, affect the nn.Softmax layer applied in the model’s __init__?
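As a quick sanity check of that built-in LogSoftmax (with made-up logits and targets):

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(2, 4)            # made-up raw scores, shape [batch, classes]
target = torch.tensor([1, 3])         # made-up class indices

# CrossEntropyLoss is LogSoftmax followed by NLLLoss
a = nn.CrossEntropyLoss()(logits, target)
b = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
print(torch.allclose(a, b))           # True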

Hi Xinan!

Works for me. Here is a script that tests your TestNet:

import torch
print (torch.__version__)

_ = torch.manual_seed (2024)

import torch.nn as nn

class TestNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(TestNet, self).__init__()
        # ...
        ## fully connected layer with output size defined by number of classes
        self.fc = nn.Sequential(
            nn.Linear(256, num_classes),
            nn.Softmax(dim=1)
        )
    
    def forward(self, x):
        # ...
        x = self.fc(x)
        return x

tn = TestNet (num_classes = 4)

print ('tn = ...')
print (tn)

inp = torch.randn (1, 256)

print ('tn (inp) = ...')
print (tn (inp))

print ('tn.fc[0] (inp) = ...')
print (tn.fc[0] (inp))
print ('tn.fc[0] (inp).softmax (dim = 1) = ...')
print (tn.fc[0] (inp).softmax (dim = 1))

And here is its output:

2.3.0
tn = ...
TestNet(
  (fc): Sequential(
    (0): Linear(in_features=256, out_features=4, bias=True)
    (1): Softmax(dim=1)
  )
)
tn (inp) = ...
tensor([[0.4014, 0.2170, 0.1677, 0.2139]], grad_fn=<SoftmaxBackward0>)
tn.fc[0] (inp) = ...
tensor([[ 0.7785,  0.1635, -0.0946,  0.1489]], grad_fn=<AddmmBackward0>)
tn.fc[0] (inp).softmax (dim = 1) = ...
tensor([[0.4014, 0.2170, 0.1677, 0.2139]], grad_fn=<SoftmaxBackward0>)

Yes, CrossEntropyLoss does have LogSoftmax built into it, so you
do not want a Softmax layer in your model. Presumably you just want
self.fc = nn.Linear(256, num_classes).
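For example, here is a minimal sketch of the logits-only version (the batch size and num_classes = 4 are made up):

import torch
import torch.nn as nn

model = nn.Linear (256, 4)                  # raw logits -- no Softmax layer
criterion = nn.CrossEntropyLoss()

inp = torch.randn (8, 256)
target = torch.randint (4, (8,))

loss = criterion (model (inp), target)      # logits go straight into the loss
probs = model (inp).softmax (dim = 1)       # softmax only when you want probabilities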

No. TestNet has no idea what happens after it is called. So it doesn’t know
that its output is being fed into CrossEntropyLoss and the fact that it is won’t
affect TestNet’s Softmax layer.

Best.

K. Frank


Hi K. Frank!

Thanks for confirming that CrossEntropyLoss does apply LogSoftmax during loss calculation! (That also explains why my model still converges properly.)

It’s interesting that in your case, nn.Softmax did apply to your model output.

No. TestNet has no idea what happens after it is called. So it doesn’t know
that its output is being fed into CrossEntropyLoss and the fact that it is won’t
affect TestNet’s Softmax layer.

That is great to know! Although I’ll double-check that it’s not due to my criterion setting. I’ll get back to this post once I’ve verified whether CrossEntropyLoss is interfering with the nn.Softmax layer. If not, then it will be a great mystery waiting to be solved, haha.

Thanks again for the response!

Xinan

Update:
So I’ve run the same model again, but on the CPU, and everything behaves as expected with nn.Softmax.

That prompted me to double-check my code, and I realized that I didn’t push my model to the GPU with model.to(device)
(I had set device to the GPU).
After doing so, I’m getting the expected result. The weird thing is that the model was able to run at all in the first place: the input and output are pushed to GPU memory, so they should be tensor.cuda..., which should cause a device mismatch when performing operations within the model. Hmmm.
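For what it’s worth, here’s a minimal sketch of that situation (hypothetical toy model); PyTorch normally raises a RuntimeError here rather than running silently:

import torch
import torch.nn as nn

model = nn.Linear(256, 4)                        # parameters left on the CPU (no model.to(device))
if torch.cuda.is_available():
    inp = torch.randn(1, 256, device="cuda")     # input pushed to GPU memory
    try:
        model(inp)
    except RuntimeError as e:
        print("device mismatch:", e)             # mixing CPU weights with CUDA input fails here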

Xinan