Softmax always returns 1

Hello, I am running a Unet model with a sigmoid activation function, and I am trying to get the softmax probabilities for each class. However, I am facing two problems.

First, the softmax probabilities are always 1:


tensor([[[[1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          ...,
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.]]],


        [[[1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          ...,
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.]]]])

Then, when I try to select the top two class confidences for each sample,

toptwo = torch.topk(probabilities, 2, dim=1)[0]

I get

RuntimeError: selected index k out of range

Full code:

import torch
import torch.nn.functional as F
from tqdm import tqdm

with torch.no_grad():
    with tqdm(enumerate(pool_loader)) as iterator:
        for i, batch in iterator:
            logits = model(batch.to(device, dtype=torch.float)).cpu().detach()
            probabilities = F.softmax(logits, dim=1)
            print(probabilities)

        # Select the top two class confidences for each sample
        toptwo = torch.topk(probabilities, 2, dim=1)[0]

I suspect I am messing up somewhere with the batch, so here is what my 2D batch looks like:

batch[1]

tensor([[[ 1.7694,  1.8379,  1.7865,  ...,  1.8208,  1.7865,  1.7865],
         [ 1.7523,  1.8550,  1.8208,  ...,  1.8208,  1.7694,  1.7865],
         [ 1.7694,  1.8379,  1.8208,  ...,  1.7865,  1.7865,  1.7865],
         ...,
         [-0.2171, -0.2342, -0.2171,  ...,  1.1700,  1.1015,  1.1015],
         [-0.1486, -0.2171, -0.1828,  ...,  1.0159,  1.0502,  0.9817],
         [-0.1486, -0.1657, -0.1486,  ...,  0.9988,  0.9817,  1.0159]],

        [[ 1.8683,  1.9209,  1.8683,  ...,  1.8683,  1.8859,  1.8859],
         [ 1.9034,  1.8859,  1.9034,  ...,  1.8859,  1.8859,  1.8859],
         [ 1.8683,  1.9034,  1.8859,  ...,  1.8859,  1.8859,  1.8859],
         ...,
         [-0.8452, -0.8627, -0.8277,  ...,  0.9230,  0.8354,  0.7654],
         [-0.7577, -0.8452, -0.8102,  ...,  0.8179,  0.7479,  0.6254],
         [-0.7052, -0.7402, -0.7577,  ...,  0.7654,  0.6954,  0.7129]],

        [[ 2.2391,  2.2740,  2.2391,  ...,  2.2391,  2.2391,  2.2566],
         [ 2.2391,  2.2740,  2.2566,  ...,  2.2391,  2.2391,  2.2391],
         [ 2.2217,  2.2566,  2.2566,  ...,  2.2391,  2.2566,  2.2217],
         ...,
         [ 0.6705,  0.6008,  0.6879,  ...,  1.8557,  1.8208,  1.7685],
         [ 0.8274,  0.7054,  0.7402,  ...,  1.8383,  1.8208,  1.7163],
         [ 0.8099,  0.7925,  0.7925,  ...,  1.8383,  1.8034,  1.8208]]],
       dtype=torch.float64)


Any help would be amazing; I've been trying to understand what is wrong for hours.

Regards

From what I read, the best approach is to apply softmax to raw logits. Since my model uses sigmoid as its activation function, how do I apply softmax to get each class's probability?

Hi Luis!

Something is very fishy here. It should not be possible for softmax()
to return all 1s, unless you are computing softmax() along a dimension
of size 1, in which case each “distribution” has only a single element.

Could you print out logits.shape, logits.min(), logits.max(),
probabilities.shape, probabilities.min(), and
probabilities.max()?

Also, what version of pytorch are you using?

Best.

K. Frank


Thank you for your reply.

I think that may be happening because my outputs are not raw logits; I use sigmoid as the activation function when training. It is a two-class problem, though.

The shape of logits is: torch.Size([2, 1, 256, 256])

Well, it seems I messed up and I was getting the batch instead. I tried using logits[0] with softmax, but now the result is always the same:

tensor([[[0.0039, 0.0039, 0.0039,  ..., 0.0039, 0.0039, 0.0039],
         [0.0039, 0.0039, 0.0039,  ..., 0.0039, 0.0039, 0.0039],
         [0.0039, 0.0039, 0.0039,  ..., 0.0039, 0.0039, 0.0039],
         ...,
         [0.0039, 0.0039, 0.0039,  ..., 0.0039, 0.0039, 0.0039],
         [0.0039, 0.0039, 0.0039,  ..., 0.0039, 0.0039, 0.0039],
         [0.0039, 0.0039, 0.0039,  ..., 0.0039, 0.0039, 0.0039]]],
       device='cuda:0')

torch.Size([1, 256, 256])

Hi Luis!

Note that 0.0039 is equal to 1 / 256 (rounded to two significant digits).
The tensor you are passing to softmax() (presumably logits) consists
of elements that all have the same value (at least along the dimension
across which you compute softmax()). So softmax() says that each
of your 256 classes has the same probability, namely 1 / 256.
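
You can reproduce this with a minimal sketch (the shape below just
mirrors the [1, 256, 256] tensor you posted):

import torch
import torch.nn.functional as F

x = torch.zeros(1, 256, 256)   # every element has the same value
p = F.softmax(x, dim=1)        # softmax across the size-256 dimension
print(p.min(), p.max())        # both print 0.0039, that is, 1 / 256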

Best.

K. Frank

Well, that makes sense. Do you have any idea what I am doing wrong?

The output from the model for one image is:

tensor([[3.8711e-05, 4.7626e-06, 7.6847e-07,  ..., 1.1946e-06, 2.9188e-06,
         1.8121e-06],
        [1.7615e-06, 3.0756e-06, 2.0859e-06,  ..., 5.7203e-06, 7.6894e-06,
         3.1614e-06],
        [1.8942e-06, 2.1382e-06, 1.1094e-06,  ..., 2.8606e-06, 5.4414e-06,
         1.8292e-06],
        ...,
        [6.1765e-06, 1.1547e-06, 7.4945e-07,  ..., 8.2501e-07, 6.4901e-07,
         3.4714e-06],
        [2.3444e-06, 1.0644e-06, 1.6311e-06,  ..., 3.1044e-06, 3.0752e-06,
         1.4126e-06],
        [3.0594e-05, 6.4542e-07, 6.8967e-06,  ..., 3.6540e-06, 2.4553e-06,
         8.3999e-06]])

with this shape: torch.Size([256, 256])

This is what I get from logits[0].shape. I have no idea what I am doing wrong and have been struggling with this all afternoon. Any help is highly appreciated.

Hi Luis!

No, not really.

I also don’t have any idea what you are trying to do.

Could you explain in a few sentences the problem you are working
on? What is your input data? It sounds like you are working on a
classification problem. What are your class labels?

Well, the tensor you just posted can’t be the tensor you were passing
through softmax() that gave you a result whose elements were all 0.0039.

Here’s one point of confusion: you said “with this shape:
torch.Size([256, 256])”. Does this mean that [256, 256] is the shape
of the output of your model? Or is [256, 256] the shape of
output_of_your_model[0]?

What is your batch size? What is the shape of the actual output of your
model? What is your loss function? What is the shape of your labels?

Best.

K. Frank

Oh, sorry. I will try to explain then.

My input is 256x256 images and this is a segmentation problem. I am implementing an active learning pipeline, so I am trying to get the probabilities and then the top two class confidences for each sample: basically, the margin sampling query strategy.
I created a pool loader that samples my test_dataset with random indices:

import random
from torch.utils.data import DataLoader, SubsetRandomSampler

pool_idx = random.sample(range(1, len(test_dataset)), pool_size)
pool_loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers,
                         # sampler=SubsetRandomSampler(unlabeled_idx[pool_idx])
                         sampler=SubsetRandomSampler(pool_idx))

Where the batch size is equal to 2.

Then I create an iterator to go through my pool_loader and get the prediction for each sample:



with tqdm(enumerate(test_loader)) as iterator:
    for i, batch in iterator:
        outputs = model(batch.to(DEVICE, dtype=torch.float)).cpu().detach()

And the result is what I told you: the 0.0039 tensor.
I just want the softmax probabilities for each class so I can calculate the margin.

Note that the activation function of my model (Unet) is sigmoid. I am afraid that applying softmax on top of sigmoid might be complicating things a bit. This is also a two-class problem (Positive/Negative).

I hope this gives a clearer picture of my problem.
Thank you very much for your help!

Hi Luis!

First off, you didn’t answer my specific questions. You’ve said a
number of inconsistent things, which makes it hard to guess what’s
going on.

Let me make some comments based on what I think you might be
trying to do.

You are trying to segment images. That means that you assign to
each pixel in the image a class label. You say this is a two-class
problem. Therefore you are labelling each pixel as either a “Positive”
pixel or a “Negative” pixel. (This could be, for example, “foreground”
vs. “background” pixels.)

You say that your images are 256x256 pixels, and that your batch
size is 2. Therefore I would expect the input to your model to be a
tensor that holds a batch of two images, and so has shape
[2, 256, 256].

You have been unclear about the shape of the output of your model,
but you have said more often than not that it is [2, 1, 256, 256].
This mostly makes sense, but it is unclear why you have that dimension
of size 1. Let me ignore this size-1 dimension, and think of your output
as having shape [2, 256, 256]. So for each sample in your batch,
your output has the shape of an input image, namely, [256, 256].

Normally for a multiclass segmentation problem with nClass classes,
you would have nClass outputs for each pixel – either raw-score
logits or probabilities for that pixel to be in each of the nClass
classes. But for your two-class, binary segmentation problem, it is
much more common to have only a single output value per pixel, and
this value is either the raw-score logit or the probability of that pixel
being in the “Positive” class. (You only need one value because the
probability of being in the “Negative” class is simply 1 minus the
probability of being in the “Positive” class.)
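
Concretely, here is a sketch of how the single output channel relates
to the two class probabilities (random numbers stand in for your
model’s sigmoid() output):

import torch

# stand-in for a batch of 2 single-channel sigmoid() outputs, in (0, 1)
p_pos = torch.rand(2, 1, 256, 256)   # P(pixel is "Positive")
p_neg = 1.0 - p_pos                  # P(pixel is "Negative") is implicit

# if you want explicit per-class probabilities, stack the two channels
probs = torch.cat([p_neg, p_pos], dim=1)   # shape [2, 2, 256, 256]
print(torch.allclose(probs.sum(dim=1), torch.ones(2, 256, 256)))   # True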

You say that the “activation function of my model (Unet) is sigmoid.”
The sigmoid() function converts raw-score logits (that run from
-inf to inf) into probabilities (that run from 0.0 to 1.0). So the
output of your model should be (for each sample image in your batch)
an “image” of 256x256 “pixels,” each of which is the (predicted)
probability of the corresponding pixel in the input image being in the
“Positive” class.
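
For example (a minimal illustration):

import torch

logit = torch.tensor([-2.0, 0.0, 3.0])
prob = torch.sigmoid(logit)   # maps (-inf, inf) to (0, 1)
print(prob)                   # tensor([0.1192, 0.5000, 0.9526])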

Based on this, all talk of using softmax() to get probabilities is
confused. softmax() is used to convert a set of nClass logits in
a multiclass problem into a set of nClass probabilities that sum
to 1.0. My guess is that you have been trying to apply softmax()
across horizontal rows of (output) pixels (as if you had a 256-class
problem) and are getting garbage. Assuming that a final sigmoid()
is built into your model, your (output) pixels are already binary
classification probabilities – that is the probability of the input pixel
being in the “Positive” class, with no explicit “Negative” class probability
being given.
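
For reference, here is a minimal sketch of what softmax() is actually
for:

import torch
import torch.nn.functional as F

# sketch: 4 samples, each with nClass = 5 raw-score logits
logits = torch.randn(4, 5)
probs = F.softmax(logits, dim=1)   # one probability per class
print(probs.sum(dim=1))            # each row sums to 1.0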

Also, based on this, the notion of getting “the top two classes
confidences” doesn’t make sense. You only have two classes, so
your top two classes will always be “Positive” and “Negative.”
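
If what you are after is the margin-sampling score, note that for a
binary problem the margin between the top two class confidences
reduces to |p - (1 - p)| = |2p - 1|, where p is the per-pixel
“Positive” probability. A sketch, again with random numbers standing
in for sigmoid() output:

import torch

p = torch.rand(2, 1, 256, 256)       # stand-in "Positive" probabilities
margin = (2.0 * p - 1.0).abs()       # |P(Positive) - P(Negative)| per pixel
score = margin.mean(dim=(1, 2, 3))   # e.g. one average margin per sample
print(score.shape)                   # torch.Size([2])

Margin sampling would then query the samples with the smallest score.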

Coming back to your topk error (RuntimeError: selected index k out of range):

If the shape of probabilities really is [2, 1, 256, 256], then this
error message makes sense. To avoid confusion, note that pytorch
tensors are zero-based, that is, the indices start at 0. So in the
topk call, dim = 0 would mean the first (in one-based counting)
dimension, that is, the nBatch dimension with size 2. dim = 1
means the second dimension, that is, the size-1 dimension. (And
dim = 2 and dim = 3 would be the width and height dimensions,
both of size 256.)

So your topk call is asking for the 2 largest values along a size-1
dimension, that is, along a dimension that only has 1 value.
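
Both of your symptoms show up with a toy tensor of the shape you
reported:

import torch
import torch.nn.functional as F

logits = torch.randn(2, 1, 256, 256)   # the shape you reported
probs = F.softmax(logits, dim=1)       # softmax over a size-1 dimension
print(probs.unique())                  # tensor([1.]): every element is 1

torch.topk(probs, 2, dim=1)   # RuntimeError: selected index k out of range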

I suspect that you are somehow mixing together code and concepts
for a multiclass segmentation problem with those for a binary
segmentation problem.

Good luck.

K. Frank