Dimension for logsoftmax

import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(50 * 50, 100)
        self.l2 = nn.ReLU()
        self.l3 = nn.Linear(100, 100)
        self.l4 = nn.Tanh()
        self.l5 = nn.Linear(100, 10)
        self.l6 = nn.LogSoftmax()  # no dim argument -> triggers the warning below

    def forward(self, x):
        return self.l6(self.l5(self.l4(self.l3(self.l2(self.l1(x))))))

Given module M above, if I don't set
self.l6 = nn.LogSoftmax(dim=0)
or
self.l6 = nn.LogSoftmax(dim=1)

I get the warning:

UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.

What does it mean to set dim=0, and what does dim=1 mean?

The dim argument defines which dimension should be used to calculate the log softmax, i.e. in which dimension the class logits are located.
Have a look at this small example using softmax:

import torch
import torch.nn.functional as F

x = torch.randn(5, 3)
x0 = F.softmax(x, dim=0)
print(x0)
> tensor([[0.1313, 0.0170, 0.4122],
        [0.0167, 0.6336, 0.0440],
        [0.1764, 0.0804, 0.3689],
        [0.4540, 0.0501, 0.0967],
        [0.2217, 0.2189, 0.0782]])
print(x0.sum(0))
> tensor([1.0000, 1.0000, 1.0000])

x1 = F.softmax(x, dim=1)
print(x1)
> tensor([[0.2528, 0.0482, 0.6990],
        [0.0169, 0.9438, 0.0393],
        [0.2847, 0.1908, 0.5245],
        [0.7409, 0.1202, 0.1389],
        [0.3620, 0.5255, 0.1124]])
print(x1.sum(1))
> tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000])

As you can see, the probabilities sum to 1 along the specified dimension.

In your use case, you should use dim=1 to calculate the log probabilities for each sample in the batch over all classes (which are in dim1).
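
As a minimal sketch (the shapes are made up for illustration), applying nn.LogSoftmax(dim=1) to a [batch_size, nb_classes] tensor yields one log-probability distribution per sample:

import torch
import torch.nn as nn

log_softmax = nn.LogSoftmax(dim=1)
x = torch.randn(4, 10)          # hypothetical [batch_size, nb_classes] logits
log_probs = log_softmax(x)
print(log_probs.exp().sum(1))   # each row sums to 1 -> one distribution per sample
> tensor([1.0000, 1.0000, 1.0000, 1.0000])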

OK, just to clarify,

the dimension 0 would be the number of batches (5 in your case),
dimension 1 is the number of samples in the batch (3 in your case),
so I set dim=1 to calculate the logits (log probabilities) of each sample in the batch.

Dimension 0 is the batch dimension and gives the number of samples in the current batch.
You can consider x to be a single batch of samples.
Dimension 1 is the feature dimension in my use case and gives the number of different features.
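
To make the shape semantics concrete, here is a small sketch annotating the example tensor from above:

import torch

x = torch.randn(5, 3)   # dim0 -> 5 samples in the batch, dim1 -> 3 features per sample
print(x.shape)
> torch.Size([5, 3])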

Hi,
I have a question. What I understood from this thread is that the softmax layer acts as a classifier over 10 classes. Since you said dim=1 should be used in this case, can I generalize that dim=1 should be used for classification problems, or am I understanding it wrong?
Could you please give an example where dim=0 can be used?
Thanks in advance.

The majority of PyTorch layers use tensors with the batch dimension in dim0.
The typical multi-class classification output would have a shape of [batch_size, nb_classes], and you would calculate the probability for each class in each sample:

import torch
import torch.nn.functional as F

batch_size = 2
nb_classes = 3
x = torch.randn(batch_size, nb_classes)
prob = F.softmax(x, dim=1)
print(prob)
> tensor([[0.6935, 0.1843, 0.1223],
          [0.8212, 0.0705, 0.1083]])

Here you can see that for sample0, class0 has a probability of 69.35%, class1 18.43%, and class2 12.23%.
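
As a quick usage follow-up (continuing the prob tensor from the sketch above), the predicted class per sample is the argmax over dim=1:

pred = prob.argmax(dim=1)   # index of the most probable class for each sample
print(pred)
> tensor([0, 0])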

If you are using F.softmax or F.log_softmax with dim=0, you would calculate the (log) probability in the batch dimension.

prob = F.softmax(x, dim=0)
print(prob)
> tensor([[0.2748, 0.5397, 0.3364],
          [0.7252, 0.4603, 0.6636]])

Now you are looking at: for class0, sample0 has a probability of 27.48%, while sample1 has 72.52%.

RNNs are an exception: by default they use the temporal dimension in dim0, so whether you want to apply the (log)softmax in this dimension depends on your use case.
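
For reference, here is a minimal sketch (with made-up sizes) of the default RNN output layout, where dim0 is the temporal dimension rather than the batch dimension:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16)   # batch_first=False by default
x = torch.randn(7, 2, 8)                     # [seq_len, batch_size, input_size]
out, _ = rnn(x)
print(out.shape)                             # dim0 is time, not batch
> torch.Size([7, 2, 16])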

Thank you ptrblck for such a clear explanation. I am learning so much from this forum.
Regards,
ananda2020
