# Dimension for logsoftmax

```python
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(50 * 50, 100)
        self.l2 = nn.ReLU()
        self.l3 = nn.Linear(100, 100)
        self.l4 = nn.Tanh()
        self.l5 = nn.Linear(100, 10)
        self.l6 = nn.LogSoftmax()
```

Having module M, if I don’t set
`self.l6 = nn.LogSoftmax(dim=0)`
or
`self.l6 = nn.LogSoftmax(dim=1)`

I get the warning:

> UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.

What does it mean to set `dim=0`, and what does `dim=1` mean?


The `dim` argument defines which dimension should be used to calculate the log softmax, i.e. in which dimension the class logits are located.
Have a look at this small example using softmax:

```python
import torch
import torch.nn.functional as F

x = torch.randn(5, 3)
x0 = F.softmax(x, dim=0)
print(x0)
> tensor([[0.1313, 0.0170, 0.4122],
          [0.0167, 0.6336, 0.0440],
          [0.1764, 0.0804, 0.3689],
          [0.4540, 0.0501, 0.0967],
          [0.2217, 0.2189, 0.0782]])
print(x0.sum(0))
> tensor([1.0000, 1.0000, 1.0000])

x1 = F.softmax(x, dim=1)
print(x1)
> tensor([[0.2528, 0.0482, 0.6990],
          [0.0169, 0.9438, 0.0393],
          [0.2847, 0.1908, 0.5245],
          [0.7409, 0.1202, 0.1389],
          [0.3620, 0.5255, 0.1124]])
print(x1.sum(1))
> tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
```

As you can see, the probabilities sum to 1 along the specified dimension.

In your use case, you should use `dim=1` to calculate the log probabilities for each sample in the batch over all classes (which are in `dim1`).
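The same holds for `log_softmax`: applied with `dim=1`, exponentiating the result recovers probabilities that sum to 1 per sample. A minimal sketch (the shape `[4, 10]` here is a hypothetical model output, not from the thread):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)               # hypothetical output: [batch_size, nb_classes]
log_probs = F.log_softmax(logits, dim=1)  # log probabilities over the class dimension

# exponentiating the log probabilities recovers probabilities
# that sum to 1 for each sample (each row)
probs = log_probs.exp()
print(probs.sum(1))                       # each entry is ~1.0
```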


OK, just to clarify:

- dimension 0 would be the number of batches (5 in your case),
- dimension 1 would be the number of samples in the batch (3 in your case),

so I set `dim=1` to calculate the logits (log probabilities) for each sample in the batch.

Dimension 0 is the batch dimension and gives the number of samples in the current batch.
You can consider `x` being a single batch of samples.
Dimension 1 is the feature dimension in my use case and gives the number of different features.
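Another way to see that `dim=1` operates per sample: softmax over `dim=1` treats each row independently, so row `i` of the result equals the softmax of `x[i]` computed on its own. A small sketch illustrating this:

```python
import torch
import torch.nn.functional as F

x = torch.randn(5, 3)  # 5 samples, 3 features

# softmax over dim=1 normalizes each row independently,
# so row 0 of the result matches the softmax of x[0] alone
full = F.softmax(x, dim=1)
row0 = F.softmax(x[0], dim=0)  # x[0] is 1D, so its only dimension is 0
print(torch.allclose(full[0], row0))  # True
```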


Hi,
I have a question. What I understood from this thread is that the softmax layer acts as a classifier over 10 classes. Since you say `dim=1` should be used in this case, can I generalize that one should use `dim=1` for classification problems, or am I understanding this wrong?
Could you please give an example where `dim=0` can be used?

The majority of PyTorch layers use tensors with the batch dimension in `dim0`.
The typical multi-class classification output would have a shape of `[batch_size, nb_classes]`, and you would calculate the probability for each class in each sample:

```python
batch_size = 2
nb_classes = 3
x = torch.randn(batch_size, nb_classes)
prob = F.softmax(x, dim=1)
print(prob)
> tensor([[0.6935, 0.1843, 0.1223],
          [0.8212, 0.0705, 0.1083]])
```

Here you can see that for sample0, class0 has a probability of 69.35%, class1 18.43%, and class2 12.23%.
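To turn these per-sample probabilities into a predicted class, you can take the `argmax` over the class dimension. A small sketch with hand-picked logits (not from the thread) so the result is easy to read:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[2.0, 0.5, 0.1],
                  [0.2, 3.0, 0.3]])  # [batch_size=2, nb_classes=3]
prob = F.softmax(x, dim=1)
pred = prob.argmax(dim=1)            # most likely class per sample
print(pred)                          # tensor([0, 1])
```

Since softmax is monotonic, `x.argmax(dim=1)` on the raw logits would give the same prediction.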

If you are using `F.softmax` or `F.log_softmax` with `dim=0`, you would calculate the (log) probability in the batch dimension.

```python
prob = F.softmax(x, dim=0)
print(prob)
> tensor([[0.2748, 0.5397, 0.3364],
          [0.7252, 0.4603, 0.6636]])
```

Now you are looking at: for class0, sample0 has a probability of 27.48%, while sample1 has 72.52%.

RNNs are an exception: by default they use the temporal dimension in `dim0`, so whether you want to apply the (log)softmax in this dimension depends on your use case.
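A quick sketch of this exception: with the default `batch_first=False`, `nn.RNN` expects and returns tensors shaped `[seq_len, batch, features]`, so `dim0` is time, not batch (the sizes below are arbitrary illustrations):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=6)  # batch_first=False by default
x = torch.randn(7, 2, 4)                   # [seq_len=7, batch=2, input_size=4]
out, h = rnn(x)
print(out.shape)                           # torch.Size([7, 2, 6]) -> time in dim0
```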


Thank you, ptrblck, for such a clear explanation. I am learning so much from this forum.
Regards,
ananda2020
