What's the difference between `dim=1` and `dim=0` in the softmax function? I'm new, thanks for helping.

The `dim` parameter dictates across which dimension the softmax operation is computed. Basically, the `softmax` operation transforms your input into a probability distribution, *i.e.* the elements along the chosen dimension sum to 1. I wrote this small example, which shows the difference between using `dim=0` and `dim=1` for a 2D input tensor (assuming the first dimension is the batch size and the second is the number of classes).

```
import torch

# input tensor of dimensions B x C, B = batch size, C = number of classes.
inputs = torch.rand(size=(4, 4), dtype=torch.float32)
soft_dim0 = torch.softmax(inputs, dim=0)
soft_dim1 = torch.softmax(inputs, dim=1)
print('**** INPUTS ****')
print(inputs)
print('**** SOFTMAX DIM=0 ****')
print(soft_dim0)
print('**** SOFTMAX DIM=1 ****')
print(soft_dim1)

# Output (values differ between runs because the input is random):
**** INPUTS ****
tensor([[0.1837, 0.5578, 0.0020, 0.8504],
[0.7583, 0.3940, 0.7474, 0.0036],
[0.5544, 0.8078, 0.4304, 0.7569],
[0.3422, 0.6562, 0.8809, 0.7006]])
**** SOFTMAX DIM=0 ****
tensor([[0.1853, 0.2361, 0.1418, 0.3124],
[0.3291, 0.2004, 0.2989, 0.1340],
[0.2684, 0.3031, 0.2177, 0.2846],
[0.2171, 0.2605, 0.3416, 0.2690]])
**** SOFTMAX DIM=1 ****
tensor([[0.1910, 0.2777, 0.1593, 0.3720],
[0.3171, 0.2202, 0.3136, 0.1491],
[0.2275, 0.2931, 0.2009, 0.2785],
[0.1814, 0.2483, 0.3108, 0.2595]])
```

As you can see, for the `softmax` with `dim=0`, each column sums to 1, while for `dim=1`, each row sums to 1. Usually, you do not want to perform a `softmax` operation across the batch dimension.
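One way to check this yourself is to sum the softmax output along each dimension. This is only a minimal sketch, assuming a 4 x 4 random input like the one above; the seed and the variable names `col_sums_dim0` and `row_sums_dim1` are just for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable
inputs = torch.rand(size=(4, 4), dtype=torch.float32)

soft_dim0 = torch.softmax(inputs, dim=0)
soft_dim1 = torch.softmax(inputs, dim=1)

# Summing over dim=0 collapses the rows, leaving one total per column.
col_sums_dim0 = soft_dim0.sum(dim=0)
# Summing over dim=1 collapses the columns, leaving one total per row.
row_sums_dim1 = soft_dim1.sum(dim=1)

print(col_sums_dim0)  # every entry is 1 up to floating-point rounding
print(row_sums_dim1)  # every entry is 1 up to floating-point rounding
```

If the second dimension holds the classes, `dim=1` is the one that gives each sample its own probability distribution.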

Hope this helps!

Thanks, but what is the reason for using a dimension?

Tensors are multidimensional, so `softmax` needs to be told which dimension to normalize along.

Is this the same as the `dim` parameter in max (https://pytorch.org/docs/stable/generated/torch.max.html)?

Max doc (https://pytorch.org/docs/stable/generated/torch.max.html):

```
dim (int) – the dimension to reduce.
```

Softmax doc (https://pytorch.org/docs/master/generated/torch.nn.Softmax.html):

```
dim (int) – A dimension along which Softmax will be computed (so every slice along dim will sum to 1).
```

Basically, does `dim` always mean the same thing?

Yes, `dim` means the dimension, so its meaning is almost the same everywhere in PyTorch.

For example, in `torch.chunk` it specifies the dimension along which to split the tensor.
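To make the "almost the same" part concrete, here is a small sketch comparing `torch.max`, `torch.softmax` and `torch.chunk` with `dim=0` on the same tensor. The 3 x 4 input is made up for illustration. In each case `dim` picks the dimension the function works along; what differs is whether that dimension is reduced, kept, or split.

```python
import torch

x = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# torch.max with dim reduces that dimension: (3, 4) -> (4,)
max_values, max_indices = torch.max(x, dim=0)

# torch.softmax normalizes along that dimension but keeps the shape: (3, 4) -> (3, 4)
soft = torch.softmax(x, dim=0)

# torch.chunk splits along that dimension: (3, 4) -> three pieces of shape (1, 4)
pieces = torch.chunk(x, chunks=3, dim=0)

print(max_values.shape)
print(soft.shape)
print([p.shape for p in pieces])
```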

I will try to expand on the already fantastic answer and example beaupreda provided (especially helpful if you are slightly dyslexic like myself).

Consider the following example:

```
import torch

# input tensor of dimensions B x C, B = batch size, C = number of classes.
B = 8
C = 3
inputs = torch.rand(size=(B, C))
soft_dim0 = torch.softmax(inputs, dim=0)
soft_dim1 = torch.softmax(inputs, dim=1)
print('**** INPUTS ****')
print(inputs)
print(inputs.size())
print('**** SOFTMAX DIM=0 ****')
print(soft_dim0)
print(f'soft_dim0[0, :].sum()={soft_dim0[0, :].sum()}')
print(f'soft_dim0[:, 0].sum()={soft_dim0[:, 0].sum()}')
print(soft_dim0.size())
# print('**** SOFTMAX DIM=1 ****')
# print(soft_dim1)
```

output:

```
**** INPUTS ****
tensor([[0.9424, 0.6841, 0.0430],
[0.9107, 0.8822, 0.2479],
[0.7422, 0.2052, 0.2464],
[0.7586, 0.5832, 0.5621],
[0.9490, 0.8187, 0.8626],
[0.6185, 0.3711, 0.3968],
[0.9245, 0.7323, 0.6658],
[0.6134, 0.9119, 0.1943]])
torch.Size([8, 3])
**** SOFTMAX DIM=0 ****
tensor([[0.1418, 0.1262, 0.0844],
[0.1374, 0.1539, 0.1035],
[0.1161, 0.0782, 0.1034],
[0.1180, 0.1141, 0.1418],
[0.1428, 0.1444, 0.1914],
[0.1026, 0.0923, 0.1202],
[0.1393, 0.1324, 0.1573],
[0.1021, 0.1585, 0.0981]])
soft_dim0[0, :].sum()=0.35238346457481384
soft_dim0[:, 0].sum()=1.0
torch.Size([8, 3])
```

What confused me is what "across dimension X" means. Does it mean we step through each element of the first dimension (each row) and apply the operation over the second dimension (the columns of that row)? Or does it mean we step through each element of the second dimension (each column) and apply the operation over the first dimension (the rows of that column)?

For `dim=0` it means the second one, i.e. we apply the operation to all the elements along the given dimension. So in this case, if we do `sf(x, dim=0) = y` we get `y[:, d1] = sf(x[:, d1])`. If we had a mean operation instead, `mean(x, dim=0) = mu` would give `mu[d1] = mean(x[:, d1])`.

**So across dim=X means we do the operation with respect to the given dimension, and the rest of the dimensions of the tensor stay as they are.**
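The two identities for `dim=0` can be checked column by column. This is only a sketch with a made-up 8 x 3 random input; `sf` in the text corresponds to `torch.softmax` here.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable
x = torch.rand(8, 3)

y = torch.softmax(x, dim=0)  # shape stays (8, 3)
mu = torch.mean(x, dim=0)    # dim 0 is reduced away, shape becomes (3,)

for d1 in range(x.size(1)):
    column = x[:, d1]
    # y[:, d1] is the softmax of column d1 on its own
    assert torch.allclose(y[:, d1], torch.softmax(column, dim=0))
    # mu[d1] is the mean of column d1 on its own
    assert torch.allclose(mu[d1], column.mean())

print(y.shape, mu.shape)
```

Softmax keeps dimension 0 and mean removes it, but both work on one column at a time.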

Perhaps generalizing it to an arbitrary tensor will shed some light. For simplicity I will use a tensor with 3 dimensions, e.g. `size([D0, D1, D2])`. In this case, if we do `op(X, dim=1)`, we apply op across the dimension D1, something like `op(X[d0, :, d2])` for every pair of indices `d0`, `d2`. Whether that dimension disappears or not depends on the op, but across dimension X means we apply the op across the elements along that dimension of the tensor.
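For a tensor of size `[D0, D1, D2]`, an op along `dim=1` works on each 1D slice `X[d0, :, d2]`. Here is a minimal sketch of that, using softmax (which keeps the dimension) and sum (which removes it); the sizes 2, 3, 4 are arbitrary.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable
D0, D1, D2 = 2, 3, 4
X = torch.rand(D0, D1, D2)

soft = torch.softmax(X, dim=1)  # dimension kept: shape (D0, D1, D2)
total = torch.sum(X, dim=1)     # dimension removed: shape (D0, D2)

for d0 in range(D0):
    for d2 in range(D2):
        fiber = X[d0, :, d2]  # the D1 elements the op actually sees
        assert torch.allclose(soft[d0, :, d2], torch.softmax(fiber, dim=0))
        assert torch.allclose(total[d0, d2], fiber.sum())

print(soft.shape, total.shape)
```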

Hope this helps.

Another example using the cosine similarity might be helpful:

```
# cosine similarity
import torch
import torch.nn as nn

dim = 1  # apply cosine similarity across the second (feature) dimension
cos = nn.CosineSimilarity(dim=dim) # eps defaults to 1e-8 for numerical stability
k = 4 # number of examples
d = 8 # dimension
x1 = torch.randn(k, d)
x2 = x1 * 3
print(f'x1 = {x1.size()}')
cos_similarity_tensor = cos(x1, x2)
print(cos_similarity_tensor)
print(cos_similarity_tensor.size())
```

output:

```
x1 = torch.Size([4, 8])
tensor([1.0000, 1.0000, 1.0000, 1.0000])
torch.Size([4])
```

Do you know why they use the numbers 0 and 1 for column-wise and row-wise? Is it a computer science thing, or is it used in math in general?