What's the difference between dim=1 and dim=0?

hunar · November 15, 2019, 2:30pm

What's the difference between dim=1 and dim=0 in the softmax function? I'm new, thanks for helping.

beaupreda · November 15, 2019, 3:22pm

The dim parameter dictates across which dimension the softmax operation is done. Basically, the softmax operation transforms your input into a probability distribution, i.e. the elements along that dimension sum to 1. I wrote this small example, which shows the difference between using dim=0 and dim=1 for a 2D input tensor (assuming the first dimension is the batch size and the second is the number of classes).

    import torch

    # input tensor of dimensions B x C, B = batch size, C = number of classes.
    inputs = torch.rand(size=(4, 4), dtype=torch.float32)
    soft_dim0 = torch.softmax(inputs, dim=0)
    soft_dim1 = torch.softmax(inputs, dim=1)
    print('**** INPUTS ****')
    print(inputs)
    print('**** SOFTMAX DIM=0 ****')
    print(soft_dim0)
    print('**** SOFTMAX DIM=1 ****')
    print(soft_dim1)

**** INPUTS ****
tensor([[0.1837, 0.5578, 0.0020, 0.8504],
        [0.7583, 0.3940, 0.7474, 0.0036],
        [0.5544, 0.8078, 0.4304, 0.7569],
        [0.3422, 0.6562, 0.8809, 0.7006]])
**** SOFTMAX DIM=0 ****
tensor([[0.1853, 0.2361, 0.1418, 0.3124],
        [0.3291, 0.2004, 0.2989, 0.1340],
        [0.2684, 0.3031, 0.2177, 0.2846],
        [0.2171, 0.2605, 0.3416, 0.2690]])
**** SOFTMAX DIM=1 ****
tensor([[0.1910, 0.2777, 0.1593, 0.3720],
        [0.3171, 0.2202, 0.3136, 0.1491],
        [0.2275, 0.2931, 0.2009, 0.2785],
        [0.1814, 0.2483, 0.3108, 0.2595]])

As you can see, for the softmax with dim=0, each column sums to 1, while for dim=1, each row sums to 1. Usually, you do not want to perform a softmax operation across the batch dimension.
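A quick way to confirm this is to sum the softmax output along each dimension. The sketch below reuses the 4 x 4 shape from the example; the seed and the variable names `col_sums_dim0` and `row_sums_dim1` are only illustrative.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable
inputs = torch.rand(size=(4, 4), dtype=torch.float32)

soft_dim0 = torch.softmax(inputs, dim=0)
soft_dim1 = torch.softmax(inputs, dim=1)

# Summing along dim=0 collapses the rows, leaving one total per column.
col_sums_dim0 = soft_dim0.sum(dim=0)
# Summing along dim=1 collapses the columns, leaving one total per row.
row_sums_dim1 = soft_dim1.sum(dim=1)

print(col_sums_dim0)  # every entry is 1 up to floating-point rounding
print(row_sums_dim1)  # every entry is 1 up to floating-point rounding

# The "other" direction generally does not sum to 1.
print(soft_dim0.sum(dim=1))
```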

Hope this helps!

hunar · November 16, 2019, 9:19am

Thanks, but what is the reason for using a dimension?

alx · November 16, 2019, 4:14pm

Tensors are multidimensional.
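As a small illustration (the shape here is made up for the example): once a tensor has more than one dimension, "sums to 1" is ambiguous until you say along which dimension. For a 3D tensor of shape (batch, sequence, classes), you would typically pick the class dimension, which can also be written as dim=-1.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability
# Hypothetical shape: 2 sequences, 5 positions each, 3 classes per position.
logits = torch.randn(2, 5, 3)

probs = torch.softmax(logits, dim=-1)  # dim=-1 is the last dimension, here dim=2

print(probs.shape)        # same shape as the input
print(probs.sum(dim=-1))  # a (2, 5) tensor of values close to 1
```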

Brando_Miranda · August 4, 2020, 7:39pm

is this the same as the dim dimension in max (https://pytorch.org/docs/stable/generated/torch.max.html)?

Max doc (https://pytorch.org/docs/stable/generated/torch.max.html):

dim (int) – the dimension to reduce.

Softmax doc (https://pytorch.org/docs/master/generated/torch.nn.Softmax.html):

dim (int) – A dimension along which Softmax will be computed (so every slice along dim will sum to 1).

Basically does dim always mean the same thing?

Karthik_J · December 4, 2020, 2:46pm

Yes, dim refers to a dimension of the tensor, so its meaning is almost the same everywhere in PyTorch.
For example, in torch.chunk it specifies the dimension along which to split the tensor.
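Here is a minimal sketch comparing the three functions mentioned in this thread on the same small tensor. The values are chosen by hand for illustration; the point is only that dim selects the same axis in each call, while what happens to that axis (reduced, kept, or split) depends on the function.

```python
import torch

x = torch.tensor([[1.0, 5.0, 2.0],
                  [7.0, 3.0, 4.0]])  # shape (2, 3)

# torch.max with dim reduces that dimension and returns (values, indices).
values, indices = torch.max(x, dim=1)
print(values)   # tensor([5., 7.])
print(indices)  # tensor([1, 0])

# torch.softmax works along the same dimension but keeps the shape.
probs = torch.softmax(x, dim=1)
print(probs.shape)  # torch.Size([2, 3])

# torch.chunk splits along dim instead of reducing it.
top, bottom = torch.chunk(x, 2, dim=0)
print(top.shape, bottom.shape)  # torch.Size([1, 3]) torch.Size([1, 3])
```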

Brando_Miranda · February 3, 2021, 5:38pm

I will try to expand on the already fantastic answer and example that beaupreda provided (especially if you are slightly dyslexic like myself).

Consider the following example:

    import torch

    # input tensor of dimensions B x C, B = batch size, C = number of classes.
    B = 8
    C = 3
    inputs = torch.rand(size=(B, C))
    soft_dim0 = torch.softmax(inputs, dim=0)
    soft_dim1 = torch.softmax(inputs, dim=1)
    print('**** INPUTS ****')
    print(inputs)
    print(inputs.size())
    print('**** SOFTMAX DIM=0 ****')
    print(soft_dim0)
    print(f'soft_dim0[0, :].sum()={soft_dim0[0, :].sum()}')
    print(f'soft_dim0[:, 0].sum()={soft_dim0[:, 0].sum()}')
    print(soft_dim0.size())
    # print('**** SOFTMAX DIM=1 ****')
    # print(soft_dim1)

output:

**** INPUTS ****
tensor([[0.9424, 0.6841, 0.0430],
        [0.9107, 0.8822, 0.2479],
        [0.7422, 0.2052, 0.2464],
        [0.7586, 0.5832, 0.5621],
        [0.9490, 0.8187, 0.8626],
        [0.6185, 0.3711, 0.3968],
        [0.9245, 0.7323, 0.6658],
        [0.6134, 0.9119, 0.1943]])
torch.Size([8, 3])
**** SOFTMAX DIM=0 ****
tensor([[0.1418, 0.1262, 0.0844],
        [0.1374, 0.1539, 0.1035],
        [0.1161, 0.0782, 0.1034],
        [0.1180, 0.1141, 0.1418],
        [0.1428, 0.1444, 0.1914],
        [0.1026, 0.0923, 0.1202],
        [0.1393, 0.1324, 0.1573],
        [0.1021, 0.1585, 0.0981]])
soft_dim0[0, :].sum()=0.35238346457481384
soft_dim0[:, 0].sum()=1.0
torch.Size([8, 3])

For me, the confusing part is what “across dimension X” means (e.g. across dim=0). Does it mean we go across each element of the first dimension (e.g. each row) and then apply the operation across the second dimension (e.g. over the columns of that row)? Or does it mean we go across each element of the second dimension (e.g. each column) and then apply the operation across the first dimension (e.g. over the rows of that column)? It means the second one, i.e. we apply the operation to all the elements along the given dimension. So in this case, if we do sf(x, dim=0) = y, we get y[:, d1] = sf(x[:, d1]). If we had a mean operation instead, mean(x, dim=0) = mu, we would get mu[d1] = mean(x[:, d1]).

So across dim=X means we do the operation with respect to the given dimension, and the rest of the dimensions of the tensor stay as they are.
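The two identities above can be checked directly. This is just a sketch with a random 8 x 3 input (the seed is arbitrary); it loops over the columns and compares the result of the dim=0 call against applying the op to each column on its own.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability
x = torch.rand(8, 3)  # B x C, as in the example above

y = torch.softmax(x, dim=0)
mu = torch.mean(x, dim=0)

for d1 in range(x.size(1)):
    # Softmax over dim=0 treats each column x[:, d1] as its own vector.
    assert torch.allclose(y[:, d1], torch.softmax(x[:, d1], dim=0))
    # Mean over dim=0 reduces each column to a single number.
    assert torch.allclose(mu[d1], torch.mean(x[:, d1]))

print(y.shape)   # torch.Size([8, 3]): softmax keeps dim 0
print(mu.shape)  # torch.Size([3]): mean removes dim 0
```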

Perhaps generalizing it to an arbitrary tensor might shed some light. For simplicity, I will consider a tensor with 3 dimensions, e.g. of size [D0, D1, D2]. In this case, if we do op(X, dim=1), we apply op across the dimension D1, something like op(X[d0, :, d2]) for every d0 and d2. Whether that dimension disappears or not depends on the op, but “across dimension X” means we apply the operation across the elements along that dimension of the tensor.
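To make the 3D case concrete, here is a sketch with small, arbitrarily chosen sizes. It uses sum as an op that removes the dimension and softmax as one that keeps it, and checks both against slices taken along dim 1. The name `fiber` is just a label I picked for one such slice.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability
D0, D1, D2 = 2, 4, 3
X = torch.rand(D0, D1, D2)

summed = X.sum(dim=1)                     # reducing op: dim 1 disappears
summed_keep = X.sum(dim=1, keepdim=True)  # dim 1 is kept with size 1
soft = torch.softmax(X, dim=1)            # non-reducing op: shape unchanged

for d0 in range(D0):
    for d2 in range(D2):
        fiber = X[d0, :, d2]  # the D1 elements the op is applied to
        assert torch.allclose(summed[d0, d2], fiber.sum())
        assert torch.allclose(soft[d0, :, d2], torch.softmax(fiber, dim=0))

print(summed.shape)       # torch.Size([2, 3])
print(summed_keep.shape)  # torch.Size([2, 1, 3])
print(soft.shape)         # torch.Size([2, 4, 3])
```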

Hope this helps.

Another example using the cosine similarity might be helpful:

    # cosine similarity
    import torch
    import torch.nn as nn

    dim = 1  # apply cosine similarity across the second (feature) dimension
    cos = nn.CosineSimilarity(dim=dim)  # eps defaults to 1e-8 for numerical stability

    k = 4  # number of examples
    d = 8  # dimension
    x1 = torch.randn(k, d)
    x2 = x1 * 3  # same direction as x1, so every similarity should be 1
    print(f'x1 = {x1.size()}')
    cos_similarity_tensor = cos(x1, x2)
    print(cos_similarity_tensor)
    print(cos_similarity_tensor.size())

output:

x1 = torch.Size([4, 8])
tensor([1.0000, 1.0000, 1.0000, 1.0000])
torch.Size([4])
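To see what dim=1 is doing here, the same quantity can be computed by hand: the dot product and both norms are reductions along dim=1, which is why the feature dimension disappears and one value per example remains. This is only a sketch; it uses unrelated random vectors and ignores the eps safeguard that nn.CosineSimilarity applies.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability
k, d = 4, 8
x1 = torch.randn(k, d)
x2 = torch.randn(k, d)  # unrelated vectors this time, so values are not all 1

cos = nn.CosineSimilarity(dim=1)

# Manual version: dot product and norms are both taken along dim=1.
dot = (x1 * x2).sum(dim=1)
norms = x1.norm(dim=1) * x2.norm(dim=1)
manual = dot / norms  # no eps handling, unlike nn.CosineSimilarity

print(cos(x1, x2).shape)  # torch.Size([4]): one similarity per example
print(torch.allclose(cos(x1, x2), manual, atol=1e-6))
```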

Chawza · February 6, 2021, 12:16pm

Do you know the reason why they use the numbers 0 and 1 for column-wise and row-wise? Is it a computer science thing, or is it from math in general?