In this context, dim refers to the dimension along which the softmax function is applied.
>>> import torch
>>> import torch.nn.functional as F
>>> from torch.autograd import Variable
>>> a = Variable(torch.randn(5, 2))
>>> F.softmax(a, dim=1)
Variable containing:
0.6360 0.3640
0.3541 0.6459
0.2412 0.7588
0.0860 0.9140
0.6258 0.3742
[torch.FloatTensor of size 5x2]
>>> F.softmax(a, dim=0)
Variable containing:
0.6269 0.3177
0.0543 0.0877
0.1482 0.4128
0.0103 0.0969
0.1603 0.0849
[torch.FloatTensor of size 5x2]
In the first case (dim=1), softmax is applied along axis 1, which is why every row adds up to 1. In the second case (dim=0), softmax is applied along axis 0, making every column add up to 1.
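Note that in current PyTorch releases the `Variable` wrapper is no longer needed; `F.softmax` works directly on tensors. A minimal sketch of the same row-sum vs. column-sum check (variable names are illustrative):

```python
import torch
import torch.nn.functional as F

a = torch.randn(5, 2)

# dim=1: softmax normalizes across each row, so every row sums to 1
rows = F.softmax(a, dim=1)
print(rows.sum(dim=1))  # five values, each ~1.0

# dim=0: softmax normalizes down each column, so every column sums to 1
cols = F.softmax(a, dim=0)
print(cols.sum(dim=0))  # two values, each ~1.0
```

Summing over the same dim you passed to softmax is a quick way to confirm which axis was normalized.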