I have a matrix of size n x m, say mat = torch.rand(n, m), and I want to calculate the softmax over the second dimension:

```
exp_mat = torch.exp(mat)
soft_max_mat = exp_mat / (exp_mat.sum(1).unsqueeze(1).repeat(1, exp_mat.size(1)))
```

But this is too slow, even on GPUs. I believe there should be a faster, vectorized way to do this in PyTorch. What is it exactly?
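
For comparison, here is a minimal sketch of two likely candidates for the same computation: the built-in `torch.softmax`, and a manual version that relies on broadcasting instead of `repeat`. The sizes `n = 4` and `m = 3` are placeholders chosen for illustration, and no timing claims are made here; whether either variant is faster in a given setup would need to be measured.

```python
import torch

n, m = 4, 3  # hypothetical sizes, for illustration only
mat = torch.rand(n, m)

# Built-in softmax over the second dimension (dim=1)
soft_max_builtin = torch.softmax(mat, dim=1)

# Manual version: keepdim=True keeps the row sums at shape (n, 1),
# so broadcasting performs the division without repeat()
exp_mat = torch.exp(mat)
soft_max_manual = exp_mat / exp_mat.sum(dim=1, keepdim=True)
```

Both results should agree up to floating-point error, and each row should sum to 1.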