I’m building modules for different reading comprehension models such as BiDAF and DCN.

I wanted to confirm whether the following two snippets are equivalent. The goal is to apply a softmax to a similarity matrix L of shape (B, M+1, N+1) and compute alpha and beta.

Here alpha is the softmax of L over its last dimension (across the N+1 columns), and beta is the softmax over dimension 1 (across the M+1 rows), transposed so that it has shape (B, N+1, M+1).

**Approach 1:**

```python
alpha = F.softmax(L, dim=2)  # softmax across the N+1 columns
beta = F.softmax(L, dim=1)   # softmax across the M+1 rows
beta = beta.transpose(1, 2)  # -> shape (B, N+1, M+1)
```

**Approach 2:**

```python
alpha, beta = [], []
for i in range(L.size(0)):
    alpha.append(F.softmax(L[i], dim=1).unsqueeze(0))
    beta.append(F.softmax(L[i].transpose(0, 1), dim=1).unsqueeze(0))
# concatenate along dim=0 to rebuild the batch
# (dim=-1 would wrongly concatenate along the last axis)
alpha = torch.cat(alpha, dim=0)
beta = torch.cat(beta, dim=0)
```
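One way to check the equivalence numerically is to run both versions on the same random input and compare with `torch.allclose`. This is just a sanity-check sketch with made-up sizes; note it concatenates along `dim=0` in the loop version so the per-example results stack back into a batch:

```python
import torch
import torch.nn.functional as F

B, M, N = 4, 5, 7
L = torch.randn(B, M + 1, N + 1)

# Approach 1: single batched softmax
alpha1 = F.softmax(L, dim=2)
beta1 = F.softmax(L, dim=1).transpose(1, 2)

# Approach 2: per-example loop, stacked back along the batch dim
alpha2, beta2 = [], []
for i in range(L.size(0)):
    alpha2.append(F.softmax(L[i], dim=1).unsqueeze(0))
    beta2.append(F.softmax(L[i].transpose(0, 1), dim=1).unsqueeze(0))
alpha2 = torch.cat(alpha2, dim=0)
beta2 = torch.cat(beta2, dim=0)

print(torch.allclose(alpha1, alpha2))  # True
print(torch.allclose(beta1, beta2))    # True
```

For each example, `F.softmax(L[i], dim=1)` on the 2D slice normalizes the same axis as `dim=2` on the 3D batch, and the transpose-then-softmax in the loop matches softmax-then-transpose in the batched version, so the two agree up to floating-point tolerance.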

I think the first approach is more efficient, since it avoids a Python **for loop** and runs as a single batched operation.

Would really appreciate any advice on this!