Applying attention from a paper

Suppose my hidden audio representation H (after a few CNN layers) has this shape:

```
H.shape = torch.Size([128, 32, 64])  # [BatchSize x FeatureDim x Length]
```

and I want to apply self-attention weights to the audio hidden frames as

```
A = softmax(ReLU(AttentionWeight1 * (AttentionWeight2 * H)))
```
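In terms of shapes, I believe the multiplication chain would look like this (a quick check with random tensors; `torch.matmul` broadcasts the 2-D weights over the batch dimension, and the weight shapes here are my own assumption):

```python
import torch

H = torch.randn(128, 32, 64)   # [BatchSize, FeatureDim, Length]
W2 = torch.randn(16, 32)       # AttentionWeight2 (assumed shape)
W1 = torch.randn(1, 16)        # AttentionWeight1 (assumed shape)

inner = torch.matmul(W2, H)                   # [128, 16, 64]
scores = torch.relu(torch.matmul(W1, inner))  # [128, 1, 64]
A = torch.softmax(scores, dim=-1)             # softmax over the Length axis
print(A.shape)  # torch.Size([128, 1, 64])
```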

To learn these two self-attention weight matrices, do I need to register them as `nn.Parameter`s in `__init__`, like below?

```
class Model(nn.Module):
    def __init__(self, batch_size):
        super(Model, self).__init__()
        self.attention1 = nn.Parameter(torch.Tensor(batch_size, 16, 32))
        self.attention2 = nn.Parameter(torch.Tensor(batch_size, 1, 16))
```

and in `forward`, do I need to do something like this?

```
def forward(self, input):
    ...
    H = CNN(input)  # [B x Features x Length]
    attention = nn.Softmax(nn.ReLU(torch.mm(self.attention2, torch.mm(self.attention1, H))))
    H = H * attention
    return H
```

Please help: how can I apply attention here? The code above throws this error:

```
RuntimeError: matrices expected, got 3D, 3D tensors at /opt/conda/conda-bld/pytorch_1591914985702/work/aten/src/TH/generic/THTensorMath.cpp:36
```
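From what I understand, `torch.mm` only accepts 2-D matrices, which is exactly what the error message complains about. Is something like the following the right way instead? This is a minimal sketch with my own names and shapes: it uses `torch.matmul` (which broadcasts the 2-D weights over the batch dimension) and keeps the weights batch-independent, since parameters that depend on `batch_size` would break for a different batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionModel(nn.Module):
    def __init__(self, feat_dim=32, hidden_dim=16):
        super().__init__()
        # Weights are shared across the batch, so no batch dimension here.
        self.attention1 = nn.Parameter(torch.empty(hidden_dim, feat_dim))  # [16, 32]
        self.attention2 = nn.Parameter(torch.empty(1, hidden_dim))         # [1, 16]
        nn.init.xavier_uniform_(self.attention1)
        nn.init.xavier_uniform_(self.attention2)

    def forward(self, H):
        # H: [B, FeatureDim, Length] = [128, 32, 64]
        # matmul broadcasts the 2-D weights over the batch dimension.
        inner = torch.matmul(self.attention1, H)             # [B, 16, 64]
        scores = torch.relu(torch.matmul(self.attention2, inner))  # [B, 1, 64]
        A = F.softmax(scores, dim=-1)  # attention over the Length axis
        return H * A                   # weight each frame; [B, 32, 64]

H = torch.randn(128, 32, 64)
out = AttentionModel()(H)
print(out.shape)  # torch.Size([128, 32, 64])
```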