def forward(self, input):
    ....
    H = CNN(input)  # [B x Features x length]
    attention = nn.Softmax(nn.ReLU(torch.mm(self.attention2, torch.mm(self.attention1, H))))
    H = H * attention
    return H
Please help. How can we apply attention here? The above code is throwing this error:
RuntimeError: matrices expected, got 3D, 3D tensors at /opt/conda/conda-bld/pytorch_1591914985702/work/aten/src/TH/generic/THTensorMath.cpp:36
torch.mm expects two matrices (2D tensors), while you seem to use two 3D tensors.
You could use torch.bmm or torch.matmul instead, which would work for these tensors.
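For illustration, here is a minimal sketch of the difference (the shapes are made up):

import torch

a = torch.randn(8, 4, 16)        # [B x M x K]
b = torch.randn(8, 16, 32)       # [B x K x N]

# torch.mm(a, b)                 # raises: matrices expected, got 3D, 3D tensors
out_bmm = torch.bmm(a, b)        # batched matrix multiply -> [8, 4, 32]
out_matmul = torch.matmul(a, b)  # also broadcasts the batch dim -> [8, 4, 32]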
However, the parameters usually do not depend on the batch size.
Are you sure you want to initialize them with the batch size in dim0?
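If not, here is a sketch of how the module could look with the parameters defined without the batch dimension (the CNN stand-in, the sizes, and the use of F.relu/F.softmax are assumptions on my side, not necessarily what you want):

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnCNN(nn.Module):
    def __init__(self, in_channels=40, feat_dim=64, attn_dim=32):
        super().__init__()
        # stand-in for your CNN feature extractor
        self.cnn = nn.Conv1d(in_channels, feat_dim, kernel_size=3, padding=1)
        # attention parameters without the batch dimension
        self.attention1 = nn.Parameter(torch.randn(attn_dim, feat_dim))
        self.attention2 = nn.Parameter(torch.randn(feat_dim, attn_dim))

    def forward(self, x):                      # x: [B x in_channels x length]
        H = self.cnn(x)                        # [B x feat_dim x length]
        # torch.matmul broadcasts the 2D parameters over the batch dimension
        scores = torch.matmul(self.attention2, F.relu(torch.matmul(self.attention1, H)))
        attention = F.softmax(scores, dim=-1)  # [B x feat_dim x length]
        return H * attention

model = AttnCNN()
out = model(torch.randn(8, 40, 100))           # -> [8, 64, 100]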
The code looks alright code-wise and you should be able to see valid gradients in model.attention1.grad and model.attention2.grad after a backward() call.
nn.Softmax should work like F.softmax, but you might have forgotten to create the module before calling it via:
nn.Softmax(dim=1)(input)
What kind of error are you seeing with nn.Softmax?
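For reference, both usages should give the same result (a minimal sketch):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 10)

softmax = nn.Softmax(dim=1)            # create the module first ...
out_module = softmax(x)                # ... then call it on the input

out_functional = F.softmax(x, dim=1)   # functional equivalent

print(torch.allclose(out_module, out_functional))  # True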
@shakeel608 Have you done your task?
I am using a transformer network for my audio, of course only the encoder part, for multi-head attention using key, query, and value matrices.
Could you please explain the purpose of this H at the end? Is it only the reweighted H, for better classification?
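For context, this is roughly the setup I mean (just a minimal sketch, the shapes and hyperparameters are placeholders):

import torch
import torch.nn as nn

# audio features as a sequence: [length x B x d_model] (the default, batch_first=False)
feats = torch.randn(100, 8, 256)

encoder_layer = nn.TransformerEncoderLayer(d_model=256, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)

H = encoder(feats)  # [100 x 8 x 256], self-attention over the key/query/value projections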