I know that the attention code uses matmul like this, and that this code works.
But calling matmul with what I thought were the same shapes gives an error. Why is this?
I want to control the dimensions to account for batch and attention_head, but how do I control the output dimension computed by matmul?
Attention code and its print output:
attn = self.dropout(attn)
print("attn",attn.size())
print(value.size())
context = torch.matmul(attn, value).transpose(1, 2)
context = context.contiguous().view(batch_size, -1, self.d_model)
attn torch.Size([4, 16, 100, 100])
torch.Size([4, 16, 100, 32])
My test code and the error:
v=torch.randn(4, 16, 100, 100)
at=torch.randn(4, 16, 100,32)
print("aa")
context = torch.matmul(at, v)
print(context.size())
Traceback (most recent call last):
File "a5atten.py", line 129, in <module>
context = torch.matmul(at, v)
RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [64, 32] but got: [64, 100].
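For reference, a minimal sketch of how torch.matmul pairs the dimensions (using the same shapes as printed above). The leading dimensions are treated as batch dimensions and only the last two are multiplied as matrices, so the inner dimensions must align as [..., n, k] @ [..., k, m]:

```python
import torch

# Same shapes as in the working attention code:
attn = torch.randn(4, 16, 100, 100)   # [batch, heads, n, k] with k = 100
value = torch.randn(4, 16, 100, 32)   # [batch, heads, k, m] with m = 32

# Inner dims align (100 == 100), result is [4, 16, 100, 32]:
context = torch.matmul(attn, value)
print(context.size())

# In the failing test code the operands are effectively swapped:
# [..., 100, 32] @ [..., 100, 100] has inner dims 32 vs 100, which is
# exactly the RuntimeError (64 = 4 * 16 is the flattened batch dim).
```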