I want to implement a typical attention mechanism, and I need to compute the dot product between a sequence of vectors and a query vector. I was wondering what the best way is to implement this operation with batched data.
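A minimal sketch of what I mean, with assumed shapes (batch of 4 sequences, 7 vectors each, hidden size 16 — all hypothetical):

```python
import torch

batch, seq_len, dim = 4, 7, 16
sequence = torch.randn(batch, seq_len, dim)  # (B, T, D)
query = torch.randn(batch, dim)              # (B, D)

# Batched dot product: (B, T, D) @ (B, D, 1) -> (B, T, 1)
scores = torch.bmm(sequence, query.unsqueeze(2)).squeeze(2)  # (B, T)
weights = torch.softmax(scores, dim=1)  # attention weights per sequence

# Weighted sum of the sequence vectors: (B, 1, T) @ (B, T, D) -> (B, 1, D)
context = torch.bmm(weights.unsqueeze(1), sequence).squeeze(1)  # (B, D)
```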
Thanks. What is the difference between torch.matmul(sequence, query.unsqueeze(2)) and torch.bmm(sequence, query.unsqueeze(2))? I get the same results, but is there any difference performance-wise?
In theory matmul supports broadcasting, so it shouldn’t make copies of the tensor (more memory efficient).
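To illustrate the broadcasting point (shapes here are just an example): matmul accepts a 2-D query and broadcasts it across the batch, while bmm requires both operands to be 3-D with matching batch sizes, so you’d have to expand the query first.

```python
import torch

seq = torch.randn(4, 7, 16)  # (B, T, D)
q = torch.randn(16, 1)       # single query, no batch dimension

# matmul broadcasts the 2-D operand over the batch dimension:
out_matmul = torch.matmul(seq, q)  # (4, 7, 1)

# bmm needs an explicit batch dimension on both operands;
# expand() creates a stride-0 view rather than copying the data:
out_bmm = torch.bmm(seq, q.unsqueeze(0).expand(4, 16, 1))
```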
There was a bug report filed for that recently, not sure if it was fixed.
You can time a for loop over both and see what you get. I think the timing will be about the same, but the memory consumption will be lower for matmul (important for big matrices).
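A rough timing sketch along those lines (CPU wall-clock only, sizes are arbitrary; for GPU timing you’d also need torch.cuda.synchronize()):

```python
import time
import torch

seq = torch.randn(64, 128, 256)
q = torch.randn(64, 256, 1)

def bench(fn, iters=100):
    fn()  # warm-up call before timing
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return time.perf_counter() - start

t_matmul = bench(lambda: torch.matmul(seq, q))
t_bmm = bench(lambda: torch.bmm(seq, q))
print(f"matmul: {t_matmul:.4f}s  bmm: {t_bmm:.4f}s")
```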