Batched matrix-vector dot product (attention)

I want to implement a typical attention mechanism, and I need to compute the dot product between a sequence of vectors and a query vector. What is the best way to implement this operation with batched data?

Suppose that I have the following data:

import torch

batch_size = 32
seq_length = 50
dim = 100

sequence = torch.randn(batch_size, seq_length, dim)
query = torch.randn(batch_size, dim)

What I need as output is a tensor with dimensionality (batch_size, seq_length).

What I am doing now is this:

energies = torch.matmul(sequence, query.unsqueeze(2)).squeeze(2)  # (batch_size, seq_length)

I think torch.bmm does what you want, but you'll need to unsqueeze the query first:

query = query.unsqueeze(2)
result = torch.bmm(sequence, query)

The shape of result will be (batch_size, seq_length, 1), so you can squeeze the last dimension.
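For reference, a minimal sketch checking that both approaches produce the same energies with the shapes from the question:

```python
import torch

batch_size, seq_length, dim = 32, 50, 100
sequence = torch.randn(batch_size, seq_length, dim)
query = torch.randn(batch_size, dim)

# matmul version: (B, S, D) @ (B, D, 1) -> (B, S, 1) -> squeeze -> (B, S)
energies_matmul = torch.matmul(sequence, query.unsqueeze(2)).squeeze(2)

# bmm version: same shapes, explicit batched matrix multiply
energies_bmm = torch.bmm(sequence, query.unsqueeze(2)).squeeze(2)

print(energies_matmul.shape)  # torch.Size([32, 50])
print(torch.allclose(energies_matmul, energies_bmm))
```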

Thanks. What is the difference between torch.matmul(sequence, query.unsqueeze(2)) and torch.bmm(sequence,query.unsqueeze(2))? I get the same results, but performance-wise is there any difference?

In theory, matmul supports broadcasting, so it shouldn't make copies of the tensor (more memory efficient).

There was a bug report filed for that recently, not sure if it was fixed.
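To illustrate the broadcasting point, matmul can take a single unbatched query and broadcast it across the batch, whereas bmm requires an explicit batch dimension. An illustrative sketch (not from the thread, and a slightly different setup: one query shared by the whole batch):

```python
import torch

batch_size, seq_length, dim = 32, 50, 100
sequence = torch.randn(batch_size, seq_length, dim)
single_query = torch.randn(dim, 1)  # one query shared across the batch

# matmul broadcasts the (dim, 1) matrix over the batch dimension
out_matmul = torch.matmul(sequence, single_query).squeeze(2)  # (32, 50)

# bmm needs an explicit batch dim (expand gives a view, not a copy)
out_bmm = torch.bmm(sequence, single_query.expand(batch_size, dim, 1)).squeeze(2)

print(torch.allclose(out_matmul, out_bmm, atol=1e-5))
```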

You can time a for loop on both and see what you get. I think the timing will be the same, but the memory consumption will be less for matmul (important for big matrices).
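A rough way to time both, as suggested above (a sketch; the exact numbers depend on hardware and backend, and a serious benchmark would use something like torch.utils.benchmark):

```python
import time
import torch

batch_size, seq_length, dim = 32, 50, 100
sequence = torch.randn(batch_size, seq_length, dim)
query = torch.randn(batch_size, dim, 1)

def time_fn(fn, iters=100):
    # one warm-up call, then average a simple loop
    fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

t_matmul = time_fn(lambda: torch.matmul(sequence, query))
t_bmm = time_fn(lambda: torch.bmm(sequence, query))
print(f"matmul: {t_matmul * 1e6:.1f} us/iter, bmm: {t_bmm * 1e6:.1f} us/iter")
```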