Tricky matrix multiplication

I am computing attention weights and I want to vectorize the operation.
I have the following tensors:
The input tensor is [batch_size, channels, x, y] and the weight tensor is [channels, channels] (to match the dimensions).
For each x and y, and for each image, I need to multiply a [channels] vector by the weight tensor.
In my case the shapes are [64, 256, 25, 2] and [256, 256], and I would like the output to have shape [64, 256, 50].
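For reference, here is a loop-based sketch of what I mean (slow, just to show the intended computation; I am assuming the channel vector multiplies the weight from the left, v @ W):

import torch

x = torch.randn(64, 256, 25, 2)   # [batch_size, channels, x, y]
w = torch.randn(256, 256)         # [channels, channels]

out = torch.empty(64, 256, 25, 2)
for b in range(64):               # each image
    for i in range(25):           # each x
        for j in range(2):        # each y
            out[b, :, i, j] = x[b, :, i, j] @ w   # [channels] @ [channels, channels]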
Is it possible to do so using basic operations?

Hi @Konstantin_Suloev_Jr,

The following works, but check whether the reshape makes sense for your particular use case.

import torch

x = torch.randn(64, 256, 25, 2)
w = torch.randn(256, 256)

out = torch.einsum("bcx,cd->bdx", x.reshape(64, 256, 50), w)
print(out.shape)  # torch.Size([64, 256, 50])
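If it helps to see why this does what you want: it should match an explicit matmul on the flattened tensor (a quick sanity-check sketch, not required for the answer):

out_ref = (x.reshape(64, 256, 50).transpose(1, 2) @ w).transpose(1, 2)
print(torch.allclose(out, out_ref, atol=1e-5))  # True, up to floating-point rounding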

Thank you for your answer! I think I found another solution using the nn module:

import torch.nn as nn

Ws = nn.Linear(256, 256)
result = Ws(input.flatten(2).permute([0, 2, 1]))  # input: [64, 256, 25, 2] -> result: [64, 50, 256]
It seems to do the same thing as what you suggested.

If Ws has its bias set to False, then yes. If not, then they’re different operations.

They’ll be equivalent (with bias = False), but I just merged the operations into a single einsum string.
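For anyone comparing the two later, a quick equivalence check looks something like this (just a sketch; copying the layer weight into the einsum is only for illustration):

import torch
import torch.nn as nn

x = torch.randn(64, 256, 25, 2)
Ws = nn.Linear(256, 256, bias=False)

# nn.Linear computes input @ weight.T, so w = Ws.weight.T lines up with the "cd" in the einsum
w = Ws.weight.T
out_einsum = torch.einsum("bcx,cd->bdx", x.reshape(64, 256, 50), w)
out_linear = Ws(x.flatten(2).permute([0, 2, 1]))  # shape [64, 50, 256]
print(torch.allclose(out_einsum, out_linear.permute([0, 2, 1]), atol=1e-5))  # True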