I am computing attention weights and I want to vectorize the computation.

I have the following tensors:

The input tensor is [batch_size, channels, x, y] and the weight tensor is [channels, channels] (to match the dimensions).

For each x and y, and for each image, I need to multiply a [channels] vector by the weight tensor.

In my case the shapes are [64, 256, 25, 2] and [256, 256], and I would like an output of shape [64, 256, 50].

Is it possible to do so using basic operations?

The following works, but check whether the reshape makes sense for your particular use case.

```
import torch

x = torch.randn(64, 256, 25, 2)  # [batch, channels, x, y]
w = torch.randn(256, 256)        # [channels, channels]
# flatten the spatial dims (25*2 = 50), then contract the channel dim with w
out = torch.einsum("bcx,cd->bdx", x.reshape(64, 256, 50), w)
print(out.shape)  # torch.Size([64, 256, 50])
```
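
If you prefer to avoid `einsum`, the same contraction can be written with basic ops (a minimal sketch reusing `x` and `w` from above; note that `w` needs transposing to match the `cd` index order in the einsum):

```
# same computation with plain matmul: w.t() broadcasts over the batch dim
out2 = w.t() @ x.reshape(64, 256, 50)        # -> [64, 256, 50]
print(torch.allclose(out, out2, atol=1e-5))  # True (up to float rounding)
```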

Thank you for your answer! I think I've got another solution using the nn module:

```
Ws = nn.Linear(256, 256)

# `input` is the [64, 256, 25, 2] tensor from the question
result = Ws(input.flatten(2).permute([0, 2, 1]))  # [64, 50, 256]
```

It seems to do the same thing as what you suggested.

If `Ws` has its bias set to `False`, then yes. If not, then they’re different operations.

They’ll be equivalent (with `bias=False`), but I just merged the operations into a single einsum string.
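
For completeness, a quick numerical check of that equivalence (a sketch; `Ws` is built with `bias=False` here, and since `nn.Linear` stores its weight as [out_features, in_features], it corresponds to the transposed `w` in the einsum):

```
import torch
import torch.nn as nn

x = torch.randn(64, 256, 25, 2)
Ws = nn.Linear(256, 256, bias=False)

out_linear = Ws(x.flatten(2).permute(0, 2, 1))                         # [64, 50, 256]
out_einsum = torch.einsum("bcx,cd->bdx", x.flatten(2), Ws.weight.t())  # [64, 256, 50]

# same numbers, different layout: move channels back before comparing
print(torch.allclose(out_linear.permute(0, 2, 1), out_einsum, atol=1e-5))  # True
```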