Multiply and accumulate two tensors across batch size

I have two tensors with the following sizes:

x.shape = torch.Size([10, 3, 128]) → [batch_size, no_of_IDs, no_events_per_ID]
attention_value.shape = torch.Size([10, 3, 1]) → [batch_size, no_of_IDs, att_value]

How can I MAC (multiply and accumulate) these 3D tensors across batch_size, i.e., multiply x with attention_value?

When I tried torch.matmul(x, attention_value), I got the error below:

RuntimeError: Expected batch2_sizes[0] == bs && batch2_sizes[1] == contraction_size to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

The output size should be [10, 128].
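
For reference, here is a minimal snippet with random placeholder data that reproduces the failure; batched matmul tries to contract x's last dimension (128) against attention_value's second dimension (3), which do not match:

>>> import torch
>>> x = torch.randn(10, 3, 128)
>>> attention_value = torch.randn(10, 3, 1)
>>> torch.matmul(x, attention_value)   # raises the RuntimeError above (128 != 3)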

Hi Pradeep!

Based on the tensor shapes you gave, I assume that you want to multiply
each matrix in the batch x by the corresponding matrix in the batch
attention_value (matrix multiplication, not element-wise).

I assume that you are not somehow summing across elements of the
batches, so I’m not sure what you mean by “accumulate.”

You can use bmm() (“batch matrix multiply”) after using transpose()
to line the dimensions up correctly (and then use squeeze() to get rid
of the singleton dimension):

>>> import torch
>>> torch.__version__
'1.10.2'
>>> x = torch.randn(10, 3, 5)   # using 5 instead of 128
>>> attention_value = torch.randn(10, 3, 1)
>>> torch.bmm(x.transpose(1, 2), attention_value).squeeze()
tensor([[ 2.3298,  0.5740,  2.4056,  1.2855, -0.7073],
        [-0.0287, -4.0949, -0.5672,  1.7701,  1.9519],
        [ 0.6117,  0.1664, -1.3646, -1.6738, -1.7130],
        [ 0.6985, -0.3951, -0.5439, -0.5091, -0.2386],
        [ 2.7237, -0.8359, -0.4583,  1.1985,  3.0083],
        [ 1.1887, -0.4801,  1.3998, -1.3990,  2.6126],
        [ 1.2119,  0.2461,  1.8558, -0.8090,  2.0118],
        [-0.2383,  1.4226, -0.2152,  0.9591,  1.5240],
        [-0.4384,  0.0184, -0.9334,  0.0796, -0.6989],
        [ 0.7386, -0.6474,  0.9869, -0.5547, -0.0543]])
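
Equivalently, and perhaps closer to the "multiply and accumulate" phrasing, the same result can be computed with a broadcasted multiply followed by a sum over the IDs dimension, or with einsum (continuing the session above):

>>> out_bmm = torch.bmm(x.transpose(1, 2), attention_value).squeeze()
>>> out_sum = (x * attention_value).sum(dim=1)   # broadcast multiply, then sum ("accumulate") over the 3 IDs
>>> out_ein = torch.einsum('bne,bno->be', x, attention_value)
>>> torch.allclose(out_bmm, out_sum) and torch.allclose(out_bmm, out_ein)
True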

Best.

K. Frank

Thank you, Frank. I was able to solve it with the steps below…

Performing permute on x[10, 3, 128]:

x = torch.permute(x, (0, 2, 1)) → x[10, 128, 3], with attention_value[10, 3, 1] unchanged

And performing
context_vector = torch.matmul(x, attention_value) → [10, 128, 1], followed by
torch.squeeze(context_vector, 2) → [10, 128]
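
Putting those steps together as a runnable sketch (with torch.randn placeholder data):

import torch

x = torch.randn(10, 3, 128)              # [batch_size, no_of_IDs, no_events_per_ID]
attention_value = torch.randn(10, 3, 1)  # [batch_size, no_of_IDs, att_value]

x = torch.permute(x, (0, 2, 1))                    # [10, 128, 3]
context_vector = torch.matmul(x, attention_value)  # [10, 128, 1]
context_vector = torch.squeeze(context_vector, 2)  # [10, 128]
print(context_vector.shape)  # torch.Size([10, 128])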

I will try your method as well.
Thank you for your time and the details.