How do I MAC (multiply and accumulate) these 3D tensors across batch_size, i.e. “x with attention_value”?
When I tried torch.matmul(x, att_value), I got the error below:
RuntimeError: Expected batch2_sizes[0] == bs && batch2_sizes[1] == contraction_size to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
Based on the tensor shapes you give, I assume that you want to matrix-multiply
each matrix in the batch x by the corresponding matrix in the batch attention_value (matrix multiplication, not element-wise).
I also assume that you are not summing across elements of the
batches, so I’m not sure what you mean by “accumulate.”
You can use bmm() (“batch matrix multiply”) after using transpose()
to line the dimensions up correctly (and then use squeeze() to get rid
of the singleton dimension):
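For example — the original post’s tensor shapes aren’t shown in this excerpt, so the shapes below are hypothetical, chosen so that x carries a singleton trailing dimension that needs transposing before bmm():

```python
import torch

# Hypothetical shapes (the actual shapes from the question aren't shown here):
bs = 4
x = torch.randn(bs, 8, 1)          # (batch, n, 1)
att_value = torch.randn(bs, 8, 8)  # (batch, n, n)

# bmm() requires (bs, p, q) @ (bs, q, r), so transpose x to (bs, 1, n)
# to line its contraction dimension up with att_value's first matrix dim.
result = torch.bmm(x.transpose(1, 2), att_value)  # (bs, 1, n)

# squeeze() removes the leftover singleton dimension.
result = result.squeeze(1)  # (bs, n)
```

Each slice result[i] here equals x[i].T @ att_value[i], i.e. one independent matrix multiply per batch element.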