Transformer attention padding masks

Hi,
I am pretty confused about how to create padding masks for attention between a query and a key. Assume we have a padding vector for each of the query and the key that marks valid positions with 1 and padding with 0 (I know this is the reverse of the usual convention). We then want to build a matrix that is 1 only where both the query AND the key positions are valid (key padding across the top, query padding down the left side), like so:
  | 1 1 0 1
--+--------
1 | 1 1 0 1
0 | 0 0 0 0
1 | 1 1 0 1
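For the unbatched case this is just the outer product of the two indicator vectors. A minimal sketch reproducing the matrix above (the tensor names are only placeholders):

```python
import torch

# Toy vectors matching the matrix above (1 = valid, 0 = padding).
query_padding = torch.tensor([1., 0., 1.])      # 3 query positions
key_padding   = torch.tensor([1., 1., 0., 1.])  # 4 key positions

# Outer product: entry (i, j) is 1 only if query i AND key j are both valid.
mask = torch.outer(query_padding, key_padding)
print(mask)
# tensor([[1., 1., 0., 1.],
#         [0., 0., 0., 0.],
#         [1., 1., 0., 1.]])
```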

If both of these vectors are batched, then this (now 3D) tensor can be created with
torch.einsum("bm,bn->bmn", query_padding, key_padding)
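A runnable sketch of that batched version (the shapes and values are made up for illustration):

```python
import torch

# Hypothetical batched padding indicators: (batch, query_len) and (batch, key_len).
query_padding = torch.tensor([[1., 0., 1.],
                              [1., 1., 0.]])
key_padding   = torch.tensor([[1., 1., 0., 1.],
                              [0., 1., 1., 1.]])

# Combined mask of shape (batch, query_len, key_len):
# entry (b, m, n) = query_padding[b, m] * key_padding[b, n].
attn_mask = torch.einsum("bm,bn->bmn", query_padding, key_padding)

print(attn_mask.shape)  # torch.Size([2, 3, 4])
print(attn_mask[0])
# tensor([[1., 1., 0., 1.],
#         [0., 0., 0., 0.],
#         [1., 1., 0., 1.]])
```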

The problem is that I have never seen such an implementation anywhere. I have only seen implementations where the key padding vector is simply repeated for every query position, like so:
  | 1 1 0 1
--+--------
1 | 1 1 0 1
0 | 1 1 0 1
1 | 1 1 0 1
which in this case creates wrong entries in the row corresponding to the padded query position (sketch below). Am I missing something here?
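For reference, here is a sketch of the construction I mean, where only the key padding vector is broadcast along the query dimension (again with placeholder names):

```python
import torch

query_padding = torch.tensor([1., 0., 1.])
key_padding   = torch.tensor([1., 1., 0., 1.])

# Only the key padding is broadcast over the query positions.
broadcast_mask = key_padding.unsqueeze(0).expand(query_padding.size(0), -1)
print(broadcast_mask)
# tensor([[1., 1., 0., 1.],
#         [1., 1., 0., 1.],   <- the padded query position is not zeroed out
#         [1., 1., 0., 1.]])
```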