After reading the official PyTorch documentation for nn.Transformer, I think I understand how src_mask and src_key_padding_mask work.
It seems that src_mask, of shape (S, S), decides which positions each element may attend to, and that the same mask is shared by every sample in the batch.
src_key_padding_mask, of shape (N, S), masks keys separately for each sample in the batch, typically to ignore padding.
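To make sure I have the two masks right, here is a minimal sketch of my understanding. The sizes and mask contents are made up for illustration, and I am assuming batch_first=True and boolean masks, where True means "do not attend":

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions): batch N, sequence length S, embedding dim E
N, S, E = 2, 4, 8

layer = nn.TransformerEncoderLayer(
    d_model=E, nhead=2, dim_feedforward=16, dropout=0.0, batch_first=True
)

x = torch.randn(N, S, E)

# src_mask: shape (S, S), shared by every sample in the batch.
# True above the diagonal means position i may not attend to positions j > i.
src_mask = torch.triu(torch.ones(S, S, dtype=torch.bool), diagonal=1)

# src_key_padding_mask: shape (N, S), one row per sample.
# True marks a key position that is ignored for that sample only.
src_key_padding_mask = torch.tensor(
    [[False, False, False, True],    # sample 0: last token is padding
     [False, False, False, False]]   # sample 1: no padding
)

out = layer(x, src_mask=src_mask, src_key_padding_mask=src_key_padding_mask)
print(out.shape)  # torch.Size([2, 4, 8])
```

Both masks only let me express "one (S, S) pattern for everyone" plus "per-sample key masking", which is not the same as a full per-sample (S, S) pattern.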
However, how can I apply a mask of shape (N, S, S)?
I want a different attention mask for each element of each sample in the batch.
Maybe this is not implemented in the official PyTorch transformer…
Any help will be greatly appreciated.