nn.Transformer attention mask transposed?

In nn.Transformer, the method `generate_square_subsequent_mask` outputs a square matrix whose first column is all 0, whose second column has -inf in the first entry and 0 below it, and so on.

If we are working column-wise (i.e. the input is shaped (SEQ_LEN, BATCH_SIZE, E_DIM)), shouldn't it be transposed?

Sorry, just went through the code. It's okay the way it is: the mask is added to the attention scores, where rows index the query positions and columns index the key positions, so row i allows attending only to positions ≤ i. The batch-first vs. sequence-first layout of the input doesn't affect the mask's orientation.
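For anyone else checking this, here is a small sketch of what the mask looks like for a sequence length of 4, using an equivalent `torch.triu` construction (not the library's exact code, but it produces the same matrix):

```python
import torch

sz = 4
# Equivalent to nn.Transformer.generate_square_subsequent_mask(sz):
# -inf strictly above the diagonal, 0 on and below it.
mask = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)
print(mask)
```

Row i (the query position) has 0 in columns j ≤ i and -inf in columns j > i, so each position can only attend to itself and earlier positions, which is exactly the causal pattern described above.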
