In nn.Transformer, the method “generate_square_subsequent_mask” outputs a square matrix whose first column is all 0, whose second column is one -inf followed by all 0, and so on.
If we are working column-wise (i.e. the input has shape (SEQ_LEN, BATCH_SIZE, E_DIM)), shouldn’t the mask be transposed?
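To make the pattern concrete, here is a minimal sketch that rebuilds the mask with torch.triu. I am assuming this matches what generate_square_subsequent_mask returns for the same size; seq_len = 4 is just an illustrative value.

```python
import torch

seq_len = 4  # hypothetical sequence length, chosen only for illustration

# -inf strictly above the diagonal, 0 on and below it.
# Assumption: this is the same matrix generate_square_subsequent_mask(seq_len) produces.
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
print(mask)
# tensor([[0., -inf, -inf, -inf],
#         [0., 0., -inf, -inf],
#         [0., 0., 0., -inf],
#         [0., 0., 0., 0.]])

first_column = mask[:, 0]   # all zeros
second_column = mask[:, 1]  # -inf in the first row, zeros below

# The transpose is a different matrix (-inf below the diagonal instead of above).
transposed = mask.T
```

Note that the mask has shape (seq_len, seq_len), with no batch or embedding dimension, so the question is really about which axis of this matrix is meant to line up with the sequence positions.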