Confusion about torch.nn.Transformer

hi, I’m a bit confused about src_mask and src_key_padding_mask. The explanations in the PyTorch docs are:
src_mask – the additive mask for the src sequence (optional).
src_key_padding_mask – the ByteTensor mask for src keys per batch (optional).
In my opinion, src_mask's shape is (S, S), where S is the max source length in the batch, so I would need to pass a src_mask of shape (N, S, S) to the Transformer. I don't know if I understand that correctly. I also don't understand the docs' explanation of src_key_padding_mask; it's confusing me.
For the provided example code,
output = transformer_model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask)
the [src/tgt/memory]_key_padding_mask arguments are all left as None by default, and I'm a little confused about that too.
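For reference, here is a minimal runnable sketch of the shapes as I currently read the docs (sizes are made up just for illustration, and my reading may be wrong):

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen only for illustration.
S, T, N, d_model = 5, 6, 2, 16
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=1, num_decoder_layers=1)

src = torch.rand(S, N, d_model)   # (S, N, E): sequence-first by default
tgt = torch.rand(T, N, d_model)

# src_mask is (S, S) and is shared across the batch (not (N, S, S)).
# It is an additive float mask: 0 = attend, -inf = block.
src_mask = torch.zeros(S, S)

# src_key_padding_mask is (N, S): True marks padding positions
# that every query should ignore, per batch element.
src_key_padding_mask = torch.zeros(N, S, dtype=torch.bool)
src_key_padding_mask[0, -1] = True  # pretend sample 0's last token is padding

# Causal mask for the decoder side, shape (T, T).
tgt_mask = model.generate_square_subsequent_mask(T)

out = model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask,
            src_key_padding_mask=src_key_padding_mask)
print(out.shape)  # (T, N, d_model)
```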