I’m implementing training code for a Transformer model using nn.Transformer.
In the documentation, there is an optional memory_mask argument. I read the documentation, but I don’t understand the purpose of this argument.
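For reference, here is a minimal sketch of where the argument appears in a forward call, as far as I can tell from the documentation. The sizes are arbitrary placeholders, the default (sequence, batch, feature) layout is assumed, and the mask is all zeros (so it should mask nothing), since what a non-trivial mask is meant to express is exactly what I'm unsure about.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary placeholder sizes, chosen only to keep the example small.
d_model, nhead = 16, 4
src_len, tgt_len, batch = 5, 3, 2

model = nn.Transformer(
    d_model=d_model,
    nhead=nhead,
    num_encoder_layers=1,
    num_decoder_layers=1,
    dim_feedforward=32,
    dropout=0.0,
)
model.eval()

# Default layout (batch_first=False): (sequence length, batch, d_model).
src = torch.rand(src_len, batch, d_model)
tgt = torch.rand(tgt_len, batch, d_model)

# The documentation lists the memory_mask shape as (T, S):
# target length by source length. A float mask is added to the
# attention scores, so an all-zero mask should leave them unchanged.
memory_mask = torch.zeros(tgt_len, src_len)

out = model(src, tgt, memory_mask=memory_mask)
out_shape = tuple(out.shape)
print(out_shape)  # (tgt_len, batch, d_model)
```

This runs without complaint about the mask shape, but it does not tell me when a mask other than all zeros would be needed.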
Could you explain what memory_mask is?
Additionally, is there any example code that uses the nn.Transformer module?