I am currently trying to train the official PyTorch `nn.Transformer` module. However, the tutorial linked in the comments does not actually use `nn.Transformer`, and nearly all the code I can find that uses PyTorch's Transformer (or the Transformer encoder/decoder) runs the model only once per iteration.
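To make the question concrete, here is the kind of training step I have in mind, a minimal sketch assuming teacher forcing (the model sizes, `embed`/`generator` layers, and the omission of positional encodings and padding masks are all my own simplifications):

```python
import torch
import torch.nn as nn

# Hypothetical small sizes; positional encodings and padding masks omitted for brevity.
d_model, vocab_size = 32, 100
model = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=1,
                       num_decoder_layers=1, dim_feedforward=64, batch_first=True)
embed = nn.Embedding(vocab_size, d_model)
generator = nn.Linear(d_model, vocab_size)  # projects d_model -> vocab logits
criterion = nn.CrossEntropyLoss()
params = list(model.parameters()) + list(embed.parameters()) + list(generator.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

def train_step(src_ids, tgt_ids):
    """Teacher forcing: feed tgt shifted right, predict tgt shifted left,
    with a causal mask so position i cannot attend to positions > i."""
    model.train()
    tgt_in, tgt_out = tgt_ids[:, :-1], tgt_ids[:, 1:]
    tgt_mask = model.generate_square_subsequent_mask(tgt_in.size(1))
    out = model(embed(src_ids), embed(tgt_in), tgt_mask=tgt_mask)
    loss = criterion(generator(out).reshape(-1, vocab_size), tgt_out.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Is this single forward pass per iteration (rather than a loop over output positions) the intended way to train it?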
My question is: how should I write the train()/evaluate() functions for the official `nn.Transformer` module, and what should I pass to the model as `tgt` at evaluation/test time? (I think the model should be run `len_output` times, using each step's output as the next `tgt`.)
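In other words, something like this greedy decoding loop is what I imagine for evaluate() — a sketch under my own assumptions (small hypothetical model, `bos_id`/`eos_id` special tokens, no positional encodings), not something I know to be correct:

```python
import torch
import torch.nn as nn

# Small hypothetical model; positional encodings omitted for brevity.
d_model, vocab_size = 32, 100
model = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=1,
                       num_decoder_layers=1, dim_feedforward=64, batch_first=True)
embed = nn.Embedding(vocab_size, d_model)
generator = nn.Linear(d_model, vocab_size)

@torch.no_grad()
def greedy_decode(src_ids, bos_id=1, eos_id=2, max_len=20):
    """At test time there is no ground-truth tgt: start from <bos> and
    feed the model's own predictions back in, one token per step."""
    model.eval()
    memory = model.encoder(embed(src_ids))          # encode the source once
    ys = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(max_len - 1):
        tgt_mask = model.generate_square_subsequent_mask(ys.size(1))
        out = model.decoder(embed(ys), memory, tgt_mask=tgt_mask)
        next_tok = generator(out[:, -1]).argmax(-1, keepdim=True)
        ys = torch.cat([ys, next_tok], dim=1)       # grow tgt by one token
        if (next_tok == eos_id).all():
            break
    return ys
```

Is repeatedly re-running the decoder on the growing `ys` like this the expected inference procedure, or is there a built-in mechanism I am missing?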
Also, what size should the 2D mask be if the maximum lengths of my input and output sequences differ? ( (in, out) or (out, out) )
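Here is my current understanding of the shapes, written out so it can be corrected (S and T are hypothetical source/target lengths I chose for illustration):

```python
import torch

S, T = 12, 7  # hypothetical source ("in") and target ("out") lengths
# My understanding of the 2D attention-mask shapes:
#   src_mask:    (S, S) — encoder self-attention over the source
#   tgt_mask:    (T, T) — causal self-attention over the target
#   memory_mask: (T, S) — decoder positions attending to encoder output
# Additive float causal mask: -inf above the diagonal, 0 elsewhere.
tgt_mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)
print(tgt_mask.shape)  # torch.Size([7, 7])
```

So my guess is (out, out) for `tgt_mask`, and the mixed (out, in) shape only appears for `memory_mask` — is that right?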