How to properly train and evaluate nn.Transformer module?

Hi all,

Currently I am trying to train the official PyTorch Transformer from the nn module. But the tutorial linked in the comments does not use the nn.Transformer module, and nearly all the code I have found that uses the PyTorch Transformer (or the Transformer encoder/decoder) runs the model only once per iteration.

My question is: how should I write the train()/evaluate() functions for the official nn.Transformer module, and what should I pass to the model as tgt at evaluation/test time? (I think the model should be run len_output times, feeding each output back in as the next tgt.)
Also, what size should the 2D mask be if the max lengths of my input and output sequences differ: (in, out) or (out, out)?
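For what it's worth, the causal mask for the decoder's self-attention is square over the *target* length, i.e. (out, out); cross-attention between src and tgt normally gets no causal mask, only padding masks. A minimal sketch (the lengths here are made up for illustration):

```python
import torch

# Hypothetical sequence lengths, just for illustration
src_len, tgt_len = 10, 7

# Causal ("subsequent") mask for decoder self-attention: shape (tgt_len, tgt_len).
# -inf above the diagonal blocks attention to future positions; 0 elsewhere allows it.
tgt_mask = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)

print(tgt_mask.shape)  # torch.Size([7, 7])
```

This is the same mask that `nn.Transformer.generate_square_subsequent_mask(tgt_len)` produces, so in practice you can just call that helper.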


I found a tutorial from PyTorch here, which uses the ground-truth tgt at training and evaluation time and greedy decoding at inference time. I wonder whether that is a valid way to evaluate the model?
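That teacher-forced training step can be sketched roughly as below. This is only a minimal illustration, not the tutorial's actual code: the vocab size, dimensions, token ids, and the `generator` projection layer are all made up, positional encoding is omitted for brevity, and `batch_first=True` assumes a reasonably recent PyTorch.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup; all sizes and special-token ids are illustrative.
VOCAB, D_MODEL, PAD = 100, 32, 0

embed = nn.Embedding(VOCAB, D_MODEL)          # no positional encoding, for brevity
model = nn.Transformer(d_model=D_MODEL, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)
generator = nn.Linear(D_MODEL, VOCAB)         # decoder output -> vocab logits
criterion = nn.CrossEntropyLoss(ignore_index=PAD)

def train_step(src, tgt):
    """One teacher-forced step: feed tgt[:-1] as input, predict tgt[1:]."""
    tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]
    # Square causal mask over the target length only
    tgt_mask = model.generate_square_subsequent_mask(tgt_in.size(1))
    out = model(embed(src), embed(tgt_in), tgt_mask=tgt_mask)
    logits = generator(out)                   # (batch, tgt_len-1, VOCAB)
    return criterion(logits.reshape(-1, VOCAB), tgt_out.reshape(-1))

src = torch.randint(3, VOCAB, (4, 10))        # (batch, src_len)
tgt = torch.randint(3, VOCAB, (4, 8))         # (batch, tgt_len)
loss = train_step(src, tgt)
```

Evaluation with ground-truth tgt is the same step without the backward pass; greedy decoding (below in the thread) is what you'd use when no ground truth is available.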


I have the same issue. Did you figure out what we should do here?

Hey dude, I think for a seq2seq problem, during training the input should be the tgt sequence without the `<eos>` (end-of-sequence) token; while for inference, the tgt should start as the single `<sos>` (start-of-sequence) token, with a loop generating tokens one by one until `<eos>` is generated.
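The inference loop described above can be sketched as greedy decoding like this. Again just an illustrative sketch: the model, embedding, projection layer, and token ids are hypothetical stand-ins, and the weights are untrained.

```python
import torch
import torch.nn as nn

# Hypothetical setup; sizes and the <sos>/<eos> ids are illustrative.
VOCAB, D_MODEL, SOS, EOS, MAX_LEN = 100, 32, 1, 2, 20

embed = nn.Embedding(VOCAB, D_MODEL)
model = nn.Transformer(d_model=D_MODEL, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)
generator = nn.Linear(D_MODEL, VOCAB)

@torch.no_grad()
def greedy_decode(src):
    """Start tgt at <sos>; append the argmax token until <eos> or MAX_LEN."""
    memory = model.encoder(embed(src))        # encode once, reuse every step
    ys = torch.full((src.size(0), 1), SOS, dtype=torch.long)
    for _ in range(MAX_LEN):
        tgt_mask = model.generate_square_subsequent_mask(ys.size(1))
        out = model.decoder(embed(ys), memory, tgt_mask=tgt_mask)
        next_tok = generator(out[:, -1]).argmax(-1, keepdim=True)
        ys = torch.cat([ys, next_tok], dim=1)  # feed the growing tgt back in
        if (next_tok == EOS).all():
            break
    return ys

src = torch.randint(3, VOCAB, (1, 10))
out = greedy_decode(src)                      # (1, generated_len), starts with SOS
```

Note the whole growing tgt is re-fed each step; the causal mask ensures each position only sees earlier tokens, so the last position's logits give the next token.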