I’m having a hard time understanding how to use nn.Transformer too, even after reading through this thread, the tutorial, this GitHub issue, and the example language model. No matter what I do, my model seems to do nothing but copy the target sequence.
The task is to predict the title of an article given a sentence from the article; it’s a toy stand-in for a similar task I’d eventually like to do. Both the sentence and the title vary in length, so to facilitate batching I use the DataLoader’s collate_fn to pad every sentence in a batch to the length of the longest sentence in the batch, and likewise for titles. When calling nn.Transformer, I make the sentence the src and the title the tgt.
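Here’s roughly what my collate_fn looks like (PAD_IDX is a stand-in for my real padding id; the shapes follow nn.Transformer’s default (seq_len, batch) layout):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_IDX = 0  # stand-in for my actual padding token id

def collate_fn(batch):
    # batch: list of (sentence, title) pairs, each a 1-D LongTensor of token ids
    sentences, titles = zip(*batch)
    # pad every sentence to the length of the longest sentence in the batch
    src = pad_sequence(list(sentences), padding_value=PAD_IDX)  # (src_len, batch)
    # same for titles
    tgt = pad_sequence(list(titles), padding_value=PAD_IDX)     # (tgt_len, batch)
    return src, tgt
```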
I include a key padding mask for both src and tgt, with True values wherever I padded. I also include a tgt_mask generated by generate_square_subsequent_mask, so that the decoder can’t attend to future positions of the target while it’s predicting. Since the model was still copying everything, I also tried a square subsequent mask for the src, but that didn’t help either.
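The forward pass looks something like this (again a sketch: the vocab size, d_model, embedding, and the omitted positional encoding and output projection are placeholders, but the masking is what I described):

```python
import torch
import torch.nn as nn

PAD_IDX = 0         # same stand-in as in collate_fn above
VOCAB_SIZE = 10000  # placeholder
D_MODEL = 512

embed = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD_IDX)
model = nn.Transformer(d_model=D_MODEL)

def forward_pass(src_tokens, tgt_tokens):
    # src_tokens: (src_len, batch), tgt_tokens: (tgt_len, batch)
    src = embed(src_tokens)  # (src_len, batch, D_MODEL); positional encoding omitted
    tgt = embed(tgt_tokens)  # (tgt_len, batch, D_MODEL)

    # key padding masks: True wherever a position is padding
    src_key_padding_mask = (src_tokens == PAD_IDX).transpose(0, 1)  # (batch, src_len)
    tgt_key_padding_mask = (tgt_tokens == PAD_IDX).transpose(0, 1)  # (batch, tgt_len)

    # square subsequent mask so the decoder can't attend to future target positions
    tgt_mask = model.generate_square_subsequent_mask(tgt_tokens.size(0))

    return model(src, tgt,
                 tgt_mask=tgt_mask,
                 src_key_padding_mask=src_key_padding_mask,
                 tgt_key_padding_mask=tgt_key_padding_mask,
                 memory_key_padding_mask=src_key_padding_mask)
```

I pass the src padding mask again as memory_key_padding_mask so the decoder’s cross-attention also ignores padded source positions.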
I feel that I’m missing something very obvious. Can anybody help?
Looping in @zhangguanheng66, who seems to know a lot about this.