Transformer decoder outputs

As I mentioned in this post, a special start token is usually prepended to the (target) outputs during training. During inference, that token initializes the output generation (decoding) process, which continues until another special token is produced, the end token, which marks the end of decoding and which has also been appended to the target sequences during training.
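
To make this concrete, here is a minimal sketch of what such a decoding loop can look like. Everything in it (the token ids `BOS_ID`/`EOS_ID`, the `decoder` stub, the `MAX_LEN` cap) is a hypothetical placeholder, not taken from the post above; a real setup would call a trained Transformer decoder instead of the toy function.

```python
# Minimal sketch of greedy autoregressive decoding with start/end tokens.
# BOS_ID, EOS_ID, MAX_LEN and the decoder stub are illustrative assumptions.

BOS_ID = 1    # start token prepended to target sequences during training
EOS_ID = 2    # end token appended to target sequences during training
MAX_LEN = 20  # safety cap in case the end token is never produced

def decoder(encoder_output, generated_ids):
    """Stand-in for a trained decoder: returns the id of the next token
    given the source representation and the tokens generated so far."""
    # Toy behaviour: emit a few dummy token ids, then produce the end token.
    return 100 + len(generated_ids) if len(generated_ids) < 5 else EOS_ID

def greedy_decode(encoder_output):
    generated = [BOS_ID]                  # inference starts from the start token
    while len(generated) < MAX_LEN:
        next_id = decoder(encoder_output, generated)
        if next_id == EOS_ID:             # end token marks the end of decoding
            break
        generated.append(next_id)
    return generated[1:]                  # drop the start token from the output

print(greedy_decode(encoder_output=None))  # -> [101, 102, 103, 104]
```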
