Confusion about using .clone

Nope, a more realistic use of clone is in a seq2seq model, where decoding runs over more than one step.

Look at this code from the Hugging Face BART seq2seq model:

There are two branches from input_ids: the first is input_ids itself, and the second is decoder_input_ids, which is built by a shift operation that writes in place. Without a copy, that in-place write would also modify input_ids. On the other hand, the forward function needs every element of input_ids kept intact, along with any gradient tracking, both in input_ids itself and in the elements that end up in decoder_input_ids.
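To make the two branches concrete, here is a minimal sketch of a shift like the one described above. It is not the Hugging Face implementation: the function name shift_right, the start_token_id argument, and the example token ids are all made up for illustration.

```python
import torch


def shift_right(input_ids: torch.Tensor, start_token_id: int) -> torch.Tensor:
    """Hypothetical helper: build decoder inputs by shifting tokens one step right."""
    # Copy first, so the in-place writes below never touch the caller's tensor.
    shifted = input_ids.clone()
    # Move every token one position to the right (in-place write on the copy).
    shifted[:, 1:] = input_ids[:, :-1]
    # Put the (assumed) start token in the first position.
    shifted[:, 0] = start_token_id
    return shifted


# Made-up batch of token ids, shape (batch=2, seq_len=4).
input_ids = torch.tensor([[5, 6, 7, 2],
                          [8, 9, 2, 1]])

decoder_input_ids = shift_right(input_ids, start_token_id=0)

print(decoder_input_ids)  # shifted copy
print(input_ids)          # still the original values
```

If `shifted = input_ids` were used instead of `input_ids.clone()`, both names would refer to the same storage and the two in-place assignments would overwrite the encoder's input.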

So you should use clone in this case.
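On the gradient side, token ids are integers and carry no gradient themselves, but the same pattern matters for float tensors. A small sketch with made-up values (x, w, and y are just illustrative names): clone stays in the autograd graph, so an in-place edit on the clone leaves the original usable and gradients still flow back through both branches.

```python
import torch

# In-place writes directly on a leaf tensor that requires grad are rejected.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
try:
    w[0] = 0.0
    leaf_write_failed = False
except RuntimeError:
    leaf_write_failed = True

# With clone, the in-place write happens on a differentiable copy instead.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x.clone()   # second branch, still connected to x in the graph
y[0] = 0.0      # overwrites only the copy; x keeps its values

# First branch uses x itself, second branch uses the modified copy.
loss = (x * 2).sum() + y.sum()
loss.backward()

# Gradient from the first branch is 2 everywhere; the second branch adds 1
# everywhere except position 0, which was overwritten with a constant.
print(x.grad)
```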