When training an RNN with optional teacher forcing, that is, choosing whether the next timestep's prediction is conditioned on the ground-truth value or on the model's own previous prediction, should we call .detach() on the predicted output when teacher forcing is not used?
My confusion stems from an apparent discrepancy between two PyTorch tutorials that use a similar encoder-decoder architecture:
decoder_input = topi.squeeze().detach() # detach from history as input
decoder_input = torch.LongTensor([[topi[i] for i in range(batch_size)]])
Is this context-dependent, or is there a canonical way to handle the model's output when teacher forcing is optional? Do the two approaches differ in terms of performance?
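For reference, here is a minimal sketch for checking whether the two variants differ as far as autograd is concerned. It assumes that topi comes from a topk call on the decoder's output, as in both snippets; the random logits stand in for a real decoder, the sizes are made up, and the second variant uses .item() to make the conversion to Python integers explicit.

```python
import torch

torch.manual_seed(0)
batch_size, vocab_size = 4, 10  # hypothetical sizes, for illustration only

# Stand-in for the decoder output at one timestep (assumption: no real model here).
logits = torch.randn(batch_size, vocab_size, requires_grad=True)
log_probs = torch.log_softmax(logits, dim=1)

# topv holds the top values (differentiable), topi the integer indices.
topv, topi = log_probs.topk(1)

# Variant 1: detach the indices explicitly.
input_a = topi.squeeze().detach()

# Variant 2: rebuild a new LongTensor from plain Python integers.
input_b = torch.LongTensor([[topi[i].item() for i in range(batch_size)]])

print(topv.requires_grad)  # the values are part of the graph
print(topi.requires_grad)  # integer indices cannot require gradients
print(topi.grad_fn)        # no gradient history is attached to the indices
print(torch.equal(input_a, input_b.squeeze(0)))  # same token ids either way
```

Since the indices returned by topk are integer tensors with no gradient history, both variants in this sketch feed the same token ids to the next step with nothing to backpropagate through. The question of detaching would seem to matter more when the fed-back prediction is itself a differentiable tensor, for example a continuous-valued output.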