Hi there
We use the training and validation loss to judge whether the model is overfitting, underfitting, or generalizing well.
So do we also use teacher forcing during validation, just without parameter updates, since it is validation?
If we use autoregressive decoding instead, the validation loss starts increasing from the very first epoch.
Could you kindly explain this?
Also, if we do use teacher forcing in validation, there is still the question of why we should, since validation is supposed to mimic the test scenario.
I don’t think using teacher forcing during the validation run is valid, since you are interested in computing a proxy of the test accuracy, and the test run won’t use teacher forcing.
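As a rough illustration of what a validation pass without teacher forcing could look like, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: `toy_next_token_probs` is a hypothetical stand-in for a real decoder step, the vocabulary and the `val_pairs` data are made up, and token accuracy is just one possible metric.

```python
VOCAB_SIZE = 5
EOS = 4  # hypothetical end-of-sequence token id


def toy_next_token_probs(prefix):
    """Hypothetical decoder step: returns P(next token | prefix).

    The toy rule puts most of the mass on (last token + 1), capped at EOS.
    """
    likely = min(prefix[-1] + 1, EOS)
    return [0.6 if tok == likely else 0.1 for tok in range(VOCAB_SIZE)]


def greedy_decode(start_token, max_len=10):
    """Free-running decoding: each step is fed the model's own previous output."""
    prefix = [start_token]
    for _ in range(max_len):
        probs = toy_next_token_probs(prefix)
        next_tok = max(range(VOCAB_SIZE), key=probs.__getitem__)
        if next_tok == EOS:
            break
        prefix.append(next_tok)
    return prefix[1:]  # drop the start token


def token_accuracy(pred, ref):
    """Fraction of positions that match, penalizing length mismatches."""
    matches = sum(p == r for p, r in zip(pred, ref))
    return matches / max(len(pred), len(ref))


# Made-up validation pairs: (start token, reference output).
val_pairs = [(0, [1, 2, 3]), (1, [2, 3]), (2, [3, 3])]

scores = [token_accuracy(greedy_decode(src), ref) for src, ref in val_pairs]
mean_acc = sum(scores) / len(scores)
print(f"free-running token accuracy: {mean_acc:.3f}")
```

The point is only that the ground truth never enters the decoding loop; it is used solely to score the finished output, which is how the test run would behave too.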
If we switch to autoregressive decoding, how do we compute the loss?
That is, do we force the model to produce token probabilities for as many steps as the ground-truth sequence length
in order to compute the validation loss?
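One way to make this concrete, offered as a sketch rather than a definitive recipe: run the decoder for exactly as many steps as the ground-truth sequence has tokens, take the predicted distribution at each step, and average the cross-entropy against the ground-truth token at that step. The only difference from teacher forcing is which token is fed back in. The model below (`toy_next_token_probs`) is a hypothetical stand-in for a real decoder, and the vocabulary and target are made up.

```python
import math

VOCAB_SIZE = 4
BOS = 0  # hypothetical start-of-sequence token id


def toy_next_token_probs(prefix):
    """Hypothetical decoder step: returns P(next token | prefix)."""
    likely = (prefix[-1] + 1) % VOCAB_SIZE
    return [0.7 if tok == likely else 0.1 for tok in range(VOCAB_SIZE)]


def validation_loss(target, teacher_forcing):
    """Mean per-token cross-entropy over len(target) decoding steps.

    No parameters are updated here; the loss is only measured.
    """
    prefix = [BOS]
    total = 0.0
    for gold in target:
        probs = toy_next_token_probs(prefix)
        total += -math.log(probs[gold])  # loss against the ground-truth token
        if teacher_forcing:
            next_input = gold  # feed the ground truth back in
        else:
            # feed the model's own greedy prediction back in
            next_input = max(range(VOCAB_SIZE), key=probs.__getitem__)
        prefix.append(next_input)
    return total / len(target)


target = [1, 3, 0, 1]
tf_loss = validation_loss(target, teacher_forcing=True)
ar_loss = validation_loss(target, teacher_forcing=False)
print(f"teacher-forced loss: {tf_loss:.4f}")
print(f"autoregressive loss: {ar_loss:.4f}")
```

In this toy case the autoregressive loss comes out higher because a single wrong prediction at step 2 shifts every later input away from the ground-truth prefix, so the later distributions are scored against tokens they were never conditioned to predict. That compounding is one plausible reason a free-running validation loss can look bad from the first epoch; whether it explains a particular training run is something to check against the actual model.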