Why does PyTorch overfit after the first experiment

I can’t help but notice that my PyTorch models tend to overfit after the first experiment. For example, when using k-fold cross-validation, the first fold trains and validates normally (the loss decreases slowly until it saturates), but the second, third, … folds already start with a low loss value, reach their best performance within the very first epochs, and then the model starts to overfit. Is there any explanation for this behaviour?


I guess you might not be resetting the experiment setup properly between folds, i.e. the model, optimizer, lr_scheduler, or any other object which “learns” during the first fold. If these are created once outside the fold loop, every later fold starts from weights that have already seen most of the data (including samples now in its validation split), which explains the low initial loss and the quick overfitting.
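A minimal sketch of what that reset looks like, assuming scikit-learn's `KFold` for the splits and a toy linear model standing in for your actual network — the point is simply that everything holding learned state is re-created inside the fold loop:

```python
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

# Toy data standing in for your dataset (assumption, not from the original post)
X = torch.randn(100, 10)
y = torch.randn(100, 1)

fold_losses = []
for fold, (train_idx, val_idx) in enumerate(KFold(n_splits=5).split(X)):
    # Re-initialize per fold: fresh weights, fresh optimizer state
    # (e.g. momentum buffers), fresh scheduler counter.
    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

    for epoch in range(5):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(X[train_idx]), y[train_idx])
        loss.backward()
        optimizer.step()
        scheduler.step()

    with torch.no_grad():
        val_loss = nn.functional.mse_loss(model(X[val_idx]), y[val_idx])
    fold_losses.append(val_loss.item())
```

If instead `model = nn.Linear(10, 1)` and the optimizer were created once before the loop, fold 2 onwards would continue training the already-fitted weights, reproducing exactly the symptom described in the question.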