I have a model that I've been slowly hyperparameter tuning on a large dataset (thousands of subjects), and it seems to generalize just fine to unseen data.
But, for fun, today I tried training on just 1 or 10 subjects and setting the validation set to be the same as the training set (the same 1 or 10 subjects), to see if the model could memorize the data. Since the two sets are identical, the validation loss should track the training loss exactly. However, that's not what I see. The training and validation losses are identical before training. For the first few epochs they both decrease, but they diverge. Then, around 10 epochs in, the validation loss reverses and quickly shoots up to its maximum value. Sometime around the first or second time the learning rate is halved (~100 epochs in), the validation loss begins to decrease again, eventually matching the training loss after a few hundred more epochs.
I can't for the life of me figure out why this is occurring. I use the same loop for training and validation, with the optimizer.zero_grad(), loss.backward(), and optimizer.step() calls behind a conditional.
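For reference, the shared loop is structured roughly like this. This is a minimal sketch of the setup described above, not my actual code; the function and variable names are made up for illustration:

```python
import torch
from torch import nn

def run_epoch(model, batches, criterion, optimizer, train: bool):
    # Hypothetical reconstruction of the single shared loop:
    # the optimizer calls sit behind the `train` conditional.
    model.train(train)  # note: toggles BatchNorm/Dropout behavior between modes
    total_loss, n_samples = 0.0, 0
    with torch.set_grad_enabled(train):
        for x, y in batches:
            out = model(x)
            loss = criterion(out, y)
            if train:  # the conditional mentioned above
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * len(x)
            n_samples += len(x)
    return total_loss / n_samples

# Tiny smoke test on random data (stand-in for the real dataset)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [(torch.randn(16, 4), torch.randn(16, 1)) for _ in range(4)]
train_loss = run_epoch(model, batches, nn.MSELoss(), opt, train=True)
val_loss = run_epoch(model, batches, nn.MSELoss(), opt, train=False)
```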
Any pointers?