Ok, so I have a model split up across multiple files, and I'm running everything from a Jupyter notebook. The first time I ran it, training converged just fine but then started to overfit. I killed the kernel, set it to train for only 3 epochs, and started everything again. I DID NOT TOUCH THE CODE! Now it’s not converging.
Has this ever happened to anyone?
Edit: Never mind… I accidentally set requires_grad = False somewhere.
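
For anyone who hits the same thing: a quick way to spot this is to list which parameters are frozen before training. This is only a minimal sketch, assuming a PyTorch-style model where `named_parameters()` yields (name, parameter) pairs and the flag is spelled `requires_grad`. The helper names `frozen_parameters` and `report_frozen` are made up for illustration.

```python
def frozen_parameters(model):
    """Return the names of parameters that will not receive gradients.

    Assumption: `model` follows the PyTorch convention, i.e.
    `model.named_parameters()` yields (name, parameter) pairs and each
    parameter has a boolean `requires_grad` attribute.
    """
    return [name for name, p in model.named_parameters() if not p.requires_grad]


def report_frozen(model):
    """Print the frozen parameters, if any, and return their names."""
    frozen = frozen_parameters(model)
    if frozen:
        print("Frozen parameters (no gradient updates):")
        for name in frozen:
            print("  ", name)
    else:
        print("All parameters are trainable.")
    return frozen
```

Calling `report_frozen(model)` in the notebook right before the training loop would show any layer that was frozen by accident. If everything you expect to train shows up in that list, the optimizer has nothing to update, which would look exactly like "not converging".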