Why does PyTorch sometimes throw an OOM error when I redefine my model with fewer parameters and retrain it (after having trained once before)?
In other words: How do I tune model hyperparameters without running into OOM errors?
Do I have to reset the computational graph / saved gradients in the dataset’s tensors? If so, how?
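To make the setup concrete, here is a minimal sketch of the kind of loop I mean. The model, sizes, and data are placeholders, not my actual code; the point is only the pattern of training, then redefining a smaller model and training again on the same GPU-resident tensors:

```python
import torch
import torch.nn as nn

# Placeholder model; the real one differs, but the loop shape is the same.
class Net(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.fc1 = nn.Linear(100, hidden)
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Dataset tensors kept on the GPU between runs.
x = torch.randn(10_000, 100).cuda()
y = torch.randn(10_000, 1).cuda()

for hidden in (2048, 512):            # the second, smaller run is where the OOM can appear
    model = Net(hidden).cuda()        # redefine the model with fewer parameters
    optimizer = torch.optim.Adam(model.parameters())
    for _ in range(100):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    # Is some explicit cleanup needed here before the next run, e.g.:
    # del model, optimizer
    # torch.cuda.empty_cache()
```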