I create the model with

model = SimpleNet(n_variables).cuda()

and then I put the model in evaluation mode and run it.
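The evaluation step looks roughly like this (a simplified sketch; factors is my full input tensor, already on the GPU, and everything else from my actual code is omitted):

model.eval()
prediction = model(factors)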
But I am getting a “CUDA error: out of memory”. If I reduce the size of “factors” it works fine. I assume I am still keeping track of some gradients or something, even though the model is in evaluation mode (sorry, I am pretty new to PyTorch). Can someone help me with this, please? Thank you!
model.eval() changes the behavior of some layers, e.g. nn.Dropout won’t drop activations anymore and nn.BatchNorm layers will use the running estimates instead of the batch statistics. It does not disable gradient tracking.
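As a quick sketch of the dropout behavior (which elements get zeroed is random, so your exact output will differ):

import torch

drop = torch.nn.Dropout(p=0.5)
x = torch.ones(4)

drop.train()    # training mode: dropout is active
print(drop(x))  # ~half the elements are zeroed, the rest scaled by 1/(1-p) = 2

drop.eval()     # evaluation mode: dropout is a no-op
print(drop(x))  # tensor([1., 1., 1., 1.])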
To save some memory you should wrap your code in a with torch.no_grad() block so that the intermediate activations won’t be stored:
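For example, reusing the names from your post (SimpleNet and factors refer to your actual model and input):

model = SimpleNet(n_variables).cuda()
model.eval()

with torch.no_grad():
    prediction = model(factors)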
Without torch.no_grad(), no gradients are calculated during the forward pass, but the intermediate activations, which would be needed to compute the gradients in a backward pass, are still stored. If you wrap your code in this statement, the intermediates won’t be stored, which also means you cannot call .backward() on your loss anymore.
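A small sketch showing the effect on the computation graph (values are arbitrary):

import torch

x = torch.randn(3, requires_grad=True)

y = x * 2               # graph is recorded, intermediates are kept
print(y.requires_grad)  # True

with torch.no_grad():
    z = x * 2           # no graph is built, nothing is stored for backward
print(z.requires_grad)  # False
# z.sum().backward()    # would raise a RuntimeError, since z has no grad_fn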