Out of memory error during evaluation but training works fine!

You might run out of memory if you still hold references to some tensors from your training iteration.
Since Python uses function scoping, these variables are still kept alive, which might result in your OOM issue. To avoid this, you could wrap your training and validation code in separate functions. Have a look at this post for more information.

2 Likes