Every time I run my script, training finishes fine, but validation runs out of memory (OOM) after the same number of batches. Why would this happen in the middle of the loop?
Check whether the memory usage increases during training or validation. If it does, make sure you are not storing tensors that are still attached to a computation graph, e.g. by appending them to a list. Each stored tensor keeps its entire graph (and all intermediate activations) alive, so memory grows every batch until you hit the OOM at the same point in the loop.
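As a minimal sketch of the fix (model and data here are hypothetical placeholders): run validation under `torch.no_grad()` so no graph is built, and call `.item()` before storing a loss so you keep a plain Python float instead of a graph-attached tensor.

```python
import torch

# Hypothetical toy model and data, for illustration only.
model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()

val_losses = []
with torch.no_grad():  # autograd is disabled, so no graph is recorded
    for _ in range(5):
        x = torch.randn(4, 10)
        y = torch.randn(4, 1)
        loss = criterion(model(x), y)
        # BAD: val_losses.append(loss) would keep the tensor alive
        # (and, outside no_grad, its whole computation graph).
        # GOOD: .item() copies the value out as a plain float,
        # so the tensor can be freed immediately.
        val_losses.append(loss.item())

mean_val_loss = sum(val_losses) / len(val_losses)
```

If you do need to keep a tensor (e.g. predictions for metrics), store `tensor.detach().cpu()` instead, which drops the graph reference and moves the data off the GPU.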
PS: it's always better to post code snippets instead of screenshots.