Freeing CUDA memory after forwarding tensors

If you didn't wrap the block in a `torch.no_grad()` guard, the whole computation graph is attached to `features` and will only be freed together with it.
However, the second iteration shouldn't cause an OOM, since the previous graph will already have been freed after the backward pass and `optimizer.step()` of the first iteration.
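
For pure feature extraction, something like this minimal sketch (the model and shapes are just placeholders, not from your code) avoids attaching a graph to `features` in the first place:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 10).to(device)        # placeholder model
x = torch.randn(64, 1024, device=device)      # placeholder input

# Without the guard, `features` would carry the autograd graph and keep
# the intermediate activations alive until `features` itself is freed.
with torch.no_grad():
    features = model(x)

print(features.requires_grad)  # False -> no graph attached
```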

If you run out of memory after training, in the first evaluation iteration, you might be keeping unnecessary variables (e.g. the last `loss` or `output` together with its graph) alive due to Python's function scoping, as explained in this post.
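
A rough sketch of that idea (function names, model, and loader are placeholders, not from your code): moving the loops into functions lets the last `output`/`loss` from training go out of scope before evaluation starts, and the `no_grad` guard avoids building a graph during eval:

```python
import torch

def train_one_epoch(model, loader, optimizer, criterion, device):
    model.train()
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    # `output` and `loss` are freed when this function returns


@torch.no_grad()  # no graph is built during evaluation
def evaluate(model, loader, device):
    model.eval()
    total_correct = 0
    for data, target in loader:
        output = model(data.to(device))
        total_correct += (output.argmax(dim=1) == target.to(device)).sum().item()
    return total_correct / len(loader.dataset)
```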
