Splitting my model across 4 GPUs and CUDA out of memory problem

If the OOM happens on the second iteration, it may be because the variables created inside the loop are still alive during the second pass. Python has function scoping (not block or loop scoping), so any variable assigned during the first iteration stays alive in subsequent iterations until it is reassigned. The old value is only released after the new one has been computed, so both occupy memory at the same time for a while.

Here is another post about this:

See my comment there about how you can rewrite your program to avoid having two versions alive at once.
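To make the scoping point concrete, here is a minimal sketch in plain Python (it is not the code from the linked post). `Buffer` is a hypothetical stand-in for a large tensor such as a model output, and `previous_alive` and `body` are names made up for this illustration. It assumes CPython, where an object is freed as soon as its last reference goes away. The sketch checks, at the start of the second iteration, whether the object from the first iteration still exists.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for a large tensor (e.g. a model output)."""


def body():
    # Locals of a function are released when the function returns,
    # so 'out' does not survive into the caller's next iteration.
    out = Buffer()
    return weakref.ref(out)


def previous_alive(strategy):
    """Report whether iteration 0's Buffer still exists at the start of iteration 1."""
    prev_ref = None
    alive = None
    for step in range(2):
        if prev_ref is not None:
            alive = prev_ref() is not None
        if strategy == "function":
            prev_ref = body()
        else:
            out = Buffer()  # 'out' stays bound after the loop body ends
            prev_ref = weakref.ref(out)
            if strategy == "del":
                del out  # drop the reference before the next iteration
    return alive


if __name__ == "__main__":
    for strategy in ("keep", "del", "function"):
        print(strategy, previous_alive(strategy))
```

With `"keep"`, the old object is still there when the second iteration begins, which is the situation where two versions coexist. Either deleting the variable at the end of the iteration or moving the loop body into its own function avoids that. In a real training loop the same idea would apply to names like the output and loss, though other references (for example a stored list of losses) can also keep memory alive.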
