This thread is split off from GPU RAM fragmentation diagnostics, as it’s a different topic.
I’d like to ask whether it’s possible to make this message clearer:
RuntimeError: CUDA out of memory.
Tried to allocate 350.00 MiB
(GPU 0; 7.93 GiB total capacity; 5.73 GiB already allocated;
324.56 MiB free; 1.34 GiB cached)
The cached part of this message is confusing, since torch.cuda’s memory_cached includes memory_allocated
in its counter. Yet, in this report ‘cached’ == ‘cached but not allocated’, as confirmed by @colesbury.
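For reference, here is a minimal sketch (assuming a CUDA-capable setup) showing how the two counters relate - what the OOM message currently calls ‘cached’ corresponds to memory_cached() minus memory_allocated():

```python
import torch

# create a tensor so both counters are non-zero
x = torch.empty(1024, 1024, device='cuda')  # ~4 MiB of float32

allocated = torch.cuda.memory_allocated()  # bytes currently held by live tensors
cached = torch.cuda.memory_cached()        # bytes held by the caching allocator,
                                           # *including* the allocated bytes

# what the OOM message reports as "cached" is the difference
cached_but_free = cached - allocated

print(f"allocated:                {allocated / 2**20:.2f} MiB")
print(f"cached (counter):         {cached / 2**20:.2f} MiB")
print(f"cached but not allocated: {cached_but_free / 2**20:.2f} MiB")
```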
Any chance this could be reported as such, so that it matches torch.cuda.memory_cached,
or the wording changed? Perhaps to ‘cached free’? Then it’d look like:
RuntimeError: CUDA out of memory.
Tried to allocate 350.00 MiB
(GPU 0; 7.93 GiB total capacity; 5.73 GiB already allocated;
324.56 MiB free; 1.34 GiB cached free)
So now it’s easier to check whether 7.93 GiB = 5.73 GiB + 324.56 MiB + 1.34 GiB.
Except it doesn’t add up to the first number - it adds up to 7.38 GiB. Can pytorch somehow account for the remaining 0.54 GiB used by the CUDA context?
I guess it could just deduct it from the total?
(GPU 0; 7.93 GiB total capacity; 5.73 GiB already allocated;
324.56 MiB free; 1.34 GiB cached free; 0.54 GiB CUDA context)
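To spell out that deduction (just a sketch plugging in the numbers from the message above; the last number is simply whatever is left over):

```python
total     = 7.93            # GiB, total capacity
allocated = 5.73            # GiB, already allocated
free      = 324.56 / 1024   # GiB, free (converted from MiB)
cached    = 1.34            # GiB, cached but not allocated

leftover = total - (allocated + free + cached)
print(f"unaccounted for (CUDA context, etc.): {leftover:.2f} GiB")  # ~0.54 GiB
```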
It won’t be precise but at least the user can now see all the pieces of the memory.
But then it’d only be so if there is just one process using the card, and there would be no way to make any such deduction if more than one process uses it. So this idea is not going to work.
Another approach would be to allocate 1 byte on CUDA and then measure the memory usage before and after - that would give the size of the context, at least at CUDA setup time.
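Something along these lines, perhaps - a rough sketch that relies on the pynvml package (not part of pytorch) and assumes this process is the only one touching the GPU:

```python
import pynvml
import torch

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

used_before = pynvml.nvmlDeviceGetMemoryInfo(handle).used

# a tiny allocation forces the CUDA context to be created on device 0
torch.ones(1, device='cuda')
torch.cuda.synchronize()

used_after = pynvml.nvmlDeviceGetMemoryInfo(handle).used

# subtract pytorch's own cached block so only the context (roughly) remains
cached = torch.cuda.memory_cached()
context_estimate = used_after - used_before - cached
print(f"estimated CUDA context size: {context_estimate / 2**20:.0f} MiB")

pynvml.nvmlShutdown()
```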
But just clarifying the ‘cached’ part of the report would be great.
Thank you.