Improving CUDA out of memory report

This thread is split off from GPU RAM fragmentation diagnostics, as it’s a different topic.

I’d like to ask whether it’s possible to make this message clearer:

RuntimeError: CUDA out of memory. 
Tried to allocate 350.00 MiB 
(GPU 0; 7.93 GiB total capacity; 5.73 GiB already allocated; 
324.56 MiB free; 1.34 GiB cached)

The ‘cached’ part of this message is confusing, since torch.cuda’s memory_cached counter includes memory_allocated. Yet in this report ‘cached’ means ‘cached but not allocated’, as confirmed by @colesbury.
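To illustrate the counter relationship, here is a sketch using the numbers from the message above (plain arithmetic on the reported values, not live API calls):

```python
# Values from the example message, in MiB.
memory_allocated = 5.73 * 1024       # "already allocated": live tensors
cached_not_allocated = 1.34 * 1024   # what the message calls "cached"

# torch.cuda.memory_cached() counts both of the above,
# which is why the message's use of "cached" is confusing:
memory_cached = memory_allocated + cached_not_allocated
print(f"memory_cached: {memory_cached / 1024:.2f} GiB")  # 7.07 GiB
```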

Any chance this could be restated so that it matches torch.cuda.memory_cached, or the wording changed? Perhaps to ‘cached free’? So it’d look like:

RuntimeError: CUDA out of memory. 
Tried to allocate 350.00 MiB 
(GPU 0; 7.93 GiB total capacity; 5.73 GiB already allocated; 
324.56 MiB free; 1.34 GiB cached free)

So now it’s easier to see how the total breaks down: 7.93 GiB ≈ 5.73 GiB + 324.56 MiB + 1.34 GiB.


Except the pieces don’t add up to the total capacity: they sum to 7.38 GiB. Could PyTorch somehow account for the remaining 0.54 GiB used by the CUDA context?

I guess it could just be derived by subtracting the other numbers from the total?

(GPU 0; 7.93 GiB total capacity; 5.73 GiB already allocated; 
324.56 MiB free; 1.34 GiB cached free; 0.54 GiB CUDA context)

It won’t be precise, but at least the user can now see all the pieces of the memory.
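The deduction above can be sketched as plain arithmetic on the reported numbers (MiB):

```python
# All values taken from the example OOM message.
total       = 7.93 * 1024   # total capacity
allocated   = 5.73 * 1024   # already allocated
free        = 324.56        # free
cached_free = 1.34 * 1024   # cached but not allocated

accounted = allocated + free + cached_free
context   = total - accounted   # what's left over: CUDA context et al.

print(f"accounted for: {accounted / 1024:.2f} GiB")  # ≈ 7.39 GiB (the ~7.38 GiB above, up to rounding)
print(f"inferred context: {context / 1024:.2f} GiB")  # 0.54 GiB
```

Note this only works in the single-process case, as discussed next.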

But that would only hold if a single process were using the card; with more than one process there would be no way to make such a deduction. So this idea is not going to work.

Another approach would be to allocate 1 byte on CUDA and measure the memory usage before and after; that would give the size of the context, at least at CUDA setup time.

But just clarifying the ‘cached’ part of the report would be great.

Thank you.

Actually, what would be great is to have another metric among those numbers: how much memory PyTorch was actually able to allocate. That would tell the user how far they are from their goal.

Since the current report:

RuntimeError: CUDA out of memory. 
Tried to allocate 350.00 MiB 
(GPU 0; 7.93 GiB total capacity; 5.73 GiB already allocated; 
324.56 MiB free; 1.34 GiB cached)

gives me absolutely no indication of how far or close I am to that goal of 350 MiB. Of what use is it to me to know that I have five times the needed RAM cached if I can’t use it? Or that there are 324 MiB free which are likely fragmented and not usable?
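A toy illustration of the fragmentation point, using made-up free-block sizes (the split into blocks is hypothetical; only the totals come from the message):

```python
# Suppose the 324.56 MiB of "free" memory is split into
# non-contiguous blocks like this (hypothetical sizes, MiB):
free_blocks = [200.0, 80.0, 44.56]

total_free = sum(free_blocks)           # 324.56 MiB reported as "free"
largest_contiguous = max(free_blocks)   # 200.00 MiB actually usable in one piece

request = 350.0  # the allocation that failed
print(f"reported free: {total_free:.2f} MiB")
print(f"largest contiguous block: {largest_contiguous:.2f} MiB")
print(request <= largest_contiguous)    # False: the request cannot be satisfied
```

So the “free” number alone can’t tell the user whether an allocation of a given size will succeed.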

For example, I found this routine that calculates the actually allocatable memory (with some tweaks from me). As the SO answer explains, it can be as low as 80% of the reported free memory.

 /* calculate how much memory can be really allocated (which is not the same as free)
   https://stackoverflow.com/a/8923966/9201239
*/

#include <stdio.h>
#include <stdlib.h>        // for exit()
#include <cuda_runtime.h>  // for cudaMemGetInfo, cudaMalloc, cudaFree

const size_t Mb = 1<<20; // probe in 1 MiB steps

int main() {

    size_t total;
    size_t avail;
    cudaError_t cuda_status = cudaMemGetInfo(&avail, &total);
    if ( cudaSuccess != cuda_status ) {
      printf("Error: cudaMemGetInfo fails, %s \n", cudaGetErrorString(cuda_status) );
      exit(EXIT_FAILURE);
    }

    printf("free: %.f, total %.f\n", (double)avail/Mb, (double)total/Mb);

    int *buf_d = 0;
    size_t nwords = total / sizeof(int);
    size_t words_per_Mb = Mb / sizeof(int);

    while (cudaMalloc((void**)&buf_d,  nwords * sizeof(int)) == cudaErrorMemoryAllocation) {
      cudaGetLastError(); // clear the sticky error left by the failed cudaMalloc
      nwords -= words_per_Mb;
      if (nwords < words_per_Mb) {
        // signal no free memory
        break;
      }
    }
    cudaFree(buf_d);

    printf("can allocate:  %.fMB\n", (double)nwords/words_per_Mb);

    return 0;

}

Would it be possible to include the output of this routine in the OOM error? Then the output would be:

Tried to allocate 350.00 MiB out of 305.00 MiB available.

or the whole thing again with the new info:

RuntimeError: CUDA out of memory. 
Tried to allocate 350.00 MiB out of 305.00 MiB available.
(GPU 0; 7.93 GiB total capacity; 5.73 GiB already allocated; 
324.56 MiB free; 1.34 GiB cached)

So now I know I’m short by 45 MiB, and can act on it more intelligently than without that information.
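With the proposed “available” figure in the message, the shortfall becomes a single subtraction:

```python
tried_mib = 350.0      # what the allocation requested
available_mib = 305.0  # what the probe says is actually allocatable

shortfall = tried_mib - available_mib
print(f"short by {shortfall:.0f} MiB")  # short by 45 MiB
```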

In fact, it then renders the rest of the numbers quite useless; they are pretty useless anyway, as I explained at the beginning of this post.