What should the term "peak memory" refer to: max allocated or max reserved?

I know that PyTorch uses a caching memory allocator to avoid the cost of repeated CUDA memory allocation calls. As a result, it reports two peak values, max_memory_allocated and max_memory_reserved, respectively. The allocated memory is the memory actually occupied by tensors, while the reserved memory also includes the blocks cached by PyTorch's memory allocator.
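To make the distinction concrete, here is a toy model of a caching allocator, written in plain Python. It is only a sketch of the general idea, not PyTorch's actual implementation: freed blocks go back into a cache instead of being returned to the device, so the reserved peak can sit above the allocated peak.

```python
# Toy model of a caching allocator (NOT PyTorch's real implementation).
# Freed blocks are kept in a cache instead of being returned to the
# device, so "reserved" (in use + cached) can exceed "allocated".

class CachingAllocator:
    def __init__(self):
        self.cache = []            # freed block sizes kept for reuse
        self.allocated = 0         # bytes currently used by live "tensors"
        self.reserved = 0          # bytes held from the device (used + cached)
        self.max_allocated = 0
        self.max_reserved = 0

    def malloc(self, size):
        if size in self.cache:     # reuse a cached block: reserved unchanged
            self.cache.remove(size)
        else:                      # a fresh device allocation: reserved grows
            self.reserved += size
        self.allocated += size
        self.max_allocated = max(self.max_allocated, self.allocated)
        self.max_reserved = max(self.max_reserved, self.reserved)

    def free(self, size):          # block returns to the cache, not the device
        self.allocated -= size
        self.cache.append(size)

alloc = CachingAllocator()
alloc.malloc(4)    # e.g. an activation
alloc.malloc(2)    # e.g. a gradient
alloc.free(4)      # the 4-unit block is now cached, still reserved
alloc.malloc(3)    # no 3-unit block in the cache, so reserved grows again

print(alloc.max_allocated)  # -> 6
print(alloc.max_reserved)   # -> 9
```

The peaks diverge exactly because the freed 4-unit block stays reserved but cannot satisfy the later 3-unit request in this naive model.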
My problem is: when I want to know the peak memory stats, which number should I use? Moreover, which value determines the OOM cases? For instance, my deep learning task peaks at 9 GB allocated and 11 GB reserved on a 20 GB GPU. Will I encounter OOM if I run the same task on a 10 GB GPU?