I imagined that the difference between allocated and reserved memory is the following:
- allocated memory is the amount of memory that is actually used by PyTorch.
- reserved is the allocated memory plus pre-cached memory.
If that is correct, the following should hold: reserved memory ~= allocated memory after calling `torch.cuda.empty_cache()`.
Is my understanding correct?
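To make the assumption explicit, here is a toy accounting model of it (an illustration only, not PyTorch's actual CUDA caching allocator; all names are made up):

```python
class ToyCachingAllocator:
    """Toy model of the accounting assumed above: 'allocated' is memory
    held by live tensors, 'reserved' is allocated plus blocks cached for
    reuse."""

    def __init__(self):
        self.allocated = 0  # bytes held by live allocations
        self.cached = 0     # bytes freed but kept around for reuse

    @property
    def reserved(self):
        # reserved = allocated + pre-cached memory
        return self.allocated + self.cached

    def malloc(self, n):
        # Reuse cached memory first, then grow the reservation.
        reuse = min(n, self.cached)
        self.cached -= reuse
        self.allocated += n

    def free(self, n):
        # Freed memory goes back to the cache, not to the device.
        self.allocated -= n
        self.cached += n

    def empty_cache(self):
        # Analogue of torch.cuda.empty_cache(): drop cached blocks,
        # after which reserved == allocated in this model.
        self.cached = 0
```

In this model, `empty_cache()` is exactly what makes reserved equal allocated, which is the equality my question relies on.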
My setup is as follows:
```python
for param in params:
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    try:
        test_function(param)
    except RuntimeError:
        break
    finally:
        print(torch.cuda.memory_summary())
```
Here is the compressed output:
| failing | Allocated memory (Peak Usage) | GPU reserved memory (Peak Usage) |
|---|---|---|
|  | 15006 MB | 16726 MB |
|  | 17402 MB | 19354 MB |
|  | 19961 MB | 22184 MB |
|  | 20609 MB | 22454 MB |
(Note that for param==4 the memory report was generated after the error was raised and thus does not reflect the actual memory usage of the whole run.)
The memory requirement grows approximately quadratically. A quick extrapolation for the failing case (param==4) gives 22683 MB of allocated memory and 25210 MB of reserved memory. I have 24189 MB available and no other processes are running. Thus, if my understanding of allocated and reserved memory is correct, this case should not fail.
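For reference, the extrapolation can be reproduced with Newton forward differences on the three complete rows, assuming quadratic growth (constant second difference); this is a sketch of my arithmetic, assuming the runs correspond to consecutive values of param:

```python
def quad_extrapolate(y1, y2, y3):
    """Next value of a sequence under quadratic growth, i.e. constant
    second difference: y4 = y3 + (y3 - y2) + ((y3 - y2) - (y2 - y1))."""
    return 3 * y3 - 3 * y2 + y1

# Peak allocated memory (MB) from the three successful runs:
print(quad_extrapolate(15006, 17402, 19961))  # → 22683

# Peak reserved memory (MB) — lands close to the ~25210 MB quoted above:
print(quad_extrapolate(16726, 19354, 22184))
```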
Can someone explain why this is not the case?