There is some strange behavior that I cannot understand. My setup: Ubuntu 16.04.5, PyTorch 1.1.0, Tesla V100 SXM2.
I have a simple C++ module
empty_module that allocates memory on the GPU and can be called from Python (see the example code here). A boolean flag controls whether the allocated memory is freed (freeing is the correct behavior).
The default execution time on my machine is roughly 1.5 ms. When I set the flag to 0 (the memory is not freed), the execution time drops by roughly two orders of magnitude to 12 µs, and the memory leaks, which can be seen in
nvidia-smi. However, if after that, without relaunching the Python interpreter, I run the same function again with the flag set to 1 (the memory is freed and does not leak), the execution time stays at the same low level as if the memory were not freed.
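For reference, here is a minimal sketch of what such a module might boil down to; the function name, allocation size, and use of the raw CUDA runtime API are my placeholders, not the actual linked code:

```cuda
#include <cuda_runtime.h>

// Hypothetical reconstruction of the module described above:
// allocate a GPU buffer and optionally free it, controlled by a flag.
void empty_module(bool free_memory) {
    void* ptr = nullptr;
    // Allocate 256 MiB of device memory (size is a placeholder).
    cudaMalloc(&ptr, 256 * 1024 * 1024);
    if (free_memory) {
        cudaFree(ptr);  // flag = 1: release the allocation
    }
    // flag = 0: the pointer is dropped and the memory leaks,
    // visible as growing usage in nvidia-smi.
}
```

The timing difference is measured around a call to this function, with only the flag changed between runs.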
Perhaps this is a question for Nvidia developers, but I'm new to this and cannot devise an example without PyTorch.
How does this happen?