I’m currently trying to detect in advance whether a particular model + GPU + PyTorch version combination will run into an out-of-memory issue, without actually running the model on the device.
torch.cuda.max_memory_allocated() works well for verifying that I can correctly predict the minimum memory requirements of the model’s tensors on the GPU.
However, is there any way to deterministically predict the overhead imposed by PyTorch + CUDA?
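For context, the lower-bound estimate I’m comparing against looks roughly like this (a minimal sketch, not my actual code — estimate_bytes and the example shapes are illustrative):

```python
# Rough lower bound on GPU memory for a set of tensors, computed from
# shapes and dtype sizes alone. The real footprint will be higher due
# to allocator rounding, the CUDA context, cudnn workspaces, etc.

DTYPE_BYTES = {"float32": 4, "float16": 2, "int64": 8}

def estimate_bytes(tensor_shapes, dtype="float32"):
    """Sum the raw storage required by a list of tensor shapes."""
    bytes_per_elem = DTYPE_BYTES[dtype]
    total = 0
    for shape in tensor_shapes:
        n = 1
        for dim in shape:
            n *= dim
        total += n * bytes_per_elem
    return total

# e.g. one linear layer: weight (1000, 500) + bias (1000,) in float32
print(estimate_bytes([(1000, 500), (1000,)]))  # 2004000 bytes
```

This is the number I then check against torch.cuda.max_memory_allocated() after a real run.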
I don’t think there is a reliable way to calculate the real GPU memory usage without running the code.
Depending on your hardware, CUDA version, PyTorch version, etc., the CUDA context will need a different memory footprint. While the difference might be small, this already adds a version dependency. Also, if you are using cudnn (and possibly other libraries), the memory usage could be non-deterministic across cudnn versions (e.g. due to updated heuristics).
Especially if you are using torch.backends.cudnn.benchmark = True to profile different algorithms and select the fastest one: for each new input shape this benchmarking will be re-run and might discard algorithms that need too much memory and would thus not fit into the available workspace.
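For reference, pinning the algorithm selection so that the workspace memory does not vary per input shape would look like this (a sketch — these flags exist in PyTorch, but they only constrain cudnn, not every source of variance):

```python
import torch

# Disable cudnn autotuning so the algorithm choice (and the workspace
# memory it needs) does not change per input shape, and request the
# deterministic cudnn kernels instead.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
```

Both flags can be set on a CPU-only build as well; they simply take effect once cudnn ops run on a GPU.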
However, even if you precompute the CUDA context and use the deterministic cudnn mode, there would still be memory fragmentation, so the theoretically calculated memory requirement might still be lower than the actual one.
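You can observe that gap directly by comparing what the caching allocator has handed out to tensors with what it has reserved from the driver (a sketch; note that neither counter includes the CUDA context itself, which only shows up in nvidia-smi):

```python
import torch

# memory_allocated():  bytes currently occupied by live tensors
# memory_reserved():   bytes the caching allocator holds from the driver
# reserved - allocated is allocator overhead / fragmentation.
if torch.cuda.is_available():
    x = torch.empty(1024, 1024, device="cuda")
    allocated = torch.cuda.memory_allocated()
    reserved = torch.cuda.memory_reserved()
    print(allocated, reserved)  # reserved >= allocated
else:
    print("no GPU available; both counters stay at 0 on CPU-only builds")
```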
Thanks a lot for confirming this. The data did seem to suggest some amount of non-deterministic behaviour.