Is there a way to know whether a tensor will fit into the remaining GPU RAM before creating it?
I’ve been trying to figure out the same thing. The recent “DNNMem” paper apparently can analyze a model file to estimate its memory footprint BEFORE loading it onto the GPU, but the authors haven’t released any code. Instead, what I’ve been doing is adding a lot of exception handling to respond to CUDA OOMs.
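The pattern I mean looks roughly like this (a sketch; `try_allocate` is a hypothetical helper name, and the allocation is passed in as a callable so the same pattern works for any op):

```python
def try_allocate(alloc):
    """Run an allocation attempt, returning None on CUDA OOM instead of crashing.
    `alloc` is any zero-argument callable, e.g. lambda: torch.empty(n, device="cuda").
    (Hypothetical helper, not an official API.)"""
    try:
        return alloc()
    except RuntimeError as e:
        # PyTorch surfaces CUDA OOM as a RuntimeError whose message
        # contains "out of memory"
        if "out of memory" in str(e).lower():
            return None
        raise  # any other RuntimeError is a real bug, re-raise it
```

On the `None` path I usually also call `torch.cuda.empty_cache()` and retry with a smaller batch or tensor size.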
I’ve tried calculating free memory as `total_memory - memory_allocated` or `total_memory - memory_reserved`, and I’ve also tried pynvml. Here is the code snippet:
import pynvml

def remaining_memory(device_index=0):
    # Free bytes as the driver sees them; PyTorch's cached blocks
    # count as "used" here even though they are reusable
    pynvml.nvmlInit()
    gpu_handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    info = pynvml.nvmlDeviceGetMemoryInfo(gpu_handle)
    pynvml.nvmlShutdown()
    return info.free
None of them are accurate.
I suspect it has something to do with memory fragmentation: if the remaining space is not contiguous, a large tensor won’t fit even when the counters show there is enough free memory. I just want to know how PyTorch itself checks whether a tensor will fit in memory.
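For a pre-check, one option is to compute the tensor’s byte size yourself and compare it against the free figure, leaving headroom for exactly the fragmentation problem described above. A minimal sketch (the helper names, the dtype table, and the 0.9 headroom factor are my own assumptions, not anything PyTorch does internally):

```python
import math

# bytes per element for a few common dtypes (assumed table, extend as needed)
DTYPE_SIZES = {"float32": 4, "float16": 2, "int64": 8}

def tensor_nbytes(shape, dtype="float32"):
    """Raw storage size of a dense tensor with the given shape and dtype."""
    return math.prod(shape) * DTYPE_SIZES[dtype]

def probably_fits(shape, free_bytes, dtype="float32", headroom=0.9):
    # Fragmentation means free memory may not be contiguous, so only
    # trust a fraction of the reported free bytes (heuristic, not a guarantee)
    return tensor_nbytes(shape, dtype) <= free_bytes * headroom
```

For `free_bytes` you can use `torch.cuda.mem_get_info()`, which wraps the driver’s `cudaMemGetInfo` and returns `(free, total)`; even then the check is only a heuristic, since a single large allocation can still fail on a fragmented heap.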