Hello. I have tried a few ways of estimating how much VRAM inference will use, specifically different combinations of the model's parameter count and the number of elements in the image, but nothing seems to work quite right.
I am wondering if there’s a known accurate way to perform this estimation.
For context, I am trying to prevent the users of my application from getting an out-of-memory error when using a model that is too large for their machine. I currently do this by catching the CUDA out-of-memory error and tiling/chunking the image recursively, so it keeps tiling until inference succeeds (this is for super-resolution, where the model does not need the whole image at once, so tiling is acceptable). However, this approach has caused issues for some users, mainly when PyTorch sits right at the edge of usable VRAM: it does not raise an out-of-memory error, but inference becomes extremely slow. Also, even though PyTorch raises the out-of-memory error fairly quickly, working out how much to tile the image by catching errors still takes extra time.
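To make the catch-and-retry approach above concrete, here is a minimal sketch of what such recursive tiling might look like. It is an illustration under assumptions, not the application's actual code: `infer_with_tiling` and `run_model` are hypothetical names, the image is a plain list of rows rather than a tensor, and the exception type is a parameter (defaulting to the built-in `MemoryError`) so the sketch runs without a GPU.

```python
from typing import Callable, List, Type

Image = List[List[float]]  # hypothetical stand-in for an H x W image


def infer_with_tiling(
    img: Image,
    run_model: Callable[[Image], Image],
    oom_error: Type[BaseException] = MemoryError,
    max_depth: int = 4,
) -> Image:
    """Run run_model on img, splitting into 2 x 2 tiles whenever it runs out of memory."""
    h, w = len(img), len(img[0])
    try:
        return run_model(img)
    except oom_error:
        # Give up if the tile cannot be split further or the depth limit is hit.
        if max_depth == 0 or h < 2 or w < 2:
            raise
    # Out of memory: split into four tiles, infer each one, then stitch the results.
    mh, mw = h // 2, w // 2
    out: Image = []
    for r0, r1 in ((0, mh), (mh, h)):
        rows = img[r0:r1]
        left = infer_with_tiling([row[:mw] for row in rows], run_model, oom_error, max_depth - 1)
        right = infer_with_tiling([row[mw:] for row in rows], run_model, oom_error, max_depth - 1)
        # Both halves come from the same input rows, so their outputs line up row by row.
        out.extend(l + r for l, r in zip(left, right))
    return out
```

With PyTorch, one would presumably pass `oom_error=torch.cuda.OutOfMemoryError` (available in recent PyTorch versions) and slice tensors instead of lists. The sketch also uses no overlap between tiles, which real super-resolution tiling often adds to hide seams. Every failed attempt costs a full trial run, which is the overhead described above.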
So, I want to switch to an approach where I can tell beforehand how much VRAM a given model and image combination will use. That would let me show the user how much VRAM is required and also determine the right amount of tiling up front.
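As a sketch of the second half of that goal: if a trustworthy estimate existed, turning it into a tiling decision could be simple arithmetic. The function below is hypothetical and rests on two loudly stated assumptions: the model weights are a fixed cost, and the image-dependent memory shrinks by a factor of four with each 2 x 2 split. Where the estimate itself comes from is exactly the open question here.

```python
def tiling_depth(
    fixed_bytes: float,
    image_bytes: float,
    budget_bytes: float,
    max_depth: int = 8,
) -> int:
    """Return how many 2 x 2 split levels are needed for one tile to fit the budget.

    fixed_bytes:  memory that does not depend on tile size (e.g. model weights).
    image_bytes:  estimated image-dependent memory for the full, untiled image.
    budget_bytes: VRAM available for inference.
    """
    if fixed_bytes >= budget_bytes:
        # No amount of tiling helps if the fixed cost alone exceeds the budget.
        raise ValueError("model does not fit in the budget, regardless of tiling")
    depth = 0
    # ASSUMPTION: each split level divides the image-dependent memory by 4.
    while fixed_bytes + image_bytes / (4 ** depth) > budget_bytes:
        depth += 1
        if depth > max_depth:
            raise ValueError("budget not reachable within max_depth splits")
    return depth
```

For the budget, `torch.cuda.mem_get_info()` returns the free and total device memory in bytes, so its first value (minus some safety margin) would be one candidate for `budget_bytes`.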
If there is any way to do this accurately, please let me know. This is the formula I’m currently playing around with:
img_bytes = img_tensor.numel() * img_tensor.element_size()
model_bytes = sum(
    p.numel() * p.element_size() for p in model.parameters()
)
mem_required_estimation = (model_bytes / (1024 * 52)) * img_bytes
However, it’s a bit arbitrary. I picked the 52 in that formula empirically, by seeing which value came closest to predicting when I would run out of memory. Surely there must be a better and more accurate way of doing this?
Thank you in advance for your time.