How to estimate minimum GPU requirements for my application

Hi,
How do you estimate a minimum GPU requirement for your application?
This is something I’ve never been able to clarify.
My application will use a model for inference: a PyTorch OCR model (basically I'm using easyocr for this), and I'll run single-image inference. I'm doing some tests on Colab, calling the following functions after a single prediction:

torch.cuda.max_memory_allocated()  # peak memory occupied by tensors
torch.cuda.max_memory_cached()     # deprecated; renamed to torch.cuda.max_memory_reserved()

I consistently get around 2 GB from both (1.9 GB for the first and 2.5 GB for the second); the GPU is an Nvidia T4.
Is this the right way to do it? Can I safely assume a GPU with 4 GB of memory is my minimum requirement? Will this still be valid on another GPU model?
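
For context, the full flow looks roughly like this (a sketch; the Reader setup and image path are stand-ins for my actual code):

```python
import torch
import easyocr

# Stand-in setup; my real code does the same with my own images.
reader = easyocr.Reader(['en'], gpu=True)
result = reader.readtext('image.jpg')  # single-image inference
torch.cuda.synchronize()               # wait for all GPU work to finish

print(f"max allocated: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB")
print(f"max reserved:  {torch.cuda.max_memory_reserved() / 1024**2:.0f} MiB")
```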

The total memory you'll need is a function of: the model's serialized weights, the activations (when grad is enabled), and the batch size. So 4 GB sounds fine in your case, but with these things it's often best to benchmark and see: run nvidia-smi in a watch loop (e.g. watch -n 1 nvidia-smi) and check whether you run into an OOM after long running times.
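
In a notebook where watch isn't convenient, one option is to poll from Python instead (a sketch, assuming nvidia-smi is on the PATH, which it is on Colab GPU runtimes):

```python
import subprocess
import time

# Sample GPU memory usage once a second while the workload runs elsewhere.
for _ in range(60):  # about one minute of samples
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    print(out.stdout.strip())
    time.sleep(1.0)
```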

Thanks @marksaroufim. Running nvidia-smi in a loop on Colab gives me a maximum of 4910 MiB of memory used during execution, so I think a 4 GB GPU is not enough. Anyway… why does memory used still report 4910 MiB after the execution? Is that the maximum usage reported, or is it due to the PyTorch caching policy?
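
I guess I can tell the two apart with something like this (a sketch based on the torch.cuda memory stats):

```python
import torch

# Live tensors vs. memory held by PyTorch's caching allocator.
print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated")
print(torch.cuda.memory_reserved() / 1024**2, "MiB reserved")

# Return unused cached blocks to the driver; after this, nvidia-smi should
# report roughly memory_allocated() plus the CUDA context overhead.
torch.cuda.empty_cache()
```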