How to get the size of the memory needed to train a model?

As long as you can fit a batch size of 1 on the GPU, you can use gradient accumulation.

See here.
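
Roughly, gradient accumulation looks like the sketch below. The toy model, the fake batch-size-1 loader, and `accum_steps` are just placeholders for illustration; swap in your own model, DataLoader, and effective batch size.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy model and data, purely for illustration; replace with your own.
model = nn.Linear(10, 2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

accum_steps = 8  # effective batch size = accum_steps * per-step batch size (here 1)
loader = [(torch.randn(1, 10), torch.randint(0, 2, (1,))) for _ in range(32)]

optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    loss = criterion(model(x), y)
    # Scale so the accumulated gradient averages over the effective batch.
    (loss / accum_steps).backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```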

As for calculating the GPU memory you'll need, it's a bit complicated for models that use convolutions. It depends on your image size, the number and size of the layers, the dtype, the kernel sizes, the optimizer, model.train() vs. model.eval(), and the batch size.
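
Because of all those factors, it's often easier to just measure. Here's a rough sketch that runs one forward/backward pass and reports peak allocated memory via `torch.cuda.max_memory_allocated`; the helper name and shapes are made up for illustration, and it doesn't include optimizer state, so leave some margin for Adam and friends.

```python
import torch

def peak_train_step_memory_mb(model, input_shape, batch_size, device="cuda"):
    """Run one forward/backward pass and return peak allocated GPU memory in MB.

    `input_shape` is (channels, height, width). Optimizer state is not
    measured here, so budget extra for stateful optimizers like Adam.
    """
    model = model.to(device).train()
    torch.cuda.reset_peak_memory_stats(device)
    x = torch.randn(batch_size, *input_shape, device=device)
    model(x).sum().backward()  # dummy loss, just to materialize gradients
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / 1024 ** 2

# Example (assumes a CUDA GPU and torchvision installed):
# import torchvision
# print(peak_train_step_memory_mb(torchvision.models.resnet50(), (3, 224, 224), 8))
```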

The chart on this page compares the parameter sizes of various pretrained vision models for PyTorch. That can be used as a rough relative comparison, but you'll still need some trial and error depending on how you set the batch size, image size, etc.
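
If you just want the same kind of relative comparison in code, you can count parameters directly. The snippet below is only a sketch (it assumes torchvision >= 0.13 for `weights=None`; use `pretrained=False` on older versions), and it covers weights only; activations, gradients, and optimizer state will add several times this during training.

```python
import torchvision.models as models

def param_size_mb(model, dtype_bytes=4):
    """Rough parameter memory in MB (weights only, assuming float32)."""
    n_params = sum(p.numel() for p in model.parameters())
    return n_params * dtype_bytes / 1024 ** 2

# Compare a few torchvision models as a relative guide.
for name in ["resnet18", "resnet50", "vgg16"]:
    m = getattr(models, name)(weights=None)
    print(f"{name}: {param_size_mb(m):.1f} MB of parameters")
```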
