What are the expected model sizes on the GPU for torchvision ResNets?

I’m currently chasing an OOM error thrown in the forward pass on my GPU (Tesla V100-SXM2-32GB). Loading resnext50 from torchvision onto the CPU already uses 27GB. I wonder if that is expected and how much additional memory the forward pass needs. Is there any overview of model sizes and the memory they require?
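For reference, a quick way to check the raw parameter and buffer footprint on the CPU (a minimal sketch assuming float32 weights and torchvision's resnext50_32x4d; adjust the constructor to the variant you are actually loading):

```python
from torchvision import models

# Build the model on the CPU; assuming resnext50_32x4d is the variant in question
model = models.resnext50_32x4d()

# Sum the bytes of all parameters and buffers (e.g. batchnorm running stats)
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())
print(f"parameters + buffers: {(param_bytes + buffer_bytes) / 1024**2:.1f} MB")
```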

The memory usage depends not only on the stored parameters and activations but also on the libraries used. For example, if you are using cuDNN with its benchmark mode enabled (via torch.backends.cudnn.benchmark = True), the first iteration will profile different algorithms and select the fastest one, which might use more memory.
The available kernels depend on the cuDNN version as well as the GPU, so I think the best way to profile the model is to check the maximum batch size for your current setup, as sketched below.
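A minimal sketch of such a probe (assuming resnext50_32x4d, 224x224 inputs, and the batch sizes listed below; the first iteration for each new input shape triggers the cuDNN benchmark):

```python
import torch
from torchvision import models

torch.backends.cudnn.benchmark = True  # let cuDNN profile algorithms per input shape

model = models.resnext50_32x4d().cuda()

# Grow the batch size until the forward pass runs out of memory
for batch_size in (8, 16, 32, 64, 128):
    try:
        torch.cuda.reset_peak_memory_stats()
        x = torch.randn(batch_size, 3, 224, 224, device="cuda")
        out = model(x)  # activations are kept alive by the autograd graph here
        torch.cuda.synchronize()
        peak_gb = torch.cuda.max_memory_allocated() / 1024**3
        print(f"batch {batch_size}: peak {peak_gb:.2f} GB")
        del x, out
    except RuntimeError:  # an OOM in PyTorch surfaces as a RuntimeError
        print(f"batch {batch_size}: OOM")
        break
```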

That being said, if you disable cuDNN and use the native implementations, you might be able to calculate the approximate memory usage for the current PyTorch version.
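For example, a minimal sketch that disables cuDNN and reports the peak memory of a single forward pass (again assuming resnext50_32x4d and a 224x224 input at batch size 16; a rough measurement, not a definitive number):

```python
import torch
from torchvision import models

torch.backends.cudnn.enabled = False  # fall back to PyTorch's native kernels

model = models.resnext50_32x4d().cuda()
x = torch.randn(16, 3, 224, 224, device="cuda")

torch.cuda.reset_peak_memory_stats()
out = model(x)
torch.cuda.synchronize()
print(f"forward peak: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```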