I have been working with a ResNet-50 on images of shape (3, 256, 256), and I'm trying to run it with a batch size of 96, but I keep getting an error saying CUDA ran out of memory. I have a 3090, so I find that hard to believe since it has 24 GB. I then tested a very simple setup with the code below, which showed that a single forward pass with an input of shape (96, 3, 256, 256) allocates 10.13 GB. Is this correct? I thought running such a network would need far less allocated memory. I checked that CUDA is available and running on my device.
import torch
import torchvision.models as models

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

def print_allocated_memory():
    # Memory currently occupied by live tensors on the GPU, in GB
    print("{:.2f} GB".format(torch.cuda.memory_allocated() / 1024 ** 3))

resnet50_input = torch.ones(96, 3, 256, 256).float().to(device)
resnet50 = models.resnet50(pretrained=False).to(device)
encoded = resnet50(resnet50_input)
print_allocated_memory()
10.13 GB
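
In case it's relevant, here is a minimal sketch of how I could extend the measurement, comparing the same forward pass with autograd enabled versus under torch.no_grad(), and also printing reserved memory (this is just an assumption on my part about how to break the number down; I haven't confirmed these figures):

import torch
import torchvision.models as models

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

def report(label):
    # allocated = live tensors; reserved = total memory held by PyTorch's caching allocator
    alloc = torch.cuda.memory_allocated() / 1024 ** 3
    reserved = torch.cuda.memory_reserved() / 1024 ** 3
    print("{}: allocated {:.2f} GB, reserved {:.2f} GB".format(label, alloc, reserved))

model = models.resnet50(pretrained=False).to(device)
x = torch.ones(96, 3, 256, 256, device=device)

# Forward pass with autograd enabled: intermediate activations stay alive for a potential backward pass
out = model(x)
report("with grad")

del out
torch.cuda.empty_cache()

# Forward pass under no_grad: intermediates are freed as soon as they are no longer needed
with torch.no_grad():
    out = model(x)
report("no_grad")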