How can I run VGG16 (transfer learning) on my GPU, which has 4 GB of memory?

I want to use VGG16 (transfer learning), but I don’t have enough memory:

  • According to nvidia-smi, my GPU has 4 GB of memory
  • Model:
  import torchvision
  from torch import nn

  # Load pretrained VGG16 and freeze the whole backbone
  model = torchvision.models.vgg16(pretrained=True)
  for p in model.parameters():
      p.requires_grad = False

  # Replace the classifier with a small trainable head (2 output classes)
  sin = model.classifier[0].in_features
  model.classifier = nn.Sequential(
      nn.Linear(sin, 128), nn.ReLU(),
      nn.Linear(128, 2)
  )
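
As a quick sanity check (a small sketch based on the code above, not part of the original setup), only the new classifier head should show up as trainable:

  # Count trainable vs. frozen parameters; only the new head (roughly 3.2M
  # parameters for the layers defined above) should require gradients, while
  # the frozen VGG16 backbone should not.
  trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
  frozen    = sum(p.numel() for p in model.parameters() if not p.requires_grad)
  print(f"trainable: {trainable:,}  frozen: {frozen:,}")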

According to torchinfo.summary(model, (64, 3, 224, 224)):

Estimated Total Size (MB): 7046.64
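
The 4 GB figure can also be checked from PyTorch itself (a small sketch assuming a single CUDA device, not part of the original post), which makes the gap to the ~7 GB estimate for a batch of 64 explicit:

import torch

# Total memory of the first CUDA device in MB; should roughly match nvidia-smi.
props = torch.cuda.get_device_properties(0)
print(f"Total GPU memory: {props.total_memory / 1024**2:.0f} MB")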

And I’m trying to train it with:

import torch
from torch import nn, optim

# Train on the GPU if available, otherwise fall back to the CPU
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model  = model.to(DEVICE)
adam   = optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.99), weight_decay=2e-4)
loss   = nn.CrossEntropyLoss()

model.train(True)
for ii, (x, y) in enumerate(trainDL):
    x = x.to(DEVICE)
    y = y.to(DEVICE)

    z = model(x)
    ...

where trainDL is a torch.utils.data.DataLoader.
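
For context, a hypothetical DataLoader setup (the dataset path, transforms, and batch size below are assumptions, not taken from the question) that would produce the (64, 3, 224, 224) batches used in the torchinfo summary could look like:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
trainDS = datasets.ImageFolder("data/train", transform=tf)  # placeholder path
trainDL = DataLoader(trainDS, batch_size=64, shuffle=True, num_workers=2)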

  • Versions: torch==1.10.0, torchinfo==1.5.3, torchvision==0.11.1

I’m getting a CUDA out of memory error (which seems plausible, since the estimated total size reported above is larger than my GPU’s 4 GB of memory).

I read this old post (from 2 years ago…):

Maybe there is a newer solution to this issue?

  1. How can I use VGG16 (transfer learning) and run it on my GPU?
  2. What do I need to change in my code above to make it run on my GPU?

If the model parameters alone already take more memory than your GPU has, I don’t think there is a way to make it work other than keeping only part of the model on the GPU and the rest on the CPU.
I don’t know whether you would see any speedup, given the cost of transferring tensors between the GPU and the host, so you would have to profile it against a CPU-only run.
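
As a rough illustration of that idea (a sketch only, not a drop-in fix; the SplitVGG wrapper and its split are my own choice), one could keep the large convolutional part of VGG16 on the CPU and move only the small classifier head to the GPU, transferring the activations in forward:

import torch
from torch import nn

class SplitVGG(nn.Module):
    """Runs the VGG16 feature extractor on the CPU and the classifier head on the GPU."""
    def __init__(self, vgg, gpu=torch.device("cuda:0")):
        super().__init__()
        self.features   = vgg.features              # heavy conv backbone, stays on the CPU
        self.avgpool    = vgg.avgpool               # stays on the CPU
        self.classifier = vgg.classifier.to(gpu)    # small trainable head, moved to the GPU
        self.gpu = gpu

    def forward(self, x):
        x = self.features(x)                        # CPU
        x = self.avgpool(x)                         # CPU
        x = torch.flatten(x, 1)
        x = x.to(self.gpu)                          # host -> GPU transfer, once per batch
        return self.classifier(x)                   # GPU

With a split like this, the inputs x stay on the CPU and only the labels y need to be moved to the GPU for the loss; whether it beats a pure CPU run depends entirely on the transfer overhead, which is why profiling both variants is the only way to decide.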