Hi everyone, this is my first post here so don’t be too mean please.
I am struggling with a weird problem where my script uses far more CUDA memory than I expected.
Here is a quick snippet to replicate the problem:
```python
from torchvision.models import resnet50
import torch


# This loss consumes a lot of memory
class ZeroLoss(torch.nn.Module):
    def forward(self, embeddings):
        return torch.tensor([0.], requires_grad=True)


# This doesn't
class MeanLoss(torch.nn.Module):
    def forward(self, embeddings):
        return embeddings.mean() * 0.0


use_zeroloss = True
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = resnet50().to(device)
x = torch.randn(128, 3, 224, 224).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5, momentum=0)

# MeanLoss use 15GB of Vram
# ZeroLoss use 24GB of Vram
criterion = ZeroLoss() if use_zeroloss else MeanLoss()

while True:
    embeddings = model(x)
    loss = criterion(embeddings)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
This is the minimum amount of code to replicate my problem.
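One difference between the two losses can be checked on the CPU. This is only a sketch: the tiny `torch.nn.Linear` is a hypothetical stand-in for `resnet50`, and `grads_after_backward` is a helper name made up for illustration. It shows that the tensor returned by `ZeroLoss` is a fresh leaf with no `grad_fn`, so `backward()` never reaches the model, whereas `MeanLoss` stays connected to the model's graph and produces (all-zero) gradients. I have not confirmed that this alone explains the memory gap.

```python
import torch


class ZeroLoss(torch.nn.Module):
    def forward(self, embeddings):
        # New leaf tensor; it does not depend on `embeddings` at all.
        return torch.tensor([0.], requires_grad=True)


class MeanLoss(torch.nn.Module):
    def forward(self, embeddings):
        # Still a function of `embeddings`, so it stays in the autograd graph.
        return embeddings.mean() * 0.0


def grads_after_backward(criterion):
    # Hypothetical small model standing in for resnet50 (assumption).
    model = torch.nn.Linear(4, 2)
    x = torch.randn(8, 4)
    loss = criterion(model(x))
    loss.backward()
    return loss.grad_fn, model.weight.grad


zero_fn, zero_grad = grads_after_backward(ZeroLoss())
mean_fn, mean_grad = grads_after_backward(MeanLoss())

print("ZeroLoss grad_fn:", zero_fn, "| weight.grad:", zero_grad)
print("MeanLoss grad_fn:", mean_fn, "| weight.grad is None:", mean_grad is None)
```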
[Figure: CUDA memory used by the script above with `use_zeroloss = False` (top) and `use_zeroloss = True` (bottom).]
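For anyone who wants to reproduce the numbers without a memory plot, here is a rough sketch of measuring peak usage from inside the script. `peak_cuda_memory_gib` and `step_fn` are hypothetical names introduced for illustration. Note that `torch.cuda.max_memory_allocated` reports memory occupied by tensors, so it can read lower than what `nvidia-smi` shows, which also includes memory held by the caching allocator.

```python
import torch


def peak_cuda_memory_gib(step_fn, n_steps=3, device="cuda:0"):
    """Run step_fn n_steps times and return the peak allocated CUDA memory in GiB.

    Returns None when CUDA is not available.
    """
    if not torch.cuda.is_available():
        return None
    # Reset the peak counter so earlier allocations do not skew the result.
    torch.cuda.reset_peak_memory_stats(device)
    for _ in range(n_steps):
        step_fn()
    # Wait for queued kernels to finish before reading the statistics.
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / 1024**3
```

To use it with the snippet above, the body of the `while True:` loop would go into a function (say `train_step`) and be passed in as `peak_cuda_memory_gib(train_step)`, once per loss.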
I expected ZeroLoss to use less memory than MeanLoss, but it’s actually the opposite, and the gap is huge. Why?!