Hi, I'm trying to compute a loss that relies on broadcasting, but it uses far more GPU memory than I expect. Can anyone help? Here is a minimal example:
import torch

x = torch.randn(32, 7862, 1, 3).cuda()
y = torch.randn(32, 1, 7862, 3).cuda()
print('before sub')
print('mem expected in MB: ', (x.nelement() + y.nelement()) * 4 / 1024 ** 2)
print('mem allocated in MB: ', torch.cuda.memory_allocated() / 1024 ** 2)
print('max mem allocated in MB: ', torch.cuda.max_memory_allocated() / 1024 ** 2)
# x - y broadcasts to shape (32, 7862, 7862, 3); the norm then reduces the last dim
loss = torch.linalg.norm(x - y, dim=-1)
print('after sub')
print('mem expected in MB: ', (x.nelement() + y.nelement() + loss.nelement()) * 4 / 1024 ** 2)
print('mem allocated in MB: ', torch.cuda.memory_allocated() / 1024 ** 2)
print('max mem allocated in MB: ', torch.cuda.max_memory_allocated() / 1024 ** 2)
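One thing I'm unsure about: my "expected" figure only counts x, y, and loss. If the broadcast subtraction materializes x - y as a full tensor before the norm is taken (my assumption about how eager mode behaves, not something I have verified in the allocator), the temporary would be much larger than the result. Below is a rough sketch of that arithmetic in plain Python; `broadcast_shape` and `size_mb` are hypothetical helpers I wrote for illustration, not PyTorch functions, and float32 (4 bytes per element) is assumed.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: default torch.randn dtype


def broadcast_shape(a, b):
    """Broadcast two equal-rank shapes; a size of 1 stretches to match the other."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles shapes of equal rank")
    out = []
    for m, n in zip(a, b):
        if m != n and 1 not in (m, n):
            raise ValueError(f"cannot broadcast {m} with {n}")
        out.append(n if m == 1 else m)
    return tuple(out)


def size_mb(shape):
    """Size in MB of a dense float32 tensor with the given shape."""
    return prod(shape) * BYTES_PER_FLOAT32 / 1024 ** 2


x_shape = (32, 7862, 1, 3)
y_shape = (32, 1, 7862, 3)
diff_shape = broadcast_shape(x_shape, y_shape)  # shape of the temporary x - y
loss_shape = diff_shape[:-1]                    # the norm reduces dim=-1

for name, shape in [("x", x_shape), ("y", y_shape),
                    ("x - y", diff_shape), ("loss", loss_shape)]:
    print(f"{name:6s} {str(shape):24s} {size_mb(shape):10.1f} MB")
```

If that assumption holds, the temporary is three times the size of loss (about 22,600 MB versus about 7,500 MB for these shapes), which would explain a peak far above my estimate.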
Many thanks!