Hello, I am facing a potential GPU memory leak, so I added some simple measurements to my training loop:
```python
def train(data):
    model.train()
    for batch in train_loader:
        print("before: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
        batch = batch.to(device)
        optimizer.zero_grad()
        out = model(batch.x)
        loss = F.nll_loss(out, batch.y)
        print("middle: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
        loss.backward()
        optimizer.step()
        print("after: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
    return float(loss)
```
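As I understand it, `memory_allocated()` only counts live tensors, not whatever the caching allocator keeps around. Would logging the allocator pool alongside it, roughly like this, give a clearer picture? (A minimal sketch; `log_mem` is just my own helper name, while `memory_reserved()`, `max_memory_allocated()`, and `reset_peak_memory_stats()` are the standard `torch.cuda` calls.)

```python
def log_mem(tag):
    # memory_allocated():     bytes currently occupied by live tensors
    # memory_reserved():      bytes held by the caching allocator, including
    #                         cached blocks that are free but not yet returned
    #                         to the driver
    # max_memory_allocated(): peak tensor usage since the last reset
    mb = 1024 * 1024
    print("%s: allocated %.2f MB | reserved %.2f MB | peak %.2f MB" % (
        tag,
        torch.cuda.memory_allocated() / mb,
        torch.cuda.memory_reserved() / mb,
        torch.cuda.max_memory_allocated() / mb,
    ), flush=True)
```

My idea would be to call `torch.cuda.reset_peak_memory_stats()` at the top of each batch so the peak is per batch, then call `log_mem("before")`, `log_mem("middle")`, and `log_mem("after")` in place of the prints above.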
In my model's `forward`, I also added a measurement just before the return:
```python
def forward(self, x):
    # (omitted)
    print("allocated: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
    return out
```
The output looks odd to me. For the first few batches:
```
before: 0.00 MB
allocated: 918.37 MB
middle: 900.69 MB
after: 46.32 MB
before: 46.32 MB
allocated: 1496.60 MB
middle: 1468.49 MB
after: 109.55 MB
before: 109.55 MB
allocated: 571.27 MB
middle: 562.04 MB
after: 129.10 MB
```
but after several batches the values grow much larger than I would expect and then plateau:
```
before: 7077.44 MB
allocated: 7951.10 MB
middle: 7933.39 MB
after: 7077.51 MB
before: 7077.51 MB
allocated: 8597.35 MB
middle: 8566.14 MB
after: 7077.21 MB
before: 7077.21 MB
allocated: 8363.79 MB
middle: 8337.63 MB
after: 7077.20 MB
```
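To separate a true leak from allocator caching, would a check like this between epochs make sense? (Again just a sketch; `gc.collect()` and `torch.cuda.empty_cache()` are standard calls, and the function name is only for illustration.)

```python
import gc

def check_for_leak():
    gc.collect()              # drop tensors kept alive only by stray Python references
    torch.cuda.empty_cache()  # return cached, unused blocks to the driver
    mb = torch.cuda.memory_allocated() / 1024 / 1024
    print("still allocated after cleanup: %.2f MB" % mb, flush=True)
    # If this number grows epoch over epoch, something is holding tensor
    # references (a leak); if it stays flat, the growth above is presumably
    # just the caching allocator plus optimizer state.
```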
Is there a memory leak in my code? Also, the gap between `allocated` and `middle` is small; is that normal? Any help would be appreciated.