In the following code, I notice that CUDA memory usage increases each time I create a new stream. Even after calling gc.collect() and torch.cuda.empty_cache(), the memory allocated on the GPU keeps growing by a roughly fixed amount with each iteration. Is there a way to release the memory allocated by each non-default stream?
Here’s the code for reference:
import gc
import torch
import torch.nn as nn

def run(model):
    model = model.to('cuda')
    input = torch.rand(10, 100).to('cuda')
    target = torch.rand(10, 100).to('cuda')
    model.train(True)
    # Run the forward/backward pass on the non-default stream
    # created in the loop below.
    with torch.cuda.stream(stream):
        output = model(input)
        loss = torch.nn.functional.mse_loss(output, target)
        loss.backward()
    stream.synchronize()
    print('allocated', torch.cuda.memory_allocated())
    print('reserved', torch.cuda.memory_reserved())

model = nn.Sequential(nn.Linear(100, 100), nn.Linear(100, 100))

for i in range(4):
    print(i)
    stream = torch.cuda.Stream()  # a fresh stream on every iteration
    run(model)
    model.train(False)
    model.to('cpu')
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
And here is the memory allocation printout:
0
allocated 17219584
reserved 25165824
1
allocated 34258944
reserved 46137344
2
allocated 51298304
reserved 67108864
3
allocated 68337664
reserved 88080384
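In case it helps narrow this down, I could also dump the caching allocator's detailed state after each iteration; a minimal sketch of what I would capture:

    # Sketch: print the caching allocator's per-pool statistics after
    # each iteration to see where the extra blocks accumulate.
    print(torch.cuda.memory_summary(abbreviated=True))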
Each new stream adds roughly 17 MB of allocated memory and 21 MB of reserved memory. Why does the memory keep increasing with each new stream, and how can I properly release it?
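For what it's worth, one workaround I'm considering (untested, and I'd still like to understand the underlying behavior) is to create a single stream up front and reuse it across iterations, on the assumption that the allocator caches memory per stream:

    stream = torch.cuda.Stream()  # created once, reused by every iteration

    for i in range(4):
        print(i)
        run(model)  # run() picks up the single global `stream`
        model.train(False)
        model.to('cpu')
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.synchronize()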