Hey,
I have a workload that has to move a model back and forth between CUDA and the CPU:
for i in range(iterations):
    with torch.no_grad():
        model.to('cuda')
    ... operations ...
    with torch.no_grad():
        model.to('cpu')
Depending on the number of iterations, this code eventually exhausts CPU memory.
Am I missing something, or is this a bug?