I have trained a model and now run inference with it. After the first inference the model occupies a large amount of GPU memory, and it keeps holding that memory even when I no longer feed inputs to it. I tried torch.cuda.empty_cache(), but it still cannot shrink memory usage back to the level before the first inference. How can I unload the model while the process keeps running with no input, and reload it once new inputs are available?
The memory still cannot be reclaimed completely.
My test code is like this:
import torch
import torchvision
net = torchvision.models.resnet101()
net.cuda()
net.eval()
_ = input()
inten = torch.randn((32, 3, 224, 224)).cuda()
for i in range(10):
    out = net(inten)
    print(out.shape)
_ = input()
net.cpu()
torch.cuda.empty_cache()
_ = input()
It takes around 2 GB before the first input, then memory jumps to 11 GB before the second input. After executing net.cpu() and torch.cuda.empty_cache(), around 6 GB are still in use. How can I get back to the 2 GB state?
The torch.cuda.empty_cache() call is not placed properly. At the point where you call it, net has been moved off the GPU, but out and inten have not, so the allocator still holds their blocks. You need to call it after those tensors have been released, e.g. after the enclosing function returns, since it is when exiting the function that they go out of scope and are cleaned up properly.
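A minimal sketch of the fix, assuming a CUDA-capable machine: delete (or let go out of scope) every CUDA tensor, including the input and output, before calling torch.cuda.empty_cache(). The no_grad context is an extra precaution so no autograd graph keeps activations alive.

```python
import torch

if torch.cuda.is_available():
    inten = torch.randn((32, 3, 224, 224)).cuda()
    with torch.no_grad():  # do not build an autograd graph during inference
        out = inten * 2
    # Release the tensors *before* emptying the cache; otherwise the
    # caching allocator still owns their blocks and empty_cache()
    # cannot return them to the driver.
    del inten, out
    torch.cuda.empty_cache()
```

The same idea applies to the model itself: after net.cpu(), make sure no CUDA tensor (input, output, or intermediate) is still referenced, and only then call empty_cache().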