I am trying to evaluate a PyTorch-based model. The evaluation itself works fine, but GPU memory usage during the forward pass is very high and is not freed until the script finishes. I understand that usage should grow while the forward pass is running, but I expected it to drop once the computation is done; instead it stays at the same level, and with further iterations it keeps accumulating. Should I call torch.cuda.empty_cache() after each forward pass?
# parameters loaded into the model
model.eval()
with torch.no_grad():
    for idx, batch in enumerate(test_dataloader):
        # memory-usage: 1903 MiB
        noisy_depth = batch.get("noisy_depth").unsqueeze(1).to(device)
        # memory-usage: 1903 MiB
        output = model(noisy_depth)
        # memory-usage: 5192 MiB
        break
# memory-usage: 5192 MiB
print("Finished")