Too much GPU Memory Usage

I am trying to evaluate a PyTorch-based model. The evaluation works fine, but the GPU memory usage during the forward pass is very high and is not freed until the script finishes. I understand that memory should grow while the forward pass runs, but I expected it to drop again once the computation is done; instead it stays the same, and if I evaluate further iterations the usage keeps accumulating. Should I call torch.cuda.empty_cache() after each forward pass?

# parameters loaded to model
model.eval()
with torch.no_grad():
    for idx, batch in enumerate(test_dataloader):
        # memory-usage: 1903 MiB
        noisy_depth = batch.get("noisy_depth").unsqueeze(1).to(device)
        # memory-usage: 1903 MiB
        output = model(noisy_depth)
        # memory-usage: 5192 MiB 
        break

# memory-usage: 5192 MiB
print("Finished")

PyTorch uses a caching allocator to reuse device memory and thus avoid the (synchronizing) cudaMalloc/cudaFree calls. You can check the memory usage via print(torch.cuda.memory_summary()), which will report how much memory is allocated, how much is cached, etc.
Calling empty_cache() in each iteration will slow down the code (since PyTorch won't be able to reuse the cached memory and will have to re-allocate it) and will not lower the actually allocated memory usage.
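If you want to narrow down where the reported usage comes from, here is a minimal sketch (reusing the model, test_dataloader, and device objects from your snippet) that logs the allocated vs. reserved counters per iteration; the number nvidia-smi shows roughly corresponds to the reserved (cached) memory plus the CUDA context:

model.eval()
with torch.no_grad():
    for idx, batch in enumerate(test_dataloader):
        noisy_depth = batch.get("noisy_depth").unsqueeze(1).to(device)
        output = model(noisy_depth)

        # Memory occupied by live tensors (should stay roughly constant here).
        allocated = torch.cuda.memory_allocated(device) / 1024**2
        # Memory held by the caching allocator; this is what nvidia-smi reports.
        reserved = torch.cuda.memory_reserved(device) / 1024**2
        print(f"iter {idx}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

# Full breakdown (allocated, cached, inactive, etc.).
print(torch.cuda.memory_summary(device))

If allocated stays flat while reserved stays at ~5 GiB, nothing is leaking; the cache is simply kept around to be reused by the next forward pass.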

If you are seeing an increase in the allocated memory usage, you are most likely storing tensors in e.g. a list, which are still attached to the computation graph, so you would need to detach() them assuming you won’t want to backpropagate through them anymore.
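For example, a pattern like the sketch below keeps the results (and, outside of no_grad(), their attached graph) alive on the GPU; the preds list is just an assumed placeholder for wherever you collect the outputs:

preds = []
for idx, batch in enumerate(test_dataloader):
    noisy_depth = batch.get("noisy_depth").unsqueeze(1).to(device)
    output = model(noisy_depth)

    # Keeps the output tensor (and potentially its graph) on the device:
    # preds.append(output)

    # Detach and move to the CPU so the device memory can be freed/reused:
    preds.append(output.detach().cpu())

detach() cuts the tensor out of the computation graph and .cpu() moves the data off the device, so the allocated memory no longer grows with the number of iterations.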