How to clear some GPU memory?

I have a similar problem to @nikhilweee. I have tried to free memory with calls to both torch.cuda.empty_cache() and torch.cuda.ipc_collect(). This works great in my CNN training loop: reserved memory stays constant for any number of batches. But in the validation loop, memory use climbs after each iteration. The difference seems to be the loss.backward() call in the training loop, which somehow releases that garbage. Since validation has no backward pass, memory just keeps piling up in spite of the calls to torch.cuda.ipc_collect().
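For what it's worth, here is a small CPU-only sketch of the mechanism I suspect: when the forward pass runs with grad tracking enabled, each loss tensor keeps its autograd graph (and the activations it references) alive, and that graph is only freed by backward() or by dropping the reference. Wrapping the forward in torch.no_grad() avoids building the graph at all (the toy model and names here are just for illustration):

```python
import torch

model = torch.nn.Linear(4, 1)   # toy model, stand-in for the CNN
x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss_fn = torch.nn.MSELoss()

# With grad tracking on, the loss carries a grad_fn: the whole
# autograd graph stays alive as long as this tensor is referenced.
loss = loss_fn(model(x), y)
assert loss.grad_fn is not None

# Inside torch.no_grad() no graph is built, so nothing accumulates
# across validation iterations.
with torch.no_grad():
    val_loss = loss_fn(model(x), y)
assert val_loss.grad_fn is None

# If you accumulate a running loss over batches, convert to a
# Python float with .item() so no tensor (or graph) is retained.
running = 0.0
with torch.no_grad():
    for _ in range(3):
        running += loss_fn(model(x), y).item()
```

If this is what's happening, empty_cache()/ipc_collect() can't help, because the graphs are still referenced and so never become garbage in the first place.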

Anybody know what's going on in the backward() method that frees this memory?