How does PyTorch manage cached GPU memory?

I have a simple NN model. Under the normal procedure, when I pass an input tensor T1 to the model, GPU memory usage is about 500 MB.

Now I need to calculate an additional loss, so I pass another input tensor T2 to the model. Importantly, the size of T2 grows over time, and I find that GPU memory usage grows along with it. But at some point the memory usage drops, and then it starts increasing again. So I'm curious about the GPU memory management mechanism.
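
For reference, here is roughly how I'm observing the memory usage (a minimal sketch: the linear model and the growth schedule for T2 are placeholders, not my actual code). `memory_allocated` reports memory backing live tensors, while `memory_reserved` includes blocks the caching allocator holds onto after tensors are freed:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for my actual network.
model = nn.Linear(1024, 1024).cuda()

for step in range(1, 100):
    # T2 grows over time, as in my setup (hypothetical growth schedule).
    t2 = torch.randn(step * 64, 1024, device="cuda")
    loss = model(t2).sum()
    loss.backward()

    # memory_allocated: memory occupied by live tensors.
    # memory_reserved: memory held by PyTorch's caching allocator,
    # including cached blocks not currently backing any tensor.
    print(
        f"step {step}: "
        f"allocated={torch.cuda.memory_allocated() / 2**20:.1f} MiB, "
        f"reserved={torch.cuda.memory_reserved() / 2**20:.1f} MiB"
    )
```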

If I call torch.cuda.empty_cache() before each forward pass, will it hurt the efficiency of my model?
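
Concretely, I mean something like this (sketch, with the same placeholder names as above), which releases cached blocks back to the driver before every forward:

```python
torch.cuda.empty_cache()  # return unused cached blocks to the CUDA driver
loss = model(t2).sum()    # subsequent allocations must go through cudaMalloc again
```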