GPU memory increases when I call backward() on an accumulated loss

I know that if I run the code below, memory usage will increase:

loss = 0
for i in range(n):
    loss_part = criteria()  # loss for item i (arguments omitted)
    loss += loss_part       # keeps every loss_part's graph alive

But why does it increase so quickly, and why is the allocated memory never released?
In my case, the data in a batch have different dimensions (sequences of different lengths).
So I want to compute criteria(seq_i) for every sequence and then sum the results.
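For reference, here is a minimal sketch of two ways to get the gradient of the summed loss. It is an illustration only: the linear model, the MSE loss standing in for criteria, and the random data are all assumptions, not the code from this post. Variant A sums the losses and calls backward() once, so every per-sequence graph stays alive until the end. Variant B calls backward() per sequence, so each graph can be freed right away while the gradients accumulate in .grad.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear model and MSE playing the role of `criteria`.
model = nn.Linear(4, 1)
criteria = nn.MSELoss(reduction="sum")

# Three sequences of different lengths, 4 features per step (made-up data).
lengths = (3, 5, 2)
seqs = [torch.randn(n, 4) for n in lengths]
targets = [torch.randn(n, 1) for n in lengths]

# Variant A: sum the per-sequence losses, then call backward() once.
# All per-sequence graphs are kept until backward() runs.
model.zero_grad()
loss = 0
for seq, target in zip(seqs, targets):
    loss = loss + criteria(model(seq), target)
loss.backward()
grad_summed = model.weight.grad.clone()

# Variant B: call backward() per sequence. Gradients accumulate in .grad,
# and each per-sequence graph can be released after its own backward().
model.zero_grad()
total = 0.0
for seq, target in zip(seqs, targets):
    loss_part = criteria(model(seq), target)
    loss_part.backward()
    total += loss_part.item()  # .item() returns a plain float, no graph attached
grad_stepwise = model.weight.grad.clone()

same_gradients = torch.allclose(grad_summed, grad_stepwise, atol=1e-5)
print(same_gradients)
```

Both variants should give the same gradients up to floating-point error. Variant B only helps if the sequences do not share an expensive forward pass that would have to be kept for every backward() call.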

One approach I tried is to concatenate every seq_i and call criteria(seq_sum, target) only once. Unfortunately, that did not fix the memory increase. Thanks in advance to everyone.
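To make the concatenation approach concrete, here is a small sketch. It assumes an MSE loss with reduction="sum" in place of criteria and uses made-up tensors. Under that assumption, one call on the concatenated data gives the same value as summing the per-sequence losses. With reduction="mean" the two would generally differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumption: a sum-reduced loss stands in for `criteria`.
criteria = nn.MSELoss(reduction="sum")

# Made-up per-sequence predictions and targets of different lengths.
lengths = (3, 5, 2)
preds = [torch.randn(n, 1) for n in lengths]
targets = [torch.randn(n, 1) for n in lengths]

# Loss computed per sequence and summed.
loss_loop = sum(criteria(p, t) for p, t in zip(preds, targets))

# Loss computed once on the concatenated sequences.
loss_concat = criteria(torch.cat(preds), torch.cat(targets))

losses_match = torch.allclose(loss_loop, loss_concat, atol=1e-5)
print(losses_match)
```

This only checks that the loss values agree. It says nothing about memory, which fits the observation that concatenating did not change the memory growth.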

P.S. I also use a list to store some data (Tensor.cuda()).
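A list of tensors is worth checking, because a stored tensor that still has a grad_fn keeps its autograd graph alive for as long as the list holds it. The sketch below uses hypothetical CPU tensors (the same applies to CUDA tensors) and shows the difference between storing the tensor directly and storing a detached copy. Whether this is what happens in the code from this post is not known.

```python
import torch

torch.manual_seed(0)

# Hypothetical parameter and input; the names are for illustration only.
w = torch.randn(4, 1, requires_grad=True)
x = torch.randn(8, 4)

out = x @ w  # `out` is part of the autograd graph

# Storing `out` directly keeps its graph (and the saved tensors) referenced.
kept_with_graph = [out]

# Storing a detached tensor keeps only the values, not the graph.
kept_detached = [out.detach()]

print(kept_with_graph[0].grad_fn is not None)
print(kept_detached[0].grad_fn is None)
```

If only the numbers are needed later (for logging, say), storing out.detach(), or out.detach().cpu() to move the data off the GPU, avoids holding the graph across iterations.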

Update: It seems the increase is not caused by the accumulated loss. I need help closing this question.