GPU memory increases as the output sequence grows

I read this post. My issue is similar to the second problem in it.

I feed the output back as part of the input to the model for tens of iterations, and only then calculate the loss. If the model graph and its variables are regenerated in every iteration, that takes a lot of memory. I'm also not sure whether this lengthens the backward path and makes backpropagation harder.
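To make the setup concrete, here is a minimal sketch of what I'm doing (the linear model and the sizes are placeholders, not my actual network): the output is fed back as the input, so the autograd graph keeps growing with each iteration.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)
x = torch.randn(1, 4)

# Feed the output back as the next input for many steps.
# Each iteration appends to the autograd graph, so the graph
# (and the activations it retains) grows with the loop count.
out = x
for _ in range(20):
    out = model(out)

# Loss is computed only once, after all iterations.
loss = out.pow(2).mean()
loss.backward()  # backprop unrolls through all 20 steps
```

My understanding is that memory grows because every iteration's intermediate activations must be kept for this single `backward()` call, but please correct me if that's wrong.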

In TensorFlow, I could use `tf.while_loop` to run the model several times and then calculate the loss. Is there an equivalent pattern here?
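For comparison, this is roughly the TensorFlow version I have in mind (again a toy model, not my real one), where `tf.while_loop` runs the body repeatedly and the loss is taken at the end:

```python
import tensorflow as tf

w = tf.Variable(tf.random.normal([4, 4]))
x0 = tf.random.normal([1, 4])

def cond(i, x):
    # Keep looping until 20 iterations have run.
    return i < 20

def body(i, x):
    # One step: feed the current output back through the model.
    return i + 1, tf.tanh(tf.matmul(x, w))

with tf.GradientTape() as tape:
    _, out = tf.while_loop(cond, body, [tf.constant(0), x0])
    loss = tf.reduce_mean(tf.square(out))

grad = tape.gradient(loss, w)  # gradient through the whole loop
```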