In my case, before training I create several tensors that are used throughout the whole training stage. I noticed that memory keeps increasing, even though I use .cpu(). The code structure is like the following:

tensor_for_accumulate = []

for iter in range(10):
    …
    tensor_current = …
    tensor_for_accumulate += tensor_current

I found that the length and dtype stay the same during training. Thus, I can't understand why the memory keeps increasing.

If `tensor_current` is attached to the computation graph (i.e. if its `.grad_fn` attribute returns a valid function), then the entire computation graph will be stored along with `tensor_current` in `tensor_for_accumulate`. Pushing `tensor_current` to the CPU doesn't change anything if the operations were performed on the GPU beforehand, since all intermediate tensors will still be on the GPU.

I'm not familiar with your use case and don't know if you want to store the computation graph. If not, `detach()` the tensor before accumulating it.
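A minimal sketch of this pattern, assuming the accumulated values are only needed for logging (the variable names here are illustrative, not from the original code):

```python
import torch

accumulated = []

x = torch.randn(3, requires_grad=True)
for _ in range(10):
    # current is attached to the computation graph (it has a grad_fn),
    # so storing it directly would keep the whole graph alive each iteration.
    current = (x * 2).sum()

    # detach() drops the graph reference; .cpu() then moves only the
    # resulting value off the GPU instead of retaining GPU intermediates.
    accumulated.append(current.detach().cpu())

print(accumulated[0].grad_fn)  # None: no graph is retained
```

Calling `.cpu()` alone would not help here, because the returned tensor still carries its `grad_fn` and therefore keeps the GPU-side graph reachable.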


Thanks a lot!!! detach() works for me, and I checked that the cause is exactly what you mentioned above.