In my case, before the iteration I create several tensors that are accumulated over the whole training stage. I noticed that the memory keeps increasing, even though I use `.cpu()`. The code structure is like the following:
```python
tensor_for_accumulate = []
for it in range(10):
    ...
    tensor_current = ...
    tensor_for_accumulate.append(tensor_current)
```
The length and dtype of the tensors stay the same during training, so I can't understand why the memory keeps increasing.
If `tensor_current` is attached to the computation graph (i.e. if its `.grad_fn` attribute returns a valid function), then the whole computation graph will be stored with each `tensor_current` in `tensor_for_accumulate`. Pushing `tensor_current` to the CPU doesn't change anything if the operations were performed on the GPU beforehand, since all intermediate tensors will still be on the GPU.

I'm not familiar with your use case and don't know if you want to store the computation graph. If not, `detach()` the tensor before accumulating it.
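A minimal sketch of the fix, assuming the accumulated values come from some differentiable computation (the parameter and the `* 2` operation below are placeholders for your actual model):

```python
import torch

# Hypothetical parameter standing in for your model's weights.
model_param = torch.randn(3, requires_grad=True)

tensor_for_accumulate = []
for it in range(10):
    # tensor_current is attached to the graph: its .grad_fn is set.
    tensor_current = (model_param * 2).sum()
    # detach() drops the graph reference; .cpu() alone would keep the
    # whole graph (and its GPU intermediates) alive.
    tensor_for_accumulate.append(tensor_current.detach().cpu())

# None of the stored tensors carries a graph anymore.
assert all(t.grad_fn is None for t in tensor_for_accumulate)
```

With `detach()`, only the values are kept, so memory stays flat across iterations instead of growing with one retained graph per stored tensor.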
Thanks a lot! `detach()` works for me, and I checked: the reason is exactly what you mentioned above.