Why does my PyTorch process keep expanding its GPU memory usage while running?

I just ran a normal training script for my model, and today I was surprised to find that my process has been expanding its GPU memory usage at a rate of about 2 MB per training round.

Is that because I did not free some .cuda() Variables after use? I thought I did not have to free them myself, thanks to the autograd mechanism.

Or did I cause some GPU memory leak while running my nn.Module-based model?

Could you please post your PyTorch version and the error log?

0.4.0

|    3     10040      C   python                                      1525MiB |

There is no error log; the usage just keeps expanding. I believe that once it runs out of memory (mine is an NVIDIA P100 with 16 GB), it will produce an error log.

It started at 1100 MiB of usage, is now at 1525 MiB, and keeps expanding at about 2 MiB per frame.
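
For reference, here is a minimal sketch of how this growth could be confirmed from inside the script instead of via nvidia-smi. It assumes a PyTorch build that provides torch.cuda.memory_allocated (available around 0.4.0); loader and train_step are hypothetical stand-ins for the actual training code:

```python
import torch

for i, batch in enumerate(loader):  # `loader` is a hypothetical DataLoader
    train_step(batch)               # stand-in for one training round
    # Bytes currently held by tensors via PyTorch's CUDA allocator;
    # if this grows every iteration, something is accumulating.
    print('iter %d: %.1f MiB' % (i, torch.cuda.memory_allocated() / 1024 ** 2))
```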

What is the batch size in the training script?

Just 2; specifically, two frames per training cycle.
It is a rather small model with a small input size,

which makes me even more surprised about this leak.

|    3     10040      C   python                                      1679MiB |

Autograd should clean up the memory; maybe you can refer to this issue.

Are you storing the loss or some other variable that is attached to the computation graph somewhere?
Make sure to call detach on your model output etc. if you would like to store the predictions, loss, etc. for further analysis.
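
As an illustration of that pattern, here is a minimal, self-contained sketch (the toy model and sizes are made up; only the last two lines matter). Appending the attached loss keeps every iteration's graph alive on the GPU, while .item() or .detach() does not:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1).cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for step in range(100):
    inputs = torch.randn(2, 8, device='cuda')   # batch size 2, as in this thread
    targets = torch.randn(2, 1, device='cuda')

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # losses.append(loss)        # leaks: each stored loss keeps its whole graph alive
    losses.append(loss.item())   # safe: a plain Python float, detached from the graph
```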

Oh! @ptrblck, you reminded me that in my nn.Module model, net(), there is an instance variable

self.previous_feature

which is used to store the extracted feature from the last training round.

But it is updated every round, since it always holds the feature of the immediately preceding frame. Would it just be permanently stored on the GPU, even though it keeps being overwritten?

Or rather, you reminded me that the previous feature is attached to the computation graph, so even when it is overwritten, the old one is still referenced by the graph and was never detached. That is why each stored feature still exists as a graph node, right?
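
If that is the case, I suppose the fix is to store a detached copy. A rough sketch of the pattern (the linear layer is a made-up stand-in for my real feature extractor; only the .detach() call is the point):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.extract = nn.Linear(8, 8)   # stand-in for the real feature extractor
        self.previous_feature = None     # feature from the previous frame

    def forward(self, x):
        feature = self.extract(x)
        if self.previous_feature is not None:
            feature = feature + self.previous_feature
        # Without .detach(), the old graph would stay referenced through
        # this attribute and chain into the next iteration's graph.
        self.previous_feature = feature.detach()
        return feature
```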

Thanks a lot! Following your suggestion, I modified some variables to detach them, and now the problem is solved. Many thanks!

Good to hear you've got rid of the nasty memory leak! :wink: