Why does my PyTorch process keep expanding its GPU memory usage while running?

I just ran a normal training script for my model, and today I was surprised to find that my process has been expanding its GPU memory usage at a rate of about 2 MB per training round.

Is that because I did not free some .cuda() Variables after use? I thought I did not have to free them myself, thanks to the autograd mechanism.

Or did I cause some GPU memory leak while running my nn.Module-based model?

Could you please post your PyTorch version and the error log?

0.4.0

|    3     10040      C   python                                      1525MiB |

There is no error log; the usage just keeps expanding. I believe that once it runs out of memory (mine is an NVIDIA P100 with 16 GB), it will produce an error log.

It started at 1100 MiB of usage, is now at 1525 MiB, and keeps expanding at about 2 MiB per frame.
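
For reference, here is a minimal sketch of how this growth could be confirmed from inside the script instead of via nvidia-smi. It assumes a PyTorch build that provides torch.cuda.memory_allocated (available around 0.4.0); loader and train_step are hypothetical stand-ins for the actual training code:

```python
import torch

for i, batch in enumerate(loader):  # `loader` is a hypothetical DataLoader
    train_step(batch)               # stand-in for one training round
    # Bytes currently held by tensors via PyTorch's CUDA allocator;
    # if this grows every iteration, something is accumulating.
    print('iter %d: %.1f MiB' % (i, torch.cuda.memory_allocated() / 1024 ** 2))
```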

What is the batch size in the training script?

Just 2; specifically, two frames per training cycle.
It is a rather small model with a small input size,

which makes me even more surprised about this leak.

|    3     10040      C   python                                      1679MiB |

Autograd should clean up the memory; maybe you can refer to this issue.

Are you storing the loss or some other variable that is attached to the computation graph somewhere?
Make sure to call detach on your model output etc. if you would like to store the predictions, loss, etc. for further analysis.
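
As an illustration of that pattern, here is a minimal, self-contained sketch (the toy model and sizes are made up; only the last two lines matter). Appending the attached loss keeps every iteration's graph alive on the GPU, while .item() or .detach() does not:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1).cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for step in range(100):
    inputs = torch.randn(2, 8, device='cuda')   # batch size 2, as in this thread
    targets = torch.randn(2, 1, device='cuda')

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # losses.append(loss)        # leaks: each stored loss keeps its whole graph alive
    losses.append(loss.item())   # safe: a plain Python float, detached from the graph
```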

Oh! @ptrblck, you reminded me that in my nn.Module model, net(), there is an instance variable

self.previous_feature

which is used to store the extracted feature from the last training round.

But it is updated every round, since it always holds the feature of the immediately preceding frame. Would it just be permanently stored on the GPU, even though it keeps being overwritten?

Or rather, you reminded me that the previous feature is attached to the computation graph, so even when it is overwritten, the old one is still referenced by the graph and was never detached. That is why each stored feature still exists as a graph node, right?
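
If that is the case, I suppose the fix is to store a detached copy. A rough sketch of the pattern (the linear layer is a made-up stand-in for my real feature extractor; only the .detach() call is the point):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.extract = nn.Linear(8, 8)   # stand-in for the real feature extractor
        self.previous_feature = None     # feature from the previous frame

    def forward(self, x):
        feature = self.extract(x)
        if self.previous_feature is not None:
            feature = feature + self.previous_feature
        # Without .detach(), the old graph would stay referenced through
        # this attribute and chain into the next iteration's graph.
        self.previous_feature = feature.detach()
        return feature
```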

Thanks a lot! Following your suggestion, I modified some variables to detach them, and now the problem is solved. Many thanks!

Good to hear you've got rid of the nasty memory leak! :wink: