Considerable slowdown in Adam.step after a number of epochs

I have a model with multiple outputs and, therefore, multiple losses. During training I accumulate the gradients of the individual losses using retain_graph. Something along the lines of:

self.zero_grad()
# Backpropagate each output's loss separately; retain_graph=True keeps
# the shared graph alive so the subsequent backward() calls can reuse it.
for output_label, output in self(input, target).items():
    loss = self.loss(output, target[output_label])
    loss.backward(retain_graph=True)
self.optimizer.step()

where input and target are dictionaries with the respective data for the different inputs and losses, and self(input, target) returns a dictionary of outputs keyed by output_label.
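
For reference, I believe the loop above is equivalent to summing the per-output losses and calling backward() once, which avoids retain_graph entirely; a minimal sketch under that assumption (same self.loss and dictionary layout as above):

self.zero_grad()
# Sum the per-output losses into one scalar and backpropagate once.
total_loss = sum(
    self.loss(output, target[output_label])
    for output_label, output in self(input, target).items()
)
total_loss.backward()
self.optimizer.step()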

I am using Adam for the optimization.
I've noticed that after a number of epochs, the running time of an epoch suddenly jumps from 7 s to 34 s.
The slowdown is also reflected in CPU usage on my machine (I haven't tested this on the GPU yet). Memory usage doesn't seem to increase.

I profiled the code and I saw this (output from cProfile):

Normal epoch:

       52    0.012    0.000    3.538    0.068 adam.py:30(step)
      624    0.646    0.001    0.646    0.001 {method 'addcdiv_' of 'torch._C.FloatTensorBase' objects}

Slow epoch:

       52    0.013    0.000   24.576    0.473 adam.py:30(step)
      624   21.469    0.034   21.469    0.034 {method 'addcdiv_' of 'torch._C.FloatTensorBase' objects}
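
(For reference, the tables above came from running an epoch under cProfile, roughly like the sketch below; train_one_epoch is a placeholder for the actual epoch loop.)

import cProfile
import pstats

# Profile one epoch and print the most expensive calls.
profiler = cProfile.Profile()
profiler.enable()
train_one_epoch()  # placeholder for the actual training loop
profiler.disable()
pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)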

I've tested with other adaptive optimizers like Adagrad, and there I can't reproduce the issue.
It seems to be related to this line of code in Adam.step():

p.data.addcdiv_(-step_size, exp_avg, denom)
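
If I read the (old) tensor API correctly, addcdiv_(value, tensor1, tensor2) performs an in-place p.data += value * tensor1 / tensor2, so this line is just the final Adam parameter update; a sketch of the same arithmetic:

# Equivalent to the addcdiv_ call above: exp_avg is Adam's first-moment
# estimate, and denom is (roughly) sqrt(exp_avg_sq) + eps.
p.data = p.data - step_size * exp_avg / denom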

Any ideas about why this is happening? It seems as if the magnitude of the accumulated gradient statistics suddenly explodes, but I can't see why.
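
To check that hypothesis, something along these lines could log the magnitude of Adam's moment estimates after each epoch (log_optimizer_state is a hypothetical helper; it assumes the standard torch.optim.Adam state keys exp_avg and exp_avg_sq):

def log_optimizer_state(optimizer):
    # Hypothetical diagnostic: report the largest entries of Adam's
    # per-parameter moment estimates to see whether they blow up.
    for group in optimizer.param_groups:
        for p in group['params']:
            state = optimizer.state.get(p, {})
            if 'exp_avg' in state:
                print('max |exp_avg|: %g, max exp_avg_sq: %g' % (
                    float(state['exp_avg'].abs().max()),
                    float(state['exp_avg_sq'].max())))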
