```python
for epoch in range(num_epochs):
    for inputs, y in dataloader:
        opt.zero_grad()
        inputs.requires_grad_()
        logits = model(inputs)
        loss1 = loss_fn(logits, y)
        loss1.backward(retain_graph=True)  # backward once
        loss2 = loss_fn(x, y)  # x defined elsewhere
        loss = loss1 + loss2
        loss.backward()  # backward twice
        opt.step()
```
This runs extremely slowly; when I watch my GPU utilization it is almost always at 0%.
Am I doing something wrong? Or do you have any ideas to speed things up?
How are you profiling these operations? If you are using the GPU, remember that you would need to synchronize the code to get valid profiles, as CUDA operations are executed asynchronously.
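If it helps, here is a minimal timing sketch (`train_step` is just a placeholder for one iteration of your loop):

```python
import time
import torch

torch.cuda.synchronize()  # finish all pending kernels before starting the clock
start = time.perf_counter()

train_step()  # placeholder: one forward/backward/opt.step() iteration

torch.cuda.synchronize()  # wait for the asynchronously launched kernels to finish
print(f"iteration took {(time.perf_counter() - start) * 1000:.2f} ms")
```

Alternatively, `torch.utils.bottleneck` or the PyTorch profiler will handle the synchronization for you.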
If you’ve already taken care of that, my guess is that the second backward pass (the one that traverses loss2's graph) adds the overhead.
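As a quick check, you could also accumulate both losses and call backward only once, which avoids `retain_graph=True` and the second traversal of the graph entirely. A sketch reusing the names from your snippet (note that the posted version also accumulates loss1's gradients twice, once per backward call, which may or may not be intended):

```python
opt.zero_grad()
inputs.requires_grad_()
logits = model(inputs)
loss = loss_fn(logits, y) + loss_fn(x, y)  # loss1 + loss2 combined up front
loss.backward()  # a single backward pass through the whole graph
opt.step()
```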