```python
for epoch in range(num_epochs):
    for inputs, y in dataloader:
        opt.zero_grad()
        inputs.requires_grad_()
        logits = model(inputs)
        loss1 = loss_fn(logits, y)
        loss1.backward(retain_graph=True)  # backward once
        loss2 = loss_fn(x, y)  # x defined elsewhere
        loss = loss1 + loss2
        loss.backward()  # backward twice
        opt.step()
```
This runs extremely slowly; when I watch my GPU utilization it is almost always at 0%.
Am I doing something wrong? Or do you have any ideas to speed things up?
How are you profiling these operations? If you are using the GPU, remember that you would need to synchronize the code to get valid profiles, as CUDA operations are executed asynchronously.
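If it helps, here is a minimal timing sketch (`train_step` is just a placeholder for one iteration of your loop):

```python
import time
import torch

torch.cuda.synchronize()  # finish all pending kernels before starting the clock
start = time.perf_counter()

train_step()  # placeholder: one forward/backward/opt.step() iteration

torch.cuda.synchronize()  # wait for the asynchronously launched kernels to finish
print(f"iteration took {(time.perf_counter() - start) * 1000:.2f} ms")
```

Alternatively, `torch.utils.bottleneck` or the PyTorch profiler will handle the synchronization for you.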
If you’ve already taken care of that, my guess is that the second backward pass (the one that traverses loss2's graph) adds the overhead.
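As a quick check, you could also accumulate both losses and call backward only once, which avoids `retain_graph=True` and the second traversal of the graph entirely. A sketch reusing the names from your snippet (note that the posted version also accumulates loss1's gradients twice, once per backward call, which may or may not be intended):

```python
opt.zero_grad()
inputs.requires_grad_()
logits = model(inputs)
loss = loss_fn(logits, y) + loss_fn(x, y)  # loss1 + loss2 combined up front
loss.backward()  # a single backward pass through the whole graph
opt.step()
```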