Hi,
I am training an RNN and computing a loss at each step of the output sequence. Is calling loss.backward(retain_graph=True) at each step more memory-efficient or faster than summing loss = loss_step1 + … + loss_stepn and calling loss.backward() once at the end? I have tried both, and they seem to give the same accuracy, but I'm not sure what differs behind the scenes.
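For concreteness, here is a minimal sketch of the two variants I mean (the toy model, sizes, and data below are placeholders, not my actual setup):

```python
import torch
import torch.nn as nn

# Placeholder model and data, just to illustrate the two variants.
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
criterion = nn.MSELoss()

x = torch.randn(2, 5, 4)        # (batch, seq_len, input_size)
targets = torch.randn(2, 5, 1)  # one target per time step

# Variant A: call backward at every step, keeping the graph alive
# so later steps can still backpropagate through the shared RNN graph.
out, _ = rnn(x)
for t in range(out.size(1)):
    loss_t = criterion(head(out[:, t]), targets[:, t])
    loss_t.backward(retain_graph=True)  # gradients accumulate in .grad

# Variant B: sum the per-step losses and call backward once.
rnn.zero_grad()
head.zero_grad()
out, _ = rnn(x)
loss = sum(criterion(head(out[:, t]), targets[:, t])
           for t in range(out.size(1)))
loss.backward()  # single backward pass; the graph is freed afterwards
```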