I am trying to perform optimization on the backward computational graph. Put another way, I want to treat the backward computational graph as a forward path. This way, each parameter would receive two gradient contributions in `.grad`: one from the regular forward loss, and one from a second loss defined on the backward computation.
Here is a pseudo-code:
```python
output1 = model(input)
loss1 = criterion(output1, targets1)
loss1.backward()

output2 = backward_comp_graph(output1.detach())
loss2 = criterion(output2, targets2)
loss2.backward()
```
I would appreciate your suggestions on what implementation strategy is the nicest!
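For reference, here is a minimal sketch of one possible strategy using `torch.autograd.grad(..., create_graph=True)`, which records the backward pass itself as a differentiable graph so a second loss can be defined on it. The toy model, and the choice of squared gradient norm as the second loss, are illustrative assumptions standing in for `backward_comp_graph` and `criterion(output2, targets2)`:

```python
import torch
import torch.nn as nn

# Toy setup (assumed for illustration; substitute your own model/criterion).
torch.manual_seed(0)
model = nn.Linear(4, 3)
criterion = nn.MSELoss()
inputs = torch.randn(8, 4)
targets1 = torch.randn(8, 3)

# Regular forward pass and first loss.
output1 = model(inputs)
loss1 = criterion(output1, targets1)

# create_graph=True makes the backward computation itself differentiable,
# i.e. the gradients below carry a grad_fn and act as a "forward path".
grads = torch.autograd.grad(loss1, model.parameters(), create_graph=True)

# Second loss defined on the backward graph's outputs (illustrative choice:
# the squared norm of the gradients).
loss2 = sum(g.pow(2).sum() for g in grads)

# Backpropagate through the backward graph; this populates .grad with
# d(loss2)/d(params) via double backward.
loss2.backward()

# If the first loss's gradients are also wanted in .grad, they can be
# accumulated explicitly, giving the "two gradients per parameter" effect:
for p, g in zip(model.parameters(), grads):
    p.grad = p.grad + g.detach()
```

One design note: calling `torch.autograd.grad` instead of `loss1.backward(create_graph=True)` keeps the first-order gradients out of `.grad` until you decide how to combine them, which avoids accidentally mixing the two gradient sources.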