How to trace the loss gradient's backpropagation path through a PyTorch computational graph

I have implemented a new loss function in PyTorch.

# model_1 needs to be trained
outputs = model_1(input)
loss = myloss(outputs, labels)

# outputs says how much to resize the image
# labels gives the image file index
# Below is what myloss() does internally:
org_file_name = "pic_" + str(labels[0].item()) + ".png"
org_image = load_image(org_file_name)  # load the original image (helper assumed; org_image was used below but never defined)
new_image = compress(org_image, outputs)
accuracy_loss = run_pretrained_yolov3(org_image, new_image)

# the next two lines extend the same DAG
prev_loss = torch.mean((outputs - labels) ** 2)
new_loss = (accuracy_loss / prev_loss.item()) * prev_loss
new_loss.backward()

Can anyone please suggest how I can find out how the loss gradient backpropagates through the computational graph?

[i.e., inside the myloss() function I apply another pre-trained model, in testing mode, to obtain the difference and the final loss value.] Now I want to know: when new_loss.backward() runs, does the gradient backpropagate straight through model_1, or first through yolov3 and then through model_1? The pretrained yolov3 is used in testing mode only.

I have tried TensorBoard, but it does not provide that option. Any suggestions would be highly appreciated.
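For reference, one quick way to check this empirically (a minimal sketch, using the variable names from the snippet above) is to see whether accuracy_loss is attached to the autograd graph at all, and to register a hook that only fires if the backward pass actually reaches a tensor:

# Inside myloss(): is accuracy_loss attached to the autograd graph?
print(accuracy_loss.requires_grad, accuracy_loss.grad_fn)
# prints "False None" if yolov3 was run under torch.no_grad()

# A hook fires only when the backward pass actually reaches this tensor;
# register it before the (single) call to backward():
outputs.register_hook(lambda g: print("gradient reached outputs:", g.shape))
new_loss.backward()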

Hi,

Back-propagation simply happens in the reverse order of your forward pass, so the gradient will go through these models in the reverse of the order in which you called them in the forward.
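You can also inspect the recorded graph directly: every tensor produced by a differentiable op carries a grad_fn node, and each node's next_functions lists the nodes that run after it in the backward pass. A minimal sketch (assuming new_loss from your snippet; the third-party torchviz package can render the same graph with make_dot):

def print_backward_graph(fn, depth=0, seen=None):
    # Walk the autograd graph from the loss towards the leaves,
    # printing each backward node's name (e.g. MeanBackward0, AccumulateGrad).
    seen = set() if seen is None else seen
    if fn is None or fn in seen:
        return
    seen.add(fn)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        print_backward_graph(next_fn, depth + 1, seen)

print_backward_graph(new_loss.grad_fn)

If yolov3 was run under torch.no_grad(), you will notice that none of its ops appear in this printout.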

Since yolov3 is used in inference mode with torch.no_grad(), no computational graph will be created for it. How, then, will the gradient be computed for yolov3?
The loss is Loss = f(x) - f(k(g(x))), where f = the yolov3 model, g = the classification model I want to train, k = compression, x = the original image, and k(g(x)) = the new compressed image.
The loss gradient with respect to one parameter W of g is then, by the chain rule (f(x) does not depend on W, hence the minus sign):
dLoss/dW = -(df/dk) * (dk/dg) * (dg/dW)

How will df/dk and dk/dg be computed or represented in the computational graph, given that they are not traditional differentiable operators?
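To make the premise concrete, here is a tiny standalone check of what torch.no_grad() records (nothing at all):

import torch

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    y = x * 2                        # no backward node is recorded for this op
print(y.requires_grad, y.grad_fn)    # -> False None

So under no_grad, df/dk is never represented in the graph; the chain is simply cut at that point.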

Well, if you run f in no_grad mode, then your loss won't require gradients and you won't be able to run backprop through it.
If you want to learn g properly, you will need to run the k and f that are applied after it in a differentiable manner, so that gradients can flow back.
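A minimal sketch of that fix, assuming yolov3 is an ordinary nn.Module and that run_pretrained_yolov3 calls it internally: keep yolov3 in eval mode with its own weights frozen, but do not wrap the call in torch.no_grad():

yolov3.eval()                        # inference behaviour (dropout/batch-norm fixed)
for p in yolov3.parameters():
    p.requires_grad_(False)          # yolov3's weights stay frozen

# inside myloss(): call it WITHOUT torch.no_grad()
accuracy_loss = run_pretrained_yolov3(org_image, new_image)
print(accuracy_loss.grad_fn)         # now a backward node, not None

new_loss = (accuracy_loss / prev_loss.item()) * prev_loss
new_loss.backward()                  # gradients flow through f and k into g (model_1)

With this setup the optimizer never updates yolov3 (its parameters don't require grad), yet the graph is recorded through it. Note that k (compress) must likewise be built from differentiable torch ops, e.g. torch.nn.functional.interpolate for resizing, rather than PIL/OpenCV calls.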