How to trace the loss gradient's backpropagation path through a PyTorch computational graph

I have implemented a new loss function in PyTorch.

# model_1 needs to be trained
outputs = model_1(input)
loss = myloss(outputs, labels)

# outputs says how much to resize the image
# labels gives the image file index
# Below is what myloss() does internally:
org_file_name = "pic_" + str(labels[0].item()) + ".png"
org_image = load_image(org_file_name)  # load the original image (helper assumed; org_image was used below but never defined)
new_image = compress(org_image, outputs)
accuracy_loss = run_pretrained_yolov3(org_image, new_image)

# the next two lines extend the same DAG
prev_loss = torch.mean((outputs - labels) ** 2)
new_loss = (accuracy_loss / prev_loss.item()) * prev_loss
new_loss.backward()

Can anyone please suggest how I can find out how the loss gradient backpropagates through the computational graph?

[i.e., inside the myloss() function I apply another pre-trained model, in testing mode, to obtain the difference and the final loss value.] Now I want to know: when new_loss.backward() runs, does the gradient backpropagate straight through model_1, or first through yolov3 and then through model_1? The pretrained yolov3 is used in testing mode only.

I have tried TensorBoard, but it does not provide that option. Any suggestions would be highly appreciated.
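For reference, one quick way to check this empirically (a minimal sketch, using the variable names from the snippet above) is to see whether accuracy_loss is attached to the autograd graph at all, and to register a hook that only fires if the backward pass actually reaches a tensor:

# Inside myloss(): is accuracy_loss attached to the autograd graph?
print(accuracy_loss.requires_grad, accuracy_loss.grad_fn)
# prints "False None" if yolov3 was run under torch.no_grad()

# A hook fires only when the backward pass actually reaches this tensor;
# register it before the (single) call to backward():
outputs.register_hook(lambda g: print("gradient reached outputs:", g.shape))
new_loss.backward()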

Hi,

Back-propagation simply happens in the reverse order of your forward pass, so the gradient will go through these models in the reverse of the order in which you called them in the forward.
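You can also inspect the recorded graph directly: every tensor produced by a differentiable op carries a grad_fn node, and each node's next_functions lists the nodes that run after it in the backward pass. A minimal sketch (assuming new_loss from your snippet; the third-party torchviz package can render the same graph with make_dot):

def print_backward_graph(fn, depth=0, seen=None):
    # Walk the autograd graph from the loss towards the leaves,
    # printing each backward node's name (e.g. MeanBackward0, AccumulateGrad).
    seen = set() if seen is None else seen
    if fn is None or fn in seen:
        return
    seen.add(fn)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        print_backward_graph(next_fn, depth + 1, seen)

print_backward_graph(new_loss.grad_fn)

If yolov3 was run under torch.no_grad(), you will notice that none of its ops appear in this printout.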

Since yolov3 is used in inference mode with torch.no_grad(), no computational graph will be created for it. How, then, will the gradient be computed for yolov3?
The loss is Loss = f(x) - f(k(g(x))), where f = the yolov3 model, g = the classification model I want to train, k = compression, x = the original image, and k(g(x)) = the new compressed image.
The loss gradient with respect to one parameter W of g is then, by the chain rule (f(x) does not depend on W, hence the minus sign):
dLoss/dW = -(df/dk) * (dk/dg) * (dg/dW)

How will df/dk and dk/dg be computed or represented in the computational graph, given that they are not traditional differentiable operators?
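To make the premise concrete, here is a tiny standalone check of what torch.no_grad() records (nothing at all):

import torch

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    y = x * 2                        # no backward node is recorded for this op
print(y.requires_grad, y.grad_fn)    # -> False None

So under no_grad, df/dk is never represented in the graph; the chain is simply cut at that point.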

Well, if you run f in no_grad mode, then your loss won't require gradients and you won't be able to run backprop through it.
If you want to learn g properly, you will need to run the k and f that are applied after it in a differentiable manner, so that gradients can flow back.
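A minimal sketch of that fix, assuming yolov3 is an ordinary nn.Module and that run_pretrained_yolov3 calls it internally: keep yolov3 in eval mode with its own weights frozen, but do not wrap the call in torch.no_grad():

yolov3.eval()                        # inference behaviour (dropout/batch-norm fixed)
for p in yolov3.parameters():
    p.requires_grad_(False)          # yolov3's weights stay frozen

# inside myloss(): call it WITHOUT torch.no_grad()
accuracy_loss = run_pretrained_yolov3(org_image, new_image)
print(accuracy_loss.grad_fn)         # now a backward node, not None

new_loss = (accuracy_loss / prev_loss.item()) * prev_loss
new_loss.backward()                  # gradients flow through f and k into g (model_1)

With this setup the optimizer never updates yolov3 (its parameters don't require grad), yet the graph is recorded through it. Note that k (compress) must likewise be built from differentiable torch ops, e.g. torch.nn.functional.interpolate for resizing, rather than PIL/OpenCV calls.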