Gradient descent backpropagation for an intermediate loss function

Hi there, I'm currently writing a customized loss function, and I'm not sure whether to propagate the gradient through the whole network or just part of it.
Specifically, I have an autoEncoder model, and my loss function consists of two separate losses: the first part takes the output of the autoEncoder as input, whereas the second part takes the intermediate output of the encoder network (without the decoder network) as its input. My question is: should I backprop the gradient of the second loss through the decoder network?

import torch.nn as nn

class autoEncoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(autoEncoder, self).__init__()
        # register the sub-networks so autoEncoder.parameters() includes their weights
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        h = self.encoder(x)   # intermediate code, fed to loss_2
        x = self.decoder(h)   # reconstruction, fed to loss_1
        return x
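For context, encoder and decoder could be any pair of modules; a minimal sketch, with hypothetical architectures since the post doesn't show them, might look like this:

class Encoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super(Encoder, self).__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)


class Decoder(nn.Module):
    def __init__(self, code_dim=32, out_dim=784):
        super(Decoder, self).__init__()
        self.net = nn.Linear(code_dim, out_dim)

    def forward(self, h):
        return self.net(h)


model = autoEncoder(Encoder(), Decoder())  # hypothetical instantiation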

For simplicity, encoder and decoder are two classes implementing the encoder network and the decoder network, respectively. With that defined, my loss function is formalized as follows:

customized_loss = loss_1(x) + loss_2(h)  # loss_1(x) and loss_2(h) are torch.autograd.Variable objects

Should I write the gradient descent step as follows?

# proposed option 1: compute the two gradients separately and combine them by hand
grad_params = torch.autograd.grad(loss_1, autoEncoder.parameters(), create_graph=True)
encoder_grads = torch.autograd.grad(loss_2, encoder.parameters(), create_graph=True)
grad_params[-2].add_(encoder_grads[-2])  # accumulate loss_2's gradient onto the matching encoder parameter
# ...then copy the combined gradients into each parameter's .grad before stepping
torch.optim.SGD(autoEncoder.parameters(), lr=0.01).step()  # lr is a placeholder

Or should I compute loss_2's gradient with respect to the decoder network as well and update the whole network:

# proposed option 2: backpropagate the combined loss and update every parameter
customized_loss.backward()
torch.optim.SGD(autoEncoder.parameters(), lr=0.01).step()  # lr is a placeholder

Will PyTorch automatically build a graph that traces the loss_2 variable?

You can simply do customized_loss.backward(), and the gradients w.r.t. both loss_1 and loss_2 are propagated back to the parameters.
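As a minimal sketch of what that looks like in a full training step (the data, the concrete losses, and the optimizer settings here are assumptions, not from the thread):

import torch
import torch.nn.functional as F

model = autoEncoder(Encoder(), Decoder())               # hypothetical instantiation from above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for x in data_loader:                                   # data_loader is assumed
    h = model.encoder(x)                                # intermediate code
    x_hat = model.decoder(h)                            # reconstruction

    loss_1 = F.mse_loss(x_hat, x)                       # hypothetical reconstruction loss
    loss_2 = h.abs().mean()                             # hypothetical penalty on the code h
    customized_loss = loss_1 + loss_2

    optimizer.zero_grad()
    customized_loss.backward()                          # loss_1 flows back through decoder and encoder,
                                                        # loss_2 flows back through the encoder only
    optimizer.step()

Since h never passes through the decoder, loss_2 contributes gradients only to the encoder's parameters; the decoder's gradients come from loss_1 alone.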

thanks dude, wonderful framework!

Dear @smth,

What if I want to train the AE with respect to the AE loss, and the second model with respect to the intermediate loss? What you suggested seems, to me, to calculate the gradient of the whole model w.r.t. one loss, which is loss_1 + loss_2.
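One common pattern for that kind of split (a sketch under assumptions, since the thread doesn't show the second model) is to keep separate optimizers and detach h at the boundary, so each loss only updates its own sub-network:

# `ae` is the autoencoder and `second` is the model consuming the code h;
# the names, the losses, and the learning rates are illustrative assumptions.
opt_ae = torch.optim.SGD(ae.parameters(), lr=0.01)
opt_second = torch.optim.SGD(second.parameters(), lr=0.01)

h = ae.encoder(x)
x_hat = ae.decoder(h)

ae_loss = F.mse_loss(x_hat, x)                  # trains the autoencoder only
intermediate_loss = second(h.detach()).mean()   # detach() blocks gradients from flowing into the AE

opt_ae.zero_grad()
opt_second.zero_grad()
ae_loss.backward()
intermediate_loss.backward()
opt_ae.step()       # updated by ae_loss only
opt_second.step()   # updated by intermediate_loss only

With the detach() in place, intermediate_loss never reaches the encoder, so the AE is driven purely by its reconstruction loss while the second model is driven purely by the intermediate loss.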


Did you get your answer?