Gradient descent back propagating for intermediate loss function


#1

Hi there. I’m writing a customized loss function, and I’m not sure whether to propagate the gradient through the whole network or only part of it.
Specifically, I have an autoencoder model, and my loss function consists of 2 separate losses: the 1st part takes the output of the autoencoder as input, while the 2nd part takes the intermediate output of the encoder network (before the decoder network) as its input. My question is: should I backprop the gradient of the 2nd loss through the decoder network?

class autoEncoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(autoEncoder, self).__init__()
        self.encoder = encoder
        self.decoder = decoder
    def forward(self, x):
        h = self.encoder(x)
        x = self.decoder(h)
        return x

For simplicity, encoder and decoder are 2 classes implementing the encoder network and decoder network, respectively. With that defined, my loss function is formalized as follows:

customized_loss = loss_1(x) + loss_2(h)  # loss_1 and loss_2 each return a torch.autograd.variable.Variable

Should I write the gradient descent step as follows,

grad_params = torch.autograd.grad(loss_1, autoEncoder.parameters(), create_graph=True)
grad_params[-2].add_(torch.autograd.grad(loss_2, encoder.parameters(), create_graph=True)[-2])
torch.optim.SGD(autoEncoder.parameters(), lr=learning_rate).step()

or should I compute the gradient of loss_2 through the decoder network as well and update the whole network:

customized_loss.backward()
torch.optim.SGD(autoEncoder.parameters(), lr=learning_rate).step()

#2

Will PyTorch automatically build a graph that traces the loss_2 variable?


#3

You can simply call customized_loss.backward(), and the gradients from both loss_1 and loss_2 are propagated back to the parameters. Since loss_2 is computed from h, the decoder is not part of its computation graph, so loss_2 contributes nothing to the decoder’s gradients; autograd works this out automatically.
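To make this concrete, here is a minimal runnable sketch using the current PyTorch API (plain tensors rather than Variables); the layer sizes, the MSE reconstruction loss, and the squared-norm penalty on h are illustrative stand-ins for the original loss_1 and loss_2:

```python
import torch
import torch.nn as nn

# Toy autoencoder: encoder and decoder as single linear layers.
encoder = nn.Linear(4, 2)
decoder = nn.Linear(2, 4)
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.1
)

x = torch.randn(8, 4)
h = encoder(x)    # intermediate output of the encoder
out = decoder(h)  # final output of the autoencoder

# loss_1 on the reconstruction, loss_2 on the intermediate code h.
loss_1 = ((out - x) ** 2).mean()
loss_2 = h.pow(2).mean()
customized_loss = loss_1 + loss_2

optimizer.zero_grad()
customized_loss.backward()  # one call backprops both terms
optimizer.step()

# loss_2 only touches the encoder's subgraph, so its gradient flows
# through the encoder alone; the decoder still gets gradients from loss_1.
```

No manual splitting of the gradients is needed: autograd traces each loss term back through exactly the operations that produced it.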


#4

Thanks dude, wonderful framework!