Backpropagating gradients for an intermediate loss function

zack.zcy · October 12, 2017, 4:02pm

Hi there. I'm writing a customized loss function, and I'm not sure whether to propagate its gradient through the whole network or only part of it.
Specifically, I have an autoencoder model whose loss function consists of two separate terms: the first takes the output of the autoencoder as input, whereas the second takes the intermediate output of the encoder network (without the decoder network) as input. My question is: should I backprop the gradient of the second loss through the decoder network?

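import torch.nn as nn

class autoEncoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(autoEncoder, self).__init__()
        # Register the sub-networks so their parameters appear in parameters()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        h = self.encoder(x)  # intermediate output, used by the 2nd loss
        x = self.decoder(h)  # reconstruction, used by the 1st loss
        return x, h          # h is returned as well so the 2nd loss can reach it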

For simplicity, encoder and decoder are two classes implementing the encoder network and decoder network, respectively. With that defined, my loss function is formalized as follows:

customized_loss = loss_1(x) + loss_2(h)  # loss_1 and loss_2 are torch.autograd.variable.Variable

Should I write the gradient descent step as follows?

grad_params = torch.autograd.grad(loss_1, autoEncoder.parameters(), create_graph=True)
grad_params[-2].add_(torch.autograd.grad(loss_2, encoder.parameters(), create_graph=True))
torch.optim.SGD(grad_params).step()

Or should I compute the gradient of loss_2 with respect to the decoder network as well and update the whole network?

customized_loss.backward()
torch.optim.SGD().step()

zack.zcy · October 13, 2017, 2:27pm

Will PyTorch automatically build a graph that tracks the loss_2 variable?

smth · October 14, 2017, 8:34am

You can simply call customized_loss.backward(), and the gradients of both loss_1 and loss_2 are propagated back to the parameters.
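
To make that concrete, here is a minimal sketch, not the original poster's model. The two `nn.Linear` layers, the input shape, and both loss definitions are placeholder assumptions, and it assumes a recent PyTorch release in which plain tensors track gradients (no `Variable` wrapper). It checks that one `backward()` call on the summed loss gives the decoder the gradient of loss_1 only, because loss_2 depends only on `h` and never touches the decoder, while the encoder receives the contributions of both losses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the thread's encoder and decoder networks.
encoder = nn.Linear(8, 3)
decoder = nn.Linear(3, 8)

x = torch.randn(4, 8)  # placeholder input batch


def compute_losses(x):
    h = encoder(x)                      # intermediate output of the encoder
    x_hat = decoder(h)                  # reconstruction
    loss_1 = ((x_hat - x) ** 2).mean()  # placeholder loss on the final output
    loss_2 = h.abs().mean()             # placeholder loss on the intermediate output
    return loss_1, loss_2


# One backward pass on the summed loss, as suggested in the reply above.
loss_1, loss_2 = compute_losses(x)
(loss_1 + loss_2).backward()
dec_grad_sum = decoder.weight.grad.clone()
enc_grad_sum = encoder.weight.grad.clone()

# Reference: gradients of each loss taken separately on a fresh graph.
loss_1, loss_2 = compute_losses(x)
dec_grad_1, = torch.autograd.grad(loss_1, [decoder.weight], retain_graph=True)
enc_grad_1, = torch.autograd.grad(loss_1, [encoder.weight], retain_graph=True)
# The decoder is not part of loss_2's graph, so autograd reports no gradient.
dec_grad_2, = torch.autograd.grad(
    loss_2, [decoder.weight], retain_graph=True, allow_unused=True
)
enc_grad_2, = torch.autograd.grad(loss_2, [encoder.weight])

# Decoder: only loss_1 contributes. Encoder: both losses contribute.
decoder_matches = torch.allclose(dec_grad_sum, dec_grad_1)
encoder_matches = torch.allclose(enc_grad_sum, enc_grad_1 + enc_grad_2, atol=1e-6)
print(decoder_matches, encoder_matches, dec_grad_2 is None)
```

So there should be no need to restrict the second loss to the encoder by hand: autograd follows only the operations that actually produced each loss.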

zack.zcy · October 18, 2017, 8:36am

Thanks dude, wonderful framework!

Shisho_Sama · September 5, 2019, 4:21am

Did you get your answer?