Hi there, I'm writing a custom loss function, and I'm not sure whether the gradient should be propagated through the whole network or just part of it.
Specifically, I have an autoencoder model, and my loss function consists of two separate terms: the first takes the output of the autoencoder as input, whereas the second takes the intermediate output of the encoder network (without the decoder network) as input. My question is: should I backpropagate the gradient of the second loss through the decoder network?
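import torch.nn as nn

class autoEncoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(autoEncoder, self).__init__()
        # Store both networks as submodules so that parameters() includes them.
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        h = self.encoder(x)  # intermediate output, input to the 2nd loss
        x = self.decoder(h)  # reconstruction, input to the 1st loss
        return x, h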
For simplicity, encoder and decoder are two modules implementing the encoder network and the decoder network, respectively. With that defined, my loss function is as follows:
customized_loss = loss_1(x) + loss_2(h)  # each term evaluates to a torch.autograd.variable.Variable
Should I write the gradient descent step as follows?
grad_params = torch.autograd.grad(loss_1,autoEncoder.parameters(),create_graph=True)
grad_params[-2].add_(torch.autograd.grad(loss_2,encoder.parameters(),create_graph=True))
torch.optim.SGD(grad_params).step()
Or should I also compute the gradient of loss_2 w.r.t. the decoder network and update the whole network?
customized_loss.backward()
torch.optim.SGD().step()
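One way to compare the two options is to check them on a toy model. The following is a minimal sketch, not a definitive answer: it assumes small nn.Linear layers as hypothetical stand-ins for the encoder and decoder, and placeholder definitions for loss_1 and loss_2. It suggests that, because loss_2 depends only on h, autograd finds no path from it to the decoder parameters, so a single backward pass on the summed loss should give the same gradients as combining them by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the encoder and decoder networks.
encoder = nn.Linear(8, 3)
decoder = nn.Linear(3, 8)
enc_params = list(encoder.parameters())
dec_params = list(decoder.parameters())

x = torch.randn(4, 8)
h = encoder(x)      # intermediate output of the encoder
x_hat = decoder(h)  # output of the full autoencoder

# Placeholder losses (assumptions): loss_1 uses the reconstruction,
# loss_2 uses only the intermediate output h.
loss_1 = ((x_hat - x) ** 2).mean()
loss_2 = h.abs().mean()

# loss_2 has no path to the decoder, so with allow_unused=True
# autograd returns None for every decoder parameter.
dec_grads_from_loss_2 = torch.autograd.grad(
    loss_2, dec_params, retain_graph=True, allow_unused=True
)
decoder_untouched_by_loss_2 = all(g is None for g in dec_grads_from_loss_2)

# Manual combination: loss_1 w.r.t. all parameters, loss_2 w.r.t. the encoder only.
g1_enc = torch.autograd.grad(loss_1, enc_params, retain_graph=True)
g2_enc = torch.autograd.grad(loss_2, enc_params, retain_graph=True)
g1_dec = torch.autograd.grad(loss_1, dec_params, retain_graph=True)

# Single backward pass on the summed loss.
optimizer = torch.optim.SGD(enc_params + dec_params, lr=0.1)
optimizer.zero_grad()
(loss_1 + loss_2).backward()

# Encoder gradients are the sum of both contributions;
# decoder gradients come from loss_1 alone.
enc_match = all(
    torch.allclose(p.grad, a + b) for p, a, b in zip(enc_params, g1_enc, g2_enc)
)
dec_match = all(torch.allclose(p.grad, a) for p, a in zip(dec_params, g1_dec))

optimizer.step()  # updates encoder and decoder in one step
```

If the three flags (decoder_untouched_by_loss_2, enc_match, dec_match) all come out True for your real networks as well, the simpler customized_loss.backward() followed by one optimizer step should be enough, with no need to route the gradients manually.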