I'm trying to implement a somewhat custom optimization algorithm, but I'm having a really tough time figuring out how to optimize one layer at a time without running the full loss.backward() for every layer. I saw someone ask basically the same question here: Computing Backward one layer at a time, but the answer wasn't descriptive enough for me to really understand. I also found this gem: Bit of fun with gradients, which gets me close, but I don't know how to grab the gradient of the previous layer's detached tensor so that I can pass it to that layer's backward() call.
Here are clips of the relevant (not-)working code:
```python
class testNet(nn.Module):
    def __init__(self, n_observations, n_actions):
        super(testNet, self).__init__()
        self.layer1 = nn.Linear(n_observations, 128)
        self.layer2 = nn.Linear(128, 128)
        self.layer3 = nn.Linear(128, n_actions)
        self.activate = nn.LeakyReLU(.01)

    def forward(self, x):
        x1 = self.activate(self.layer1(x))
        x2 = self.activate(self.layer2(torch.tensor(x1.detach(), requires_grad=True)))
        return self.layer3(torch.tensor(x2.detach(), requires_grad=True))


params = list(test_net.parameters())
params.reverse()

slope = nn.MSELoss()
loss = slope(state_action_values, expected_state_action_values.unsqueeze(1))
loss.backward(retain_graph=True)

# continue backward grad layer by layer
for i in range(1, len(params), 2):
    # optimize this layer's parameters
    params[i+1].backward(gradient=HOW_CAN_I_GET_THIS, retain_graph=True)
```
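For concreteness, here is a stripped-down toy version of the cut-graph pattern I think I'm after (the two-layer setup, shapes, and loss here are hypothetical, not my real network). The key assumption is that detaching with requires_grad_(True) creates a leaf tensor whose .grad gets populated by the downstream backward, and that gradient can then be fed to the upstream segment's backward():

```python
import torch
import torch.nn as nn

# Hypothetical minimal two-segment network for illustration.
layer1 = nn.Linear(4, 8)
layer2 = nn.Linear(8, 2)
act = nn.LeakyReLU(0.01)

x = torch.randn(3, 4)

# Forward pass, cutting the autograd graph between the layers.
h1 = act(layer1(x))
h1_detached = h1.detach().requires_grad_(True)  # leaf: its .grad is populated
out = layer2(h1_detached)

loss = out.pow(2).mean()

# Backward through the last segment only; fills h1_detached.grad
# and layer2's parameter gradients.
loss.backward()

# Hand that gradient to the previous segment's backward call.
h1.backward(gradient=h1_detached.grad)

# layer1's parameters now have gradients as if we had done one
# full end-to-end backward.
print(layer1.weight.grad.shape)
```

If this is the right idea, then in my original snippet the HOW_CAN_I_GET_THIS placeholder would presumably be the .grad of the detached tensor created in forward(), which I'd need to keep a reference to rather than constructing inline.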
It would be nice to solve the mystery of where values are stored in the computation graph, but any implementation details for layer-by-layer backprop would be appreciated.
Thank you for your time.