Hi all,
I'm trying to build a somewhat custom optimization algorithm, but I'm having a real tough time figuring out how to optimize one layer at a time without running the full loss.backward() for every layer. I saw someone ask basically the same question here: Computing Backward one layer at a time, but the answer wasn't descriptive enough for me to really understand it. I also found this gem: Bit of fun with gradients, which gets me close, but I don't know how to grab the gradient of the previous layer's detached tensor so I can pass it to that layer's backward() call.
Here are clips of the relevant (not yet working) code:
class testNet(nn.Module):
    def __init__(self, n_observations, n_actions):
        super(testNet, self).__init__()
        self.layer1 = nn.Linear(n_observations, 128)
        self.layer2 = nn.Linear(128, 128)
        self.layer3 = nn.Linear(128, n_actions)
        self.activate = nn.LeakyReLU(0.01)

    def forward(self, x):
        x1 = self.activate(self.layer1(x))
        # cut the graph here; detach().requires_grad_(True) makes a new leaf
        # (torch.tensor(x1.detach(), requires_grad=True) copies the data and
        # raises a UserWarning)
        x2 = self.activate(self.layer2(x1.detach().requires_grad_(True)))
        return self.layer3(x2.detach().requires_grad_(True))
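As a side note, I believe the usual idiom for cutting the graph while keeping a leaf whose gradient can be read afterwards is detach().requires_grad_(True), rather than re-wrapping with torch.tensor(...). A toy example of what I mean (my own, not from the linked thread):

```python
import torch

# A detached copy with requires_grad_ is a leaf tensor, so backward()
# through the downstream graph accumulates into its .grad attribute.
x = torch.randn(3, requires_grad=True)
y = x * 2

y_in = y.detach().requires_grad_(True)  # leaf; graph is cut above it
z = (y_in ** 2).sum()
z.backward()

print(y_in.grad)  # dz/dy_in = 2 * y_in
print(x.grad)     # None: nothing flowed past the detach point
```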
params = list(test_net.parameters())
params.reverse()
slope = nn.MSELoss()
loss = slope(state_action_values, expected_state_action_values.unsqueeze(1))
loss.backward(retain_graph=True)
# continue backward grad layer by layer
for i in range(1, len(params), 2):
    # optimize this layer's parameters
    params[i+1].backward(gradient=HOW_CAN_I_GET_THIS, retain_graph=True)
It would be nice to solve the mystery of where intermediate values are stored in the computation graph, but any implementation details for a layer-by-layer backprop would be appreciated.
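In case it helps show what I'm after, here's a minimal self-contained sketch of what I think the chained backward should look like (class and attribute names are my own, and I haven't confirmed this is the right approach): the idea is that each detached input is a leaf, so backward() on the segment downstream of it fills in that leaf's .grad, which can then be fed as the gradient argument to the previous segment's backward().

```python
import torch
import torch.nn as nn

class TestNet(nn.Module):
    def __init__(self, n_observations, n_actions):
        super().__init__()
        self.layer1 = nn.Linear(n_observations, 128)
        self.layer2 = nn.Linear(128, 128)
        self.layer3 = nn.Linear(128, n_actions)
        self.activate = nn.LeakyReLU(0.01)

    def forward(self, x):
        # keep references so each detached leaf's .grad can be read later
        self.x1 = self.activate(self.layer1(x))
        self.x1_in = self.x1.detach().requires_grad_(True)
        self.x2 = self.activate(self.layer2(self.x1_in))
        self.x2_in = self.x2.detach().requires_grad_(True)
        return self.layer3(self.x2_in)

net = TestNet(4, 2)
out = net(torch.randn(8, 4))
loss = nn.MSELoss()(out, torch.randn(8, 2))

loss.backward()                  # fills layer3 grads and net.x2_in.grad
net.x2.backward(net.x2_in.grad)  # fills layer2 grads and net.x1_in.grad
net.x1.backward(net.x1_in.grad)  # fills layer1 grads

# every layer now has gradients, computed one segment at a time
assert all(p.grad is not None for p in net.parameters())
```

Note that no retain_graph is needed here, since each segment between detach points is its own independent graph.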
Thank you for your time.