I have a vanilla NN that looks like this:
    self.model = torch.nn.Sequential(
        torch.nn.Linear(self.INPUT_SIZE, self.HIDDEN_SIZE),
        torch.nn.Sigmoid(),
        torch.nn.Linear(self.HIDDEN_SIZE, self.OUTPUT_SIZE),
        torch.nn.Sigmoid())
I want to implement TD(λ) between steps. λ is a constant between 0 and 1 that sets the lifespan of a gradient trace.
Pseudo-code for this is:
    loss.backward()
    model.gradients = model.gradients + λ * model.previous_gradients
    optimizer.step()
    model.previous_gradients = model.gradients
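A minimal sketch of what I mean, skipping hooks entirely and just rewriting each parameter's `.grad` in place before `optimizer.step()`. The sizes, `lam`, and `lr` are placeholders, not my real values:

```python
import torch

torch.manual_seed(0)

# toy stand-ins for self.INPUT_SIZE etc.
INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE = 4, 8, 2

model = torch.nn.Sequential(
    torch.nn.Linear(INPUT_SIZE, HIDDEN_SIZE),
    torch.nn.Sigmoid(),
    torch.nn.Linear(HIDDEN_SIZE, OUTPUT_SIZE),
    torch.nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 0.9  # λ: decay of the trace

# one trace tensor per parameter ("previous_gradients" in the pseudo-code)
traces = [torch.zeros_like(p) for p in model.parameters()]

def td_lambda_step(loss):
    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), traces):
            e.mul_(lam).add_(p.grad)  # trace ← λ·trace + grad
            p.grad.copy_(e)           # step on the trace, not the raw grad
    optimizer.step()

x = torch.randn(1, INPUT_SIZE)
target = torch.rand(1, OUTPUT_SIZE)
loss = torch.nn.functional.mse_loss(model(x), target)
td_lambda_step(loss)
```

Since the traces persist across calls, each step's gradient keeps contributing to later updates with weight λ per step.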
I think I might be able to accomplish this using gradient hooks, but I'm unsure how to do that, and I can't shake the feeling that it might be easier than that. SGD's momentum update is mathematically similar, but I can't tell if it's identical.