How to implement TD(λ)

i have a vanilla NN that looks like this

self.model = torch.nn.Sequential(
    torch.nn.Linear(self.INPUT_SIZE, self.HIDDEN_SIZE),
    torch.nn.Linear(self.HIDDEN_SIZE, self.OUTPUT_SIZE),
)

i want to implement TD(λ) between steps. λ is a constant between 0 and 1 that sets the lifespan of a gradient trace.
pseudo-code for this is:

model.gradients = model.gradients + λ * model.previous_gradients
model.previous_gradients = model.gradients

i think i might be able to accomplish this using gradient hooks, but i’m unsure how to do that. i can’t shake the feeling that it might be easier than that. SGD’s momentum is mathematically similar, but i can’t tell if it’s identical.
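
On the momentum question, here is a small sketch that may help. It compares the pseudo-code recurrence with the momentum buffer update as documented for torch.optim.SGD with dampening=0, written out by hand in plain Python. The gradient values and `lam` are made up for illustration. The two accumulators come out the same, but note that textbook TD(λ) additionally decays the trace by the discount factor and scales the whole trace by the current TD error, which momentum does not do, so the match only covers the accumulation step.

```python
# Made-up scalar gradients, one per step (illustrative values only).
grads = [1.0, 0.5, -2.0, 0.25]
lam = 0.9  # plays the role of λ; also used as the momentum coefficient

# Recurrence from the pseudo-code: trace = g + lam * previous_trace
trace = 0.0
traces = []
for g in grads:
    trace = g + lam * trace
    traces.append(trace)

# Momentum buffer as documented for torch.optim.SGD with dampening=0:
# the first step copies the gradient, later steps use buf = momentum * buf + g.
buf = None
bufs = []
for g in grads:
    buf = g if buf is None else lam * buf + g
    bufs.append(buf)

# Closed form: each past gradient is weighted by lam raised to its age.
closed = [
    sum(lam ** (t - k) * grads[k] for k in range(t + 1))
    for t in range(len(grads))
]
```

So with plain SGD, accumulating gradients this way should behave like setting momentum=λ, as long as the thing being accumulated is the gradient of the same loss.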

As far as I understand, a gradient hook only lets you modify the new gradient. But you want to modify the previous one, so I don’t see a simpler way than:

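# multiply previous gradient by lambda
# (`lambda` is a reserved word in Python, so the constant is named `lam` here):
for p in model.parameters():
    if p.grad is not None:  # there is no stored gradient before the first backward()
        p.grad *= lam
loss.backward()  # adds the new gradient on top of the decayed one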

oh, that’s interesting! then never call optimizer.zero_grad()
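
Putting the suggestion above into a full loop might look roughly like this. It is only a sketch: the layer sizes, `lam`, the learning rate, the MSE loss, and the random data are all placeholders, not anything from the original model. The `expected` list is extra bookkeeping that recomputes the trace by hand so the result can be checked; a real loop would not need it.

```python
import torch

torch.manual_seed(0)
lam = 0.8  # hypothetical trace decay
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Separate bookkeeping, used only to check the accumulated gradients.
expected = [torch.zeros_like(p) for p in model.parameters()]

for step in range(5):
    x, target = torch.randn(16, 4), torch.randn(16, 1)  # placeholder data
    loss = torch.nn.functional.mse_loss(model(x), target)

    # Fresh gradient of this step's loss, computed without touching p.grad.
    fresh = torch.autograd.grad(loss, list(model.parameters()), retain_graph=True)
    expected = [f + lam * e for f, e in zip(fresh, expected)]

    # Decay whatever is already stored in p.grad (nothing on the first step).
    for p in model.parameters():
        if p.grad is not None:
            p.grad *= lam
    loss.backward()   # accumulates the new gradient on top of the decayed one
    optimizer.step()  # note: no optimizer.zero_grad() anywhere in the loop

# Largest gap between p.grad and the hand-computed trace.
max_diff = max(
    (p.grad - e).abs().max().item()
    for p, e in zip(model.parameters(), expected)
)
```

If `max_diff` is close to zero, `p.grad` is holding the decayed sum of past gradients rather than just the latest one.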


I am pretty confused about how to implement TD(lambda). Can you tell me what your loss function is? I thought there was no loss whose gradient gives you the TD(lambda) update.
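
For reference, the textbook version (semi-gradient TD(λ) with accumulating eligibility traces, as in Sutton and Barto) indeed does not come from a loss. The trace accumulates the gradient of the value estimate itself, decayed by γλ, and the parameter update is the current TD error times the trace. Below is a minimal sketch under that assumption. The network shape, `gamma`, `lam`, `alpha`, the helper name `td_lambda_step`, and the random transitions are all hypothetical, and the traces would normally be reset to zero at the start of each episode.

```python
import torch

torch.manual_seed(0)
gamma, lam, alpha = 0.99, 0.8, 0.01  # hypothetical hyperparameters
value_net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 1))
params = list(value_net.parameters())
traces = [torch.zeros_like(p) for p in params]  # one eligibility trace per parameter


def td_lambda_step(state, reward, next_state, done):
    """One online TD(lambda) update; returns the TD error as a float."""
    value = value_net(state).squeeze()
    with torch.no_grad():
        next_value = torch.zeros(()) if done else value_net(next_state).squeeze()
        delta = (reward + gamma * next_value - value).item()
    grads = torch.autograd.grad(value, params)  # gradient of v(s), not of a loss
    with torch.no_grad():
        for p, z, g in zip(params, traces, grads):
            z.mul_(gamma * lam).add_(g)  # z <- gamma * lam * z + grad v(s)
            p.add_(alpha * delta * z)    # w <- w + alpha * delta * z
    return delta


# Placeholder transitions; a real environment would supply these.
for t in range(10):
    s, s_next = torch.randn(4), torch.randn(4)
    delta = td_lambda_step(s, reward=1.0, next_state=s_next, done=False)
```

The difference from the p.grad trick is that past gradients here are all rescaled by the newest TD error, instead of each keeping the error from its own step.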