How to implement TD(λ)


(Tyler Walker) #1

I have a vanilla NN that looks like this:

self.model = torch.nn.Sequential(
    torch.nn.Linear(self.INPUT_SIZE, self.HIDDEN_SIZE),
    torch.nn.Sigmoid(),
    torch.nn.Linear(self.HIDDEN_SIZE, self.OUTPUT_SIZE),
    torch.nn.Sigmoid())

I want to implement TD(λ) between steps. λ is a constant between 0 and 1 that sets the lifespan of a gradient trace.
Pseudo-code for this is:

loss.backward()
model.gradients = model.gradients + λ * model.previous_gradients
optimizer.step()
model.previous_gradients = model.gradients

I think I might be able to accomplish this using gradient hooks, but I'm unsure how to do that. I can't shake the feeling that it might be easier than that. SGD's momentum is mathematically similar, but I can't tell if it's identical.
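For what it's worth, one way the hook idea could work is to keep the trace inside the hook's closure and return it, so it replaces the freshly computed gradient. This is just a sketch; `attach_trace_hook`, the `Linear` stand-in model, and the λ value are all placeholders:

```python
import torch

lam = 0.9  # trace-decay constant (the λ above); value is illustrative

model = torch.nn.Linear(4, 1)  # stand-in for the Sequential above

def attach_trace_hook(p, lam=lam):
    trace = torch.zeros_like(p)
    def hook(grad):
        nonlocal trace
        trace = lam * trace + grad  # decayed running sum of gradients
        return trace                # replaces the freshly computed grad
    p.register_hook(hook)

for p in model.parameters():
    attach_trace_hook(p)

# with this approach you keep calling optimizer.zero_grad() as usual,
# because the trace lives inside the hook, not in p.grad
```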


(Alexis David Jacq) #2

As far as I understand, using a gradient hook should only allow you to modify the new gradient. But you want to modify the previous one, so I don't see a simpler way than:

# decay the previous gradient by lambda
# ('lambda' is a reserved word in Python, so use another name):
lam = 0.9  # your λ constant
for p in model.parameters():
    if p.grad is not None:  # grads don't exist before the first backward
        p.grad *= lam
loss.backward()  # accumulates the new gradient on top
optimizer.step()
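Spelled out as a complete toy loop, it might look like the following. The model shape, the MSE loss, the random data, and the `lam` value are all placeholders:

```python
import torch

lam = 0.9  # λ, the trace-decay constant; illustrative value
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.Sigmoid(),
    torch.nn.Linear(8, 1), torch.nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(50):
    x = torch.randn(1, 4)      # placeholder input
    target = torch.rand(1, 1)  # placeholder target
    # decay the accumulated gradient instead of zeroing it
    for p in model.parameters():
        if p.grad is not None:
            p.grad *= lam
    loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()            # adds the new gradient onto λ·(old)
    optimizer.step()
    # no optimizer.zero_grad() here, by design
```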

(Tyler Walker) #3

Oh, that's interesting! So then you'd never call optimizer.zero_grad().
Thanks.
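On the momentum question from post #1: for vanilla SGD (default dampening, no Nesterov), decaying p.grad in place by λ before each backward() does seem to give the same updates as torch.optim.SGD(momentum=λ) with the usual zero_grad(), since both compute the same running sum λ·buf + grad. A quick check (all values illustrative):

```python
import torch

lam, lr = 0.7, 0.1

p1 = torch.nn.Parameter(torch.tensor([1.0]))  # manual trace decay
p2 = torch.nn.Parameter(torch.tensor([1.0]))  # built-in momentum
opt1 = torch.optim.SGD([p1], lr=lr)
opt2 = torch.optim.SGD([p2], lr=lr, momentum=lam)

for _ in range(5):
    # manual scheme: decay the accumulated gradient, never zero it
    if p1.grad is not None:
        p1.grad *= lam
    (p1 ** 2).sum().backward()  # toy quadratic loss
    opt1.step()

    # momentum scheme: zero as usual, the optimizer keeps the buffer
    opt2.zero_grad()
    (p2 ** 2).sum().backward()
    opt2.step()

print(torch.allclose(p1, p2))  # prints True
```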