Does autograd add gradients for each backward trajectory?

fixedrl · October 6, 2017, 2:55pm

Suppose we have a computational graph like this:

P = NN policy
D = NN dynamics

S0 = Variable()
A1 = P(S0)
S1 = D(S0, A1)
A2 = P(S1)
S2 = D(S1, A2)

L = cost(S1) + cost(S2)

L.backward()

Update policy P.step()

Since each call to the dynamics D bifurcates the backward path into two trajectories, will the backward function automatically add their gradients?

What I am doing now is making a clone of the policy, so that I have P1 and P_clone, and using P1 for the first action selection and P_clone for all subsequent time steps.

smth · October 11, 2017, 5:35am

By default, autograd accumulates all gradients: the contributions from every backward path that reaches a parameter are summed into its .grad.
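To make the accumulation concrete, here is a minimal sketch of the two-step rollout from the question. The scalar policy `a = w * s`, the dynamics `s' = s + a`, the quadratic cost, and the helper `rollout_loss` are all hypothetical stand-ins chosen so the gradient can be checked by hand; they are not the original poster's networks. It uses current PyTorch tensors rather than `Variable`.

```python
import torch

# Hypothetical scalar stand-ins for the policy P and the dynamics D.
w = torch.tensor(2.0, requires_grad=True)  # the single policy parameter

def P(s):
    return w * s      # policy: a = w * s

def D(s, a):
    return s + a      # dynamics: s' = s + a

def cost(s):
    return s ** 2

def rollout_loss():
    s0 = torch.tensor(1.0)
    a1 = P(s0)
    s1 = D(s0, a1)    # s1 = 1 + w
    a2 = P(s1)        # the same policy parameter is reused here
    s2 = D(s1, a2)    # s2 = (1 + w)^2
    return cost(s1) + cost(s2)

# One backward pass: every path from the loss back to w is summed.
rollout_loss().backward()
grad_once = w.grad.item()
# By hand: d/dw [(1+w)^2 + (1+w)^4] = 2(1+w) + 4(1+w)^3 at w = 2
analytic = 2 * 3 + 4 * 3 ** 3

# A second backward pass adds to w.grad instead of overwriting it.
rollout_loss().backward()
grad_twice = w.grad.item()

# Reset before the next update (optimizer.zero_grad() does this for modules).
w.grad.zero_()
```

Under these assumptions a single shared policy is enough, so the P1 / P_clone split should not be needed to get the summed gradient. The second call also shows why gradients are normally zeroed between updates.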