Background: I use DQN and DDPG to solve two tasks simultaneously. The state (input) of both DQN and DDPG has two parts: one part is the state of the environment, and the other is a state abstracted from the environment by a CNN+LSTM. The two parts are concatenated in `forward_dqn()`, `forward_actor()`, and `forward_critic()` respectively.

Question 1: I back-propagate `loss_dqn`, `loss_ddpg_actor`, and `loss_ddpg_critic` in sequence and get the error "Trying to backward through the graph a second time, but the buffers have already been freed." during the backward pass of `loss_ddpg_actor`. Since the computational graph is freed after the backward pass of `loss_dqn`, I forward-propagate the CNN+LSTM again before computing `loss_ddpg_actor`. Why can't the computational graph be created again? Thanks.
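To make the error self-contained, here is a minimal sketch (generic `Linear` modules standing in for my networks, names are illustrative only): two losses built on the same intermediate tensor share one graph, so the second `backward()` fails unless the first one retains the graph, or the whole forward pass that produced the shared tensor is re-run.

```python
import torch

# Two heads on one shared intermediate tensor -> one shared graph.
x = torch.randn(4, 3)
shared = torch.nn.Linear(3, 3)   # stand-in for the shared trunk
head1 = torch.nn.Linear(3, 1)
head2 = torch.nn.Linear(3, 1)

h = shared(x)                    # shared intermediate result
loss1 = head1(h).mean()
loss2 = head2(h).mean()

loss1.backward()                 # frees the buffers of the shared subgraph
second_backward_failed = False
try:
    loss2.backward()             # same error as in the question
except RuntimeError:
    second_backward_failed = True

# Fix: retain the graph on the first backward (or rebuild the *entire*
# forward pass, including shared(x), before constructing the second loss).
h = shared(x)
loss1 = head1(h).mean()
loss2 = head2(h).mean()
loss1.backward(retain_graph=True)
loss2.backward()                 # succeeds
```

The key point the sketch illustrates: re-running only part of the forward pass is not enough if any tensor feeding the second loss still references the freed graph.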

Model: (env: environment)

```
output_cnnlstm = cnnlstm.forward(env)
DQN_output = dqn.forward(cat(output_cnnlstm, state_env))
Actor_output = actor.forward(cat(output_cnnlstm, state_env))
Critic_output = critic.forward(cat(output_cnnlstm, state_env))
```

Code 1 (Q1):

```
# dqn
# forward: cnnlstm
s_cnnlstm_out, _, _ = self.model.forward_cnnlstm(s_cnnlstm, flag_optim=True)
# forward: dqn
q_eval_dqn = self.model.forward_dqn_eval(s_dqn, s_cnnlstm_out).gather(1, a_dqn)
q_next_dqn = self.model.forward_dqn_target(s_dqn_next, s_cnnlstm_out).detach()
q_target_dqn = r + GAMMA_DQN * q_next_dqn.max(dim=1)[0].reshape(SIZE_BATCH * SIZE_TRANSACTION, 1)
# optimize: dqn
loss_dqn = self.loss_dqn(q_eval_dqn, q_target_dqn)
self.optimizer_cnnlstm.zero_grad()
self.optimizer_dqn.zero_grad()
loss_dqn.backward()
self.optimizer_cnnlstm.step()
self.optimizer_dqn.step()
loss_dqn = loss_dqn.detach().numpy()
# ddpg
# actor
# forward: cnnlstm
s_cnnlstm_out, _, _ = self.model.forward_cnnlstm(s_cnnlstm, flag_optim=True)
# forward: ddpg: actor
a_eval_ddpg = self.model.forward_actor_eval(s_ddpg, s_cnnlstm_out)
# optimize: ddpg: cnnlstm + actor
loss_ddpg_actor = - self.model.forward_critic_eval(s_ddpg, a_eval_ddpg, s_cnnlstm_out).mean()
self.optimizer_cnnlstm.zero_grad()
self.optimizer_actor.zero_grad()
loss_ddpg_actor.backward()
self.optimizer_cnnlstm.step()
self.optimizer_actor.step()
loss_ddpg_actor = loss_ddpg_actor.detach().numpy()
```
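For reference, one alternative I am considering (a sketch with stand-in `Linear` modules, not my real CNN+LSTM/DQN/actor, so treat it as an assumption about the pattern rather than my actual code): accumulate the losses and call `backward()` once, so the shared trunk's graph is traversed a single time.

```python
import torch

trunk = torch.nn.Linear(3, 3)        # stand-in for the CNN+LSTM
dqn_head = torch.nn.Linear(3, 1)     # stand-in for the DQN head
actor_head = torch.nn.Linear(3, 1)   # stand-in for the actor

params = (list(trunk.parameters()) + list(dqn_head.parameters())
          + list(actor_head.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.randn(8, 3)
h = trunk(x)                         # one forward through the shared trunk
loss_dqn = dqn_head(h).pow(2).mean()
loss_actor = -actor_head(h).mean()

opt.zero_grad()
(loss_dqn + loss_actor).backward()   # one backward over the shared graph
opt.step()
```

Whether summing the losses is appropriate for DQN+DDPG training dynamics is a separate question; the sketch only shows that a single backward avoids the freed-buffer error.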

Question 2: I wrote a demo to test the propagation process, and the demo seems to work well, since the loss decreases normally and the test error is low. So I want to ask what the difference is between the two pieces of code and the two models.

Model:

```
output_model1 = model1.forward(x)
output_model21 = model21.forward(cat(output_model1, x1))
output_model22 = model22.forward(cat(output_model1, x2))
```

Compared with the model of Q1: output_model1 ~ cnnlstm, output_model21 ~ DQN, output_model22 ~ Actor.

Question 3: I set a breakpoint in the demo after `loss1.backward()` and before `optimizer1.step()`. On the one hand, the weights of the linear layer of Model21 change with the optimization. On the other hand, `x._grad` is a gradient tensor, while `x1._grad` is `None`. So I wonder whether the parameters of Model21 are actually optimized, and why `x1._grad` is `None`.
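To state what I observed as a self-contained demo (generic tensors and a single `Linear`, not my actual demo model): autograd only populates `.grad` on leaf tensors created with `requires_grad=True`, and `optimizer.step()` does update any parameter whose `.grad` was filled by `backward()`.

```python
import torch

x  = torch.randn(5, 3, requires_grad=True)  # leaf that requires grad
x1 = torch.randn(5, 2)                      # leaf, requires_grad=False by default
lin = torch.nn.Linear(5, 1)                 # stand-in for Model21's linear layer
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

loss = lin(torch.cat([x, x1], dim=1)).pow(2).mean()
opt.zero_grad()
loss.backward()
# x.grad is now a tensor; x1.grad stays None because x1 never required grad.

before = lin.weight.detach().clone()
opt.step()
weight_changed = not torch.equal(before, lin.weight)  # the layer is updated
```

So an input tensor's `.grad` being `None` says nothing about whether the module's parameters receive gradients; module parameters require grad by default.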

Code 2 (Q2 and Q3):

```
for i in range(NUM_OPTIM):
    # optimize task 1
    y1_pred = self.model.forward_task1(x, x1)
    loss1 = self.loss_21(y1_pred, y1)
    self.optimizer1.zero_grad()
    self.optimizer21.zero_grad()
    loss1.backward()
    self.optimizer1.step()
    self.optimizer21.step()
    # optimize task 2
    y2_pred = self.model.forward_task2(x, x2)
    loss2 = self.loss_22(y2_pred, y2)
    self.optimizer1.zero_grad()
    self.optimizer22.zero_grad()
    loss2.backward()
    self.optimizer1.step()
    self.optimizer22.step()