I am confused about how the target value is computed in Q-learning and actor-critic methods. During backpropagation, is the value network involved twice, since I call it twice: once for the current state value and once for the next state value? Should I wrap the next-state value computation in `with torch.no_grad()`?

```
action, log_prob = actor(current_state)
current_value = critic(current_state)
action = action.detach().cpu().numpy()
new_state, reward, done, info = env.step(action)
reward = torch.as_tensor(reward, dtype=torch.float32, device=device)
new_state = torch.as_tensor(new_state, dtype=torch.float32, device=device)
# should I wrap the next-state value in torch.no_grad() here?
with torch.no_grad():
    next_value = critic(new_state)
target_value = reward + gamma * next_value * (1 - int(done))
advantage = target_value - current_value
actor_loss = -log_prob * advantage
critic_loss = advantage ** 2
loss = actor_loss + critic_loss
actor_optimizer.zero_grad()
critic_optimizer.zero_grad()
loss.backward()
actor_optimizer.step()
critic_optimizer.step()
```
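To make the question concrete, here is a minimal standalone sketch I put together (the toy `nn.Linear` critic and random states are hypothetical stand-ins for my real networks) showing what `torch.no_grad()` does to the graph:

```python
import torch
import torch.nn as nn

critic = nn.Linear(4, 1)            # stand-in for the real critic network
current_state = torch.randn(4)
new_state = torch.randn(4)
reward, gamma, done = 1.0, 0.99, False

current_value = critic(current_state)   # stays in the autograd graph
with torch.no_grad():                   # next value is computed as a constant
    next_value = critic(new_state)

target_value = reward + gamma * next_value * (1 - int(done))

print(current_value.requires_grad)  # True  -> gradients will flow to the critic
print(target_value.requires_grad)   # False -> treated as a fixed target
```

So with `no_grad()`, the TD target is just a constant tensor, and `backward()` only sees the critic through `current_value`. Is that the intended behavior here?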