I'm studying reinforcement learning and want to implement a simple actor-critic approach.
My question is about freezing parameters:
I have a critic network:
    critic = Critic()
    critic_old = critic(Variable(torch.Tensor(observation).view(1, 4)))
    critic_new = critic(Variable(torch.Tensor(observation).view(1, 4)))
Then I compute the critic loss:
    critic_loss = ((reward + (gamma * critic_new)) - critic_old) ** 2
In this case, I don't want to backpropagate through critic_new, only through critic_old, so critic_new should behave like a fixed scalar. How can I do this?
It should just be something like:
    critic_loss = ((reward + (gamma * critic_new.detach())) - critic_old) ** 2
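
For a fuller picture, here is a minimal, self-contained sketch of how that detached target could sit inside a complete critic update step. The Critic architecture, the optimizer, the dummy transition values, and the use of next_observation for the bootstrap target are all illustrative assumptions, not part of your code; on PyTorch 0.4+ the Variable wrapper is no longer needed, so plain tensors are used.

    import torch
    import torch.nn as nn
    import torch.optim as optim

    # Illustrative critic: maps a 4-dimensional observation to a scalar state value.
    class Critic(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, x):
            return self.net(x)

    critic = Critic()
    optimizer = optim.Adam(critic.parameters(), lr=1e-3)

    # Placeholder transition; in practice these come from the environment.
    observation = [0.1, 0.0, -0.1, 0.0]
    next_observation = [0.12, 0.05, -0.09, 0.02]
    reward = 1.0
    gamma = 0.99

    critic_old = critic(torch.Tensor(observation).view(1, 4))       # V(s), stays in the graph
    critic_new = critic(torch.Tensor(next_observation).view(1, 4))  # V(s'), used only as a target

    # detach() blocks gradients from flowing through the bootstrap target,
    # so the update only moves V(s) toward reward + gamma * V(s').
    critic_loss = ((reward + gamma * critic_new.detach()) - critic_old) ** 2

    optimizer.zero_grad()
    critic_loss.backward()
    optimizer.step()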
Thank you!
I have one more question:
As I understand it, these two blocks of code are equivalent:
    critic = Critic()
    critic_old = critic(Variable(torch.Tensor(observation).view(1, 4)))
    critic_new = critic(Variable(torch.Tensor(observation).view(1, 4)))
    critic_loss = ((reward + (gamma * critic_new.detach())) - critic_old) ** 2
and:
    critic = Critic()
    critic_old = critic(Variable(torch.Tensor(observation).view(1, 4)))
    critic_new = critic(Variable(torch.Tensor(observation).view(1, 4), volatile=True))
    critic_new.volatile = False
    critic_loss = ((reward + (gamma * critic_new)) - critic_old) ** 2
Am I correct?
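
In case it helps later readers: on PyTorch 0.4 and later the volatile flag has been removed, and a graph-free forward pass for the bootstrap target is usually written with torch.no_grad() instead. A minimal sketch of that variant, reusing the Critic, observation, next_observation, reward, and gamma from the sketch above:

    import torch

    # No graph is recorded for this forward pass, so critic_new carries no history;
    # gradients will flow only through critic_old, as with detach().
    with torch.no_grad():
        critic_new = critic(torch.Tensor(next_observation).view(1, 4))

    critic_old = critic(torch.Tensor(observation).view(1, 4))
    critic_loss = ((reward + gamma * critic_new) - critic_old) ** 2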