Freezing parameters

I study reinforcement learning and I want to implement a simple actor-critic approach.
My question is about freezing parameters:
I have a critic network:
critic = Critic()
critic_old = critic(Variable(torch.Tensor(observation).view(1,4)))
critic_new = critic(Variable(torch.Tensor(observation).view(1,4)))
Then I compute critic loss:
critic_loss = ((reward+(gamma*critic_new)) - critic_old)**2
So, in this case, I don’t want to backpropagate through critic_new, only through critic_old; critic_new should be treated as a fixed scalar.
How can I do this?


Should just be something like
critic_loss = ((reward+(gamma*critic_new.detach())) - critic_old)**2
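
For reference, here is a self-contained sketch of that pattern on the current tensor API (no Variable); the Critic layout, the dummy 4-dim observations, and the separate next_observation are placeholders for illustration, not the poster's actual setup:

import torch
import torch.nn as nn

# Placeholder critic: maps a 4-dim observation (e.g. CartPole) to a scalar value.
class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x)

critic = Critic()
gamma = 0.99
reward = 1.0
observation = [0.1, 0.0, -0.05, 0.2]         # dummy current observation
next_observation = [0.1, 0.02, -0.04, 0.18]  # dummy next observation

critic_old = critic(torch.tensor(observation).view(1, 4))
critic_new = critic(torch.tensor(next_observation).view(1, 4)).detach()  # fixed target, no grad

# Only critic_old contributes to the gradient of this loss.
critic_loss = ((reward + (gamma * critic_new)) - critic_old) ** 2
critic_loss.backward()

detach() returns a new tensor that shares the same data but is cut out of the graph, so the bootstrapped target is treated as a constant.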


Thank you!
I have one more question:
As I understand it, these two blocks of code are the same.

critic = Critic()
critic_old = critic(Variable(torch.Tensor(observation).view(1,4)))
critic_new = critic(Variable(torch.Tensor(observation).view(1,4)))
critic_loss = ((reward+(gamma*critic_new.detach())) - critic_old)**2

and:

critic = Critic()
critic_old = critic(Variable(torch.Tensor(observation).view(1,4)))
critic_new = critic(Variable(torch.Tensor(observation).view(1,4), volatile=True))
critic_new.volatile = False
critic_loss = ((reward+(gamma*critic_new)) - critic_old)**2

Am I correct?
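
One way to sanity-check either variant is to look at whether the value you want frozen still carries graph history. A small sketch, reusing the placeholder critic and observation from the example above (on the current API the relevant attributes are requires_grad and grad_fn):

# A value that is cut off from the graph has no grad_fn and does not
# require grad, so nothing can backpropagate through it.
obs = torch.tensor(observation).view(1, 4)

v_attached = critic(obs)            # still part of the graph
v_detached = critic(obs).detach()   # cut off from the graph

print(v_attached.requires_grad, v_attached.grad_fn)   # True, <a grad_fn>
print(v_detached.requires_grad, v_detached.grad_fn)   # False, None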
