I got this warning: "volatile was removed and now has no effect. Use with torch.no_grad(): instead." for the line inputs = Variable(inputs, volatile=True).
So I started to investigate torch.no_grad(), and I have two small questions about it:
1. In the validation file I had those lines. With the new command, should it look like this?

    with torch.no_grad():
        inputs = Variable(inputs)
        targets = Variable(targets)
2. Let's say I have a ResNet followed by an LSTM. My ResNet is already trained, and I want to train only the LSTM. Can I use this command for that, and like this?
Are there other ways to freeze part of a model during training?
You should actually wrap the whole validation code (everything that requires no backward pass) within this torch.no_grad() block. Note that you can also use it as a decorator on your eval function:
    @torch.no_grad()
    def val(args):
        # Your validation function
        return accuracy
1-bis: In recent PyTorch versions, Variable has been removed, so you can delete all of its uses from your code. Tensors are the same as the old Variables: they have a .requires_grad attribute that tells you whether they require gradients, and they accept a requires_grad keyword argument on creation.
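To illustrate that point, here is a minimal sketch (plain tensors, no Variable; the tensor values are arbitrary) showing the .requires_grad attribute, the requires_grad keyword argument, and how no_grad() interacts with them:

```python
import torch

# Tensors created from data do not require gradients by default
x = torch.ones(3)
print(x.requires_grad)  # False

# requires_grad can be set at creation time...
w = torch.ones(3, requires_grad=True)
print(w.requires_grad)  # True

# ...and results computed inside a no_grad() block never require gradients
with torch.no_grad():
    y = w * 2
print(y.requires_grad)  # False
```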
2. Yes, this is exactly how you should do it:
    with torch.no_grad():
        # No gradients in this block
        x = self.cnn(x)

    # Gradients as usual outside of it
    x = self.lstm(x)
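On the "other ways to freeze" question from above, another common approach is to set requires_grad to False on the frozen parameters, so autograd skips them entirely. This is only a sketch: the two Linear modules are hypothetical stand-ins for the pretrained ResNet and the trainable LSTM.

```python
import torch
import torch.nn as nn

features = nn.Linear(4, 8)  # stand-in for the pretrained resnet
head = nn.Linear(8, 2)      # stand-in for the lstm to be trained

# Freeze every parameter of the feature extractor
for p in features.parameters():
    p.requires_grad = False

# Only pass the trainable parameters to the optimizer
opt = torch.optim.SGD(head.parameters(), lr=0.1)

x = torch.randn(1, 4)
out = head(features(x))
out.sum().backward()

print(features.weight.grad)          # None: frozen parameters get no gradients
print(head.weight.grad is not None)  # True: the head is trained as usual
```

Compared with wrapping the forward pass in no_grad(), this also lets you freeze at parameter granularity (e.g. only some layers).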
And what if self.lstm is already trained and I want to train self.cnn?
I mean, I am using this in a GAN setting where I want to fix the discriminator D and update the generator G based on the output of D.
So would it be valid to do the following?
    y_pr = G(x)
    with torch.no_grad():
        o1 = D(y_pr)
    loss = loss_fn(o1, y_target)
No, because you actually want to compute some gradients in D: you compute the gradients with respect to the input of D, which you then use to compute the gradients for G.
If you don't want to update D, you can simply pass only G.parameters() to your optimizer.
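A minimal sketch of that suggestion (all module and variable names are toy stand-ins): gradients still flow through D, but since the optimizer only holds G's parameters, step() never touches D's weights.

```python
import torch
import torch.nn as nn

G = nn.Linear(2, 2)  # stand-in generator
D = nn.Linear(2, 1)  # stand-in discriminator

opt_G = torch.optim.SGD(G.parameters(), lr=0.1)  # only G's parameters

x = torch.randn(4, 2)
loss = D(G(x)).mean()  # no no_grad(): gradients must flow through D

opt_G.zero_grad()
loss.backward()   # populates grads for both G and D
opt_G.step()      # updates G only; D's weights are untouched
```

Note that D's .grad fields still accumulate here; if you later update D with its own optimizer, remember to zero them first (e.g. opt_D.zero_grad()).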
Hey,
Thanks for your response. I am just a bit confused. I have the following code structure:
    # D & G are both CNNs
    # opt_D is the optimizer for D.parameters(); opt_G is for G.parameters()

    # To update D
    D.train()
    opt_D.zero_grad()
    # sample x, y, z from the data set
    with torch.no_grad():  # 1) as we do not want to update G
        fake = G(x, y)
    d_out_fake = D(x, fake)  # 2) not sure if it is possible to fake.detach()
    d_out_real = D(x, z)
    d_loss = loss_fn1(d_out_fake, d_out_real)
    d_loss.backward()
    opt_D.step()

    # To update G
    G.train()
    opt_G.zero_grad()
    # sample x, y, z from the data set
    fake = G(x, y)
    # 3) As we do not want to update D
    with torch.no_grad():
        d_out_fake = D(x, fake)
        d_out_real = D(x, z)
    g_loss = loss_fn2(d_out_fake, d_out_real)
    g_loss.backward()
    opt_G.step()
As you can see, there are separate optimizers for the two models. Also, I read that no_grad() prevents gradients from being calculated. I think I do not need to calculate gradients on D while updating G, but I am a bit puzzled at the moment. It would be great to have your insights.
As stated above: "No, because you actually want to compute some gradients in D. You want to compute the gradients w.r.t. the input of D that you then use to compute the ones for G." You need the gradients of the input of D to be able to update G, and thus you need gradients to flow through D.
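Putting the thread's advice together, here is a hedged sketch of the usual pattern (toy Linear modules stand in for the CNNs, and the losses are simplified placeholders): when updating D, detach the fake so no gradients flow back into G; when updating G, run D with gradients enabled and rely on opt_G holding only G's parameters.

```python
import torch
import torch.nn as nn

G = nn.Linear(2, 2)  # stand-in generator
D = nn.Linear(2, 1)  # stand-in discriminator
opt_D = torch.optim.SGD(D.parameters(), lr=0.1)
opt_G = torch.optim.SGD(G.parameters(), lr=0.1)

x = torch.randn(4, 2)  # generator input
z = torch.randn(4, 2)  # real samples

# --- update D: detach the fake so no gradients flow back into G ---
opt_D.zero_grad()
fake = G(x)
d_loss = D(fake.detach()).mean() - D(z).mean()  # toy loss
d_loss.backward()
opt_D.step()

# --- update G: keep gradients flowing through D, but step only opt_G ---
opt_G.zero_grad()
fake = G(x)
g_loss = -D(fake).mean()  # toy loss
g_loss.backward()  # gradients flow through D into G
opt_G.step()       # only G's parameters are updated
```

fake.detach() here plays the role of the no_grad() block in step 1 of the code above, while step 3's no_grad() is dropped entirely, since gradients must flow through D to reach G.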