Two small questions about "with torch.no_grad():"

I got this warning:
volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
inputs = Variable(inputs, volatile=True)
So I started to investigate torch.no_grad(), and I have two small questions about it:
1. In the validation file I had these lines:

inputs = Variable(inputs, volatile=True)
targets = Variable(targets, volatile=True)

Now, with the new command, should it look like this?

with torch.no_grad():
    inputs = Variable(inputs)
    targets = Variable(targets)

2. Let's say I have a ResNet followed by an LSTM. My ResNet is already trained, and I want to train only the LSTM. Can I use this command, like this?
Or are there other ways to freeze part of a model during training?

self.cnn = resnet.resnet50()
self.lstm = lstm()


with torch.no_grad():
    x = self.cnn(x)


Thanks a lot!



1. You should actually wrap the whole validation code (everything that needs no backward pass) in this torch.no_grad() block. Note that you can also use it as a decorator on your eval function:
@torch.no_grad()
def val(args):
    # Your validation function
    return accuracy

1-bis. In recent PyTorch versions (0.4 and later), Variable has been merged into Tensor, so you can remove every Variable wrapper from your code. Tensors behave the same as the old Variables: they have a .requires_grad attribute that tells you whether they require gradients, and they accept a requires_grad keyword argument on creation.
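To illustrate the two points above, here is a minimal sketch, assuming PyTorch 0.4 or later. The names `model`, `inputs`, `w` and `evaluate` are made up for the example and are not from the original validation code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model and batch, for illustration only.
model = nn.Linear(4, 2)
inputs = torch.randn(3, 4)              # plain tensor, no Variable wrapper

print(inputs.requires_grad)             # False: the default for new tensors
w = torch.ones(2, requires_grad=True)   # opt in at creation time
print(w.requires_grad)                  # True

# Outside no_grad the output is tracked, because the model's
# parameters require gradients.
tracked = model(inputs)

# Inside no_grad nothing is recorded for the backward pass.
with torch.no_grad():
    untracked = model(inputs)

@torch.no_grad()
def evaluate(batch):
    # Same effect as the with-block, applied to the whole function.
    return model(batch)

decorated = evaluate(inputs)
print(tracked.requires_grad, untracked.requires_grad, decorated.requires_grad)
```

The context manager and the decorator should behave the same way here; which one to use is mostly a matter of whether the whole function or only part of it needs to skip gradient tracking.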

2. Yes, this is exactly how you should do it:

with torch.no_grad():
    # No gradients in this block
    x = self.cnn(x)

# Gradients as usual outside of it
x = self.lstm(x)
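On the "other ways to freeze" part of the question, here is a rough sketch that puts the no_grad approach next to one common alternative: switching off requires_grad on the frozen module's parameters and giving the optimizer only the trainable ones. The tiny `cnn` and `lstm` modules below are stand-ins chosen for the example, not resnet50 or the original LSTM.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for the pretrained ResNet and the LSTM (illustrative only).
cnn = nn.Linear(8, 16)
lstm = nn.LSTM(input_size=16, hidden_size=4, batch_first=True)

x = torch.randn(2, 5, 8)                # (batch, time, features)

# Option 1: run the frozen part under no_grad.
with torch.no_grad():
    feats = cnn(x)                      # no graph is built for cnn
out, _ = lstm(feats)
out.sum().backward()

cnn_has_grad = any(p.grad is not None for p in cnn.parameters())
lstm_has_grad = all(p.grad is not None for p in lstm.parameters())
print(cnn_has_grad, lstm_has_grad)

# Option 2: mark the parameters as not requiring gradients and
# hand the optimizer only the parameters that should be trained.
cnn.requires_grad_(False)
optimizer = torch.optim.SGD(lstm.parameters(), lr=0.1)
```

With a real ResNet you would probably also call `cnn.eval()`, since no_grad by itself does not stop batch-norm layers from updating their running statistics in training mode.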

Thanks a lot!
This forum is the best

And what if self.lstm is already trained and I want to train self.cnn?
I mean, I am working in a GAN setting where I want to keep D fixed and update G based on the output of D.
Would it be valid to do the following?

y_pr = G(x)
with torch.no_grad():
    o1 = D(y_pr)
loss = loss_fn(o1, y_target)


No, because you actually do want to compute some gradients in D: the gradients with respect to the input of D, which are then used to compute the gradients for G.

If you don't want to update D, you can simply pass only G.parameters() to your optimizer.
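A small sketch of both points, using toy linear layers as stand-ins for G and D (these, `loss_fn` and the tensor shapes are assumptions for the example): wrapping D in no_grad cuts the loss off from G, while leaving D tracked and giving the optimizer only G.parameters() updates G and leaves D's weights alone.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy generator and discriminator (illustrative stand-ins).
G = nn.Linear(3, 3)
D = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
x = torch.randn(4, 3)
y_target = torch.ones(4, 1)

# With no_grad around D, the loss is disconnected from G.
y_pr = G(x)
with torch.no_grad():
    o1 = D(y_pr)
cut_loss = loss_fn(o1, y_target)
print(cut_loss.requires_grad)           # False: backward() would raise

# Without no_grad, gradients flow through D back into G.
opt_G = torch.optim.SGD(G.parameters(), lr=0.1)   # only G's parameters
d_before = [p.detach().clone() for p in D.parameters()]

opt_G.zero_grad()
loss = loss_fn(D(G(x)), y_target)
loss.backward()
opt_G.step()

g_got_grad = all(p.grad is not None for p in G.parameters())
d_unchanged = all(torch.equal(a, b) for a, b in zip(d_before, D.parameters()))
print(g_got_grad, d_unchanged)
```

Note that D's parameters still accumulate .grad values during this backward pass, so they should be zeroed before D's own update step.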


Thanks for your response. I am still a bit confused. I have the following code structure:

# D & G both are CNNs
# opt_D optimizer object for D.parameters() & opt_G is for G.parameters()

# To update D

# sample x,y,z from data set 
with torch.no_grad():  # 1) as I do not want to update G
    fake = G(x, y)

d_out_fake = D(x, fake)  # 2) not sure whether I could use fake.detach() instead
d_out_real = D(x, z)
d_loss = loss_fn1(d_out_fake, d_out_real)

# To update G

# sample x,y,z from data set 

fake = G(x,y) 
# 3) as I do not want to update D
with torch.no_grad(): 
    d_out_fake = D(x, fake) 
    d_out_real = D(x, z)

g_loss = loss_fn2(d_out_fake, d_out_real)

As you can see, there are separate optimizers for the two models. Also, I read that no_grad() prevents gradients from being calculated. I think I do not need to calculate gradients on D while updating G, but I am a bit puzzled at the moment. It would be great to have your insights.

As stated above: “No because you actually want to compute some gradients in D. You want to compute the gradients wrt the input of D that you then use to compute the ones for G.” You need the gradients with respect to the input of D to be able to update G, and so gradients have to flow through D.
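As a rough illustration of the two steps, under the simplifying assumption that D takes a single input (the code in the question passes two), the sketch below uses toy linear layers. For the D step, `detach()` cuts the graph at `fake`, much like the no_grad block in 1). For the G step, the gradient with respect to D's input is what reaches G.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Linear(3, 3)   # toy stand-ins, not the CNNs from the question
D = nn.Linear(3, 1)
x = torch.randn(4, 3)

# D step: detach() stops gradients from reaching G.
fake = G(x).detach()
D(fake).mean().backward()
g_grad_after_d_step = [p.grad for p in G.parameters()]   # all None

# G step: keep the graph through D.
D.zero_grad()
fake = G(x)
fake.retain_grad()            # keep d(loss)/d(fake) for inspection
D(fake).mean().backward()

# By the chain rule, G's gradients are built from fake.grad,
# which only exists because backward ran through D.
print(fake.grad.shape)
print(all(g is None for g in g_grad_after_d_step))
print(all(p.grad is not None for p in G.parameters()))
```

The `retain_grad()` call is only there to make the intermediate gradient visible; a training loop would not need it.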
