I got this warning: "volatile was removed and now has no effect. Use with torch.no_grad(): instead." for the line inputs = Variable(inputs, volatile=True).
So I started to investigate torch.no_grad(), and I have two small questions about it:
1. In the validation file I had those lines. With the new command, should it look like this?

    with torch.no_grad():
        inputs = Variable(inputs)
        targets = Variable(targets)
2. Let's say I have a ResNet followed by an LSTM. My ResNet is already trained, and I want to train only the LSTM. Can I use this command for that, and like this?
Are there other ways to freeze part of a model during training?
You should actually wrap the whole validation code (everything that requires no backward pass) within this torch.no_grad() block. Note that you can also use it as a decorator on your eval function:
    @torch.no_grad()
    def val(args):
        # Your validation function
        return accuracy
1-bis: In recent PyTorch versions, Variable has been removed, so you can delete all of its uses from your code. Tensors are the same as the old Variables: they have a .requires_grad attribute that tells you whether they require gradients, and they accept a requires_grad keyword argument on creation.
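To illustrate that point, here is a minimal sketch (plain tensors, no Variable; the tensor values are arbitrary) showing the .requires_grad attribute, the requires_grad keyword argument, and how no_grad() interacts with them:

```python
import torch

# Tensors created from data do not require gradients by default
x = torch.ones(3)
print(x.requires_grad)  # False

# requires_grad can be set at creation time...
w = torch.ones(3, requires_grad=True)
print(w.requires_grad)  # True

# ...and results computed inside a no_grad() block never require gradients
with torch.no_grad():
    y = w * 2
print(y.requires_grad)  # False
```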
2. Yes, this is exactly how you should do it:
    with torch.no_grad():
        # No gradients in this block
        x = self.cnn(x)

    # Gradients as usual outside of it
    x = self.lstm(x)
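On the "other ways to freeze" question from above, another common approach is to set requires_grad to False on the frozen parameters, so autograd skips them entirely. This is only a sketch: the two Linear modules are hypothetical stand-ins for the pretrained ResNet and the trainable LSTM.

```python
import torch
import torch.nn as nn

features = nn.Linear(4, 8)  # stand-in for the pretrained resnet
head = nn.Linear(8, 2)      # stand-in for the lstm to be trained

# Freeze every parameter of the feature extractor
for p in features.parameters():
    p.requires_grad = False

# Only pass the trainable parameters to the optimizer
opt = torch.optim.SGD(head.parameters(), lr=0.1)

x = torch.randn(1, 4)
out = head(features(x))
out.sum().backward()

print(features.weight.grad)          # None: frozen parameters get no gradients
print(head.weight.grad is not None)  # True: the head is trained as usual
```

Compared with wrapping the forward pass in no_grad(), this also lets you freeze at parameter granularity (e.g. only some layers).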
And what if self.lstm is already trained and I want to train self.cnn?
I mean, I am using this in a GAN setting where I want to fix the discriminator D and update the generator G based on the output of D.
So would it be valid to do the following?
    y_pr = G(x)
    with torch.no_grad():
        o1 = D(y_pr)
    loss = loss_fn(o1, y_target)
No, because you actually want to compute some gradients in D: you compute the gradients with respect to the input of D, which you then use to compute the gradients for G.
If you don't want to update D, you can simply pass only G.parameters() to your optimizer.
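A minimal sketch of that suggestion (all module and variable names are toy stand-ins): gradients still flow through D, but since the optimizer only holds G's parameters, step() never touches D's weights.

```python
import torch
import torch.nn as nn

G = nn.Linear(2, 2)  # stand-in generator
D = nn.Linear(2, 1)  # stand-in discriminator

opt_G = torch.optim.SGD(G.parameters(), lr=0.1)  # only G's parameters

x = torch.randn(4, 2)
loss = D(G(x)).mean()  # no no_grad(): gradients must flow through D

opt_G.zero_grad()
loss.backward()   # populates grads for both G and D
opt_G.step()      # updates G only; D's weights are untouched
```

Note that D's .grad fields still accumulate here; if you later update D with its own optimizer, remember to zero them first (e.g. opt_D.zero_grad()).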
Hey,
Thanks for your response. I am just a bit confused. I have the following code structure:
    # D & G are both CNNs
    # opt_D is the optimizer for D.parameters(); opt_G is for G.parameters()

    # To update D
    D.train()
    opt_D.zero_grad()
    # sample x, y, z from the data set
    with torch.no_grad():  # 1) as we do not want to update G
        fake = G(x, y)
    d_out_fake = D(x, fake)  # 2) not sure if it is possible to fake.detach()
    d_out_real = D(x, z)
    d_loss = loss_fn1(d_out_fake, d_out_real)
    d_loss.backward()
    opt_D.step()

    # To update G
    G.train()
    opt_G.zero_grad()
    # sample x, y, z from the data set
    fake = G(x, y)
    # 3) As we do not want to update D
    with torch.no_grad():
        d_out_fake = D(x, fake)
        d_out_real = D(x, z)
    g_loss = loss_fn2(d_out_fake, d_out_real)
    g_loss.backward()
    opt_G.step()
As you can see, there are separate optimizers for the two models. Also, I read that no_grad() prevents gradients from being calculated. I think I do not need to calculate gradients on D while updating G, but I am a bit puzzled at the moment. It would be great to have your insights.
As stated above: "No, because you actually want to compute some gradients in D. You want to compute the gradients w.r.t. the input of D that you then use to compute the ones for G." You need the gradients of the input of D to be able to update G, and thus you need gradients to flow through D.
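Putting the thread's advice together, here is a hedged sketch of the usual pattern (toy Linear modules stand in for the CNNs, and the losses are simplified placeholders): when updating D, detach the fake so no gradients flow back into G; when updating G, run D with gradients enabled and rely on opt_G holding only G's parameters.

```python
import torch
import torch.nn as nn

G = nn.Linear(2, 2)  # stand-in generator
D = nn.Linear(2, 1)  # stand-in discriminator
opt_D = torch.optim.SGD(D.parameters(), lr=0.1)
opt_G = torch.optim.SGD(G.parameters(), lr=0.1)

x = torch.randn(4, 2)  # generator input
z = torch.randn(4, 2)  # real samples

# --- update D: detach the fake so no gradients flow back into G ---
opt_D.zero_grad()
fake = G(x)
d_loss = D(fake.detach()).mean() - D(z).mean()  # toy loss
d_loss.backward()
opt_D.step()

# --- update G: keep gradients flowing through D, but step only opt_G ---
opt_G.zero_grad()
fake = G(x)
g_loss = -D(fake).mean()  # toy loss
g_loss.backward()  # gradients flow through D into G
opt_G.step()       # only G's parameters are updated
```

fake.detach() here plays the role of the no_grad() block in step 1 of the code above, while step 3's no_grad() is dropped entirely, since gradients must flow through D to reach G.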