torch.no_grad() does not work!

Run the following code in PyTorch 0.4.0 and you will see that the requires_grad property of tensor z is True, not False. It seems that torch.no_grad() does not work. By the way, changing z = x into z = x + 0 or some other function of x does make the requires_grad property of tensor z False.

import torch

x = torch.zeros(1, requires_grad=True)
with torch.no_grad():
    z = x

print(z.requires_grad)  # prints True

Interesting! I also tried this in version 1.0.0 and the same behaviour occurs:

>>> with torch.no_grad():
...     z = x

>>> z.requires_grad
True
>>> with torch.no_grad():
...     z = x + 0.0

>>> z.requires_grad
False

I wonder if this is a bug that needs to be fixed!

I don’t think this is a bug, because in

>>> with torch.no_grad(): 
...     z = x

you are not doing any computation, just binding a new name to an existing object. As soon as you actually use that object inside the block, the result will correctly have its requires_grad attribute set to False, so in practice there will never be an issue of falsely computing a gradient within the with context.
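To make this concrete, here is a minimal sketch (this is plain Python name binding, nothing PyTorch-specific; the comments show the values you should see):

import torch

x = torch.zeros(1, requires_grad=True)
with torch.no_grad():
    z = x  # plain name binding; no autograd operation runs

print(z is x)            # True: z and x are the same tensor object
print(z.requires_grad)   # True: it is still the original leaf tensor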

If your goal is just to get a view of the tensor without gradients, use z = x.detach(). Or if you want to make a copy of it as well, use z = x.clone() inside the no_grad block (or z = x.detach().clone() anywhere). In both cases, z.requires_grad will then be False. The with torch.no_grad() context is more meant for wrapping actual computations.
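A minimal sketch of the two suggestions (the comments show the attribute values you should see):

import torch

x = torch.zeros(1, requires_grad=True)

z1 = x.detach()          # shares x's data but is cut from the graph
print(z1.requires_grad)  # False

with torch.no_grad():
    z2 = x.clone()       # an actual copy, made while autograd is disabled
print(z2.requires_grad)  # False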


Thank you for your reply.

If I use the .view function to change the shape of x, the requires_grad of the new variable will also be True rather than False.

x = torch.FloatTensor([1, 2, 3, 4])
x.requires_grad_()
with torch.no_grad():
    y = x.view(2, 2)

In another situation, if x is an nn.Module object (for example a neural network), the same behaviour occurs.
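Presumably this means re-binding the module (or viewing its parameters) inside the block; a minimal sketch of that situation, assuming a toy nn.Linear stands in for the network:

import torch
import torch.nn as nn

net = nn.Linear(4, 2)

with torch.no_grad():
    net2 = net  # again just name binding, no computation is performed

# The parameters are still the very same Parameter objects,
# so they still require gradients.
print(net2 is net)                                       # True
print(all(p.requires_grad for p in net2.parameters()))   # True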

What is your suggestion when someone wants to get another nn.Module object that shares the same parameter values but does not need gradients?

What is your suggestion when someone wants to get another nn.Module object that shares the same parameter values but does not need gradients?

Use

y = x.detach().view(2,2)

or

y = x.view(2,2).detach()
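For the nn.Module question, here is a minimal sketch of one common approach, assuming a toy nn.Linear and that a snapshot of the parameter values (rather than live weight sharing) is enough: copy the module and switch off requires_grad on the copy's parameters.

import copy
import torch.nn as nn

net = nn.Linear(4, 2)

frozen = copy.deepcopy(net)      # same parameter values at copy time
for p in frozen.parameters():
    p.requires_grad_(False)      # no gradients for the copy

print(all(p.requires_grad for p in net.parameters()))     # True
print(any(p.requires_grad for p in frozen.parameters()))  # False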

Hi

Just to give a small clarification of why this happens.
When you do:

with torch.no_grad():
    y = x.view(2, 2)

What the no_grad block does is not record the operations inside it as autograd ops. If you do an out-of-place operation in it, the result won't require gradients because the operation that created it is not recorded. But if you return a view of an existing Tensor that itself requires gradients, it is different: even though the ops inside the block are ignored, the original Tensor still requires gradients, and since y here is just a view into it, it requires gradients as well.

If you want to detach a Tensor from the computational graph, you can use .detach() as suggested above. I would advise doing the view first, btw: y = x.view(2, 2).detach().
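Putting this together, a minimal sketch of the three cases discussed in this thread (the comments show the values you should see):

import torch

x = torch.FloatTensor([1, 2, 3, 4]).requires_grad_()

with torch.no_grad():
    a = x * 2          # out-of-place op: not recorded, so a needs no grad
    b = x.view(2, 2)   # just a view of x, which still requires grad

c = x.view(2, 2).detach()  # take the view, then cut it from the graph

print(a.requires_grad)  # False
print(b.requires_grad)  # True
print(c.requires_grad)  # False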
