X_fooling.grad prints float32 values but is NoneType when used

X_fooling = X.clone()
X_fooling = X_fooling.requires_grad_()

learning_rate = 1

l = model(X_fooling).squeeze()[target_y].sum()
l.backward()

g = X_fooling.grad
# print(g.dtype)

dX = learning_rate * X_fooling.grad / X_fooling.grad.norm()
X_fooling = X_fooling + dX

# unsupported operand type(s) for *: 'int' and 'NoneType'

Here X_fooling.grad is NoneType. Why?

How do I solve this?

Hi,

I don’t think it is None here.
What is the full stack trace you see?
Also, can you split the computations in the failing line onto separate lines, so that the stack trace points to the exact op that failed?
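
For example, something like this (using the names from your snippet) would show exactly which op raises:

g = X_fooling.grad            # if the gradient is missing, g is None here
print(type(g))                # Tensor or NoneType?
scaled = learning_rate * g    # an "int * NoneType" error would point here
dX = scaled / g.norm()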

If I run the print line, it says that NoneType has no dtype attribute.

Ok, so it is None from the beginning; it does not actually tell you that it is float32?

The problem is that X_fooling.grad, when printed, shows float32 values, but when I try to use it, it is NoneType and I cannot do anything with it, not even .dtype.

import torch
import torchvision.models

model = torchvision.models.squeezenet1_1(pretrained=True)

x = torch.randn(1, 3, 100, 100)
target_y = 6

x_fooling = x.clone()
x_fooling = x_fooling.requires_grad_()

# loop until the model predicts the target class
while torch.argmax(model(x_fooling).squeeze()) != target_y:

    l = model(x_fooling)[0][target_y]
    l.backward()

    dx = 1 * x_fooling.grad / x_fooling.grad.norm()

    x_fooling = x_fooling + dx

Why is x_fooling.grad NoneType here? And why is x_fooling not even a leaf node any more?

Can you replace the loop with the following and check?

while True:
    x_fooling = x_fooling.requires_grad_()
    l = model(x_fooling)[0][target_y]
    l.backward()
    dx = 1 * x_fooling.grad / x_fooling.grad.norm()
    with torch.no_grad():
        x_fooling = x_fooling + dx
        pred = torch.argmax(model(x_fooling).detach().squeeze())
        if pred == target_y:
            break

Hi,

At the first iteration, it works fine.
The problem is that at the end of it, you do x_fooling = x_fooling + dx, which replaces your x_fooling leaf with a Tensor that is not a leaf anymore (it is the result of a differentiable op).

And so, as suggested above, you need to make sure that this update is done in a non-differentiable way, so that x_fooling remains a leaf. You can do so with:

with torch.no_grad():
    x_fooling = x_fooling + dx
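
Here is a minimal standalone sketch of what happens to .grad when a leaf is replaced by the result of an op (not your exact code, just an illustration):

import torch

x = torch.randn(3, requires_grad=True)   # x is a leaf
(x * 2).sum().backward()
print(x.is_leaf, x.grad is None)          # True False: the leaf gets a .grad

x = x + 1                                 # differentiable op: the new x is NOT a leaf
print(x.is_leaf)                          # False
(x * 2).sum().backward()
print(x.grad)                             # None (with a warning): .grad is not populated on non-leaves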

It works. But where was the problem in my code? Can you please explain a little? I think it is because of detach(). Another thing: it does not work if the prediction check is done as part of the while condition. Why is that? Thank you.

@albanD has explained it well.

import torch
import torchvision.models

model = torchvision.models.squeezenet1_1(pretrained=True)

x = torch.randn(1, 3, 100, 100)
target_y = 6

x_fooling = x
x_fooling = x_fooling.requires_grad_()

while True:

    with torch.no_grad():
        if torch.argmax(model(x_fooling).squeeze()) == target_y:
            break

    l = model(x_fooling)[0][target_y]
    l.backward()

    dx = 1 * x_fooling.grad / x_fooling.grad.norm()

    with torch.no_grad():
        x_fooling = x_fooling + dx
    

This does not work either; what is the issue here? torch.no_grad is used, so x_fooling should have remained the same leaf.

This works if

x_fooling = x_fooling.requires_grad_()

is added at the top of the while True loop, i.e. called again at the start of every iteration.

Why is that?

This is because x_fooling needs to require gradients for you to be able to compute gradients for it: the tensor produced by x_fooling = x_fooling + dx inside torch.no_grad() has requires_grad=False, so it must be re-enabled before the next backward().
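
Concretely, something like this (a sketch of your loop with that line moved inside) should work:

while True:
    # the no_grad update below returns a leaf with requires_grad=False,
    # so re-enable it at the start of every iteration
    x_fooling.requires_grad_()

    with torch.no_grad():
        if torch.argmax(model(x_fooling).squeeze()) == target_y:
            break

    l = model(x_fooling)[0][target_y]
    l.backward()

    dx = x_fooling.grad / x_fooling.grad.norm()

    with torch.no_grad():
        x_fooling = x_fooling + dx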

So the graph is recreated every iteration, and if we want to accumulate the gradients over a number of iterations, we should use another variable to do so (sketch below). Thank you.
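
For example, a minimal sketch of that idea (grad_sum is just an illustrative name, not something from the thread):

grad_sum = torch.zeros_like(x)   # plain accumulator, not part of any graph

# inside the loop, right after l.backward():
grad_sum += x_fooling.grad       # safe to add: .grad itself does not require grad,
                                 # and it is fresh each iteration since x_fooling is replaced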