I don’t think it is None here.
What is the full stack trace you see?
Also, can you split all the computations in the failing line onto separate lines, so that the stack trace points to the exact op that failed?
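For example, if the failing line is the gradient-step computation, something like this (a sketch using the names from your message; your exact expression may differ):

```python
g = X_fooling.grad      # this access fails if .grad is None
g_norm = g.norm()       # norm of the gradient
dX = 1 * g / g_norm     # normalized step
```

That way the trace will show exactly which of these lines raises.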
The problem is that X_fooling.grad, when printed, shows float32 values, but when I actually try to use it, it is NoneType and I cannot do anything with it, not even .dtype. Every operation on X_fooling.grad fails because it is None.
Can you replace the loop with the following and check?
```python
while True:
    # Make x_fooling a leaf Tensor that requires grad
    x_fooling = x_fooling.requires_grad_()
    # Score of the target class
    l = model(x_fooling)[0][target_y]
    l.backward()
    # Normalized-gradient ascent step (step size 1)
    dx = 1 * x_fooling.grad / x_fooling.grad.norm()
    # Update in a non-differentiable way so no graph is recorded
    with torch.no_grad():
        x_fooling = x_fooling + dx
    pred = torch.argmax(model(x_fooling).detach().squeeze())
    if pred == target_y:
        break
```
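Note that for the first requires_grad_() call to work, x_fooling should start out as a leaf Tensor with no history, e.g. (assuming X is your source image):

```python
x_fooling = X.clone().detach()
```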
At the first iteration, it works fine.
The problem is that at the end of the iteration, you do x_fooling = x_fooling + dx, which replaces your x_fooling leaf with a Tensor that is not a leaf anymore (it is the result of a differentiable op). And .grad is only populated for leaf Tensors, which is why it comes back as None.
So, as suggested above, you need to make sure this update is done in a non-differentiable way, so that x_fooling remains a leaf. You can do so with
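```python
with torch.no_grad():
    x_fooling = x_fooling + dx
```

and then calling x_fooling.requires_grad_() again at the top of the next iteration, as in the loop above. An in-place update (x_fooling += dx inside the no_grad block) would keep the very same leaf; in that case you should also reset x_fooling.grad to None (or zero it) before the next backward(), since gradients accumulate into .grad.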
It works! But where was the problem in my code? Can you please explain a little? I think it is because of detach(). Another thing: it does not work if the prediction check is done as part of the while condition. Why is that? Thank you.
So the graph is recreated every iteration, and if we want to accumulate the gradients over several iterations, we should use a separate variable to do so, e.g. as in the sketch below. Thank you.
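For instance, a minimal sketch of that pattern, reusing model, x_fooling, and target_y from the loop above (num_steps is an assumed iteration budget):

```python
grad_sum = torch.zeros_like(x_fooling)  # separate accumulator, outside the graph
num_steps = 100                         # assumed iteration budget

for _ in range(num_steps):
    x_fooling = x_fooling.requires_grad_()
    l = model(x_fooling)[0][target_y]
    l.backward()
    # Copy the gradient out before x_fooling (and its .grad) are replaced
    grad_sum += x_fooling.grad.detach()
    with torch.no_grad():
        x_fooling = x_fooling + x_fooling.grad / x_fooling.grad.norm()
```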