PyTorch: torch.autograd.grad returns NoneType

Here is my code:

import torch
#Captum Attribution
from captum.attr import Saliency

model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_1', pretrained=True)
model.eval()

sal = Saliency(model)

#X, y is an image and label
original_label = y
test_image = X.reshape([1,3,227,227]).float()

#I need gradient w.r.t. this test_image
test_image.requires_grad = True
test_image.retain_grad()

#Calculate saliency
attribution = sal.attribute(test_image, target=original_label)
attribution = torch.sum(torch.abs(attribution[0]), dim=0)
attribution = 227 * 227 * attribution / torch.sum(attribution)
attribution = attribution.view(-1)
elem1 = torch.argsort(attribution)[-1000:]
elements1 = torch.zeros(227 * 227)
elements1[elem1] = 1

#I need gradient of topK_loss w.r.t. test_image
topK_loss = torch.sum(attribution * elements1)
topK_loss.requires_grad = True
topK_loss.retain_grad()
gradients = -torch.autograd.grad(outputs=topK_loss, inputs=test_image, allow_unused=True)[0]

I get this error: bad operand type for unary -: 'NoneType'

From what I was told and found while searching, this means autograd cannot find a path through the computation graph from the output back to the input.

Can anyone please help me resolve this issue and point me in the right direction?

Thanks in advance!

What kind of error do you see if allow_unused is set to the default False value? If an error is raised, it would mean topK_loss was not created from test_image in a differentiable way, and you would need to check if e.g. you have accidentally detached variables from the computation graph.
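For reference, here is a minimal standalone sketch (unrelated to your model) that reproduces both errors once the graph between the input and the loss is broken:

import torch

x = torch.randn(3, requires_grad=True)

# Simulate an accidental detach: the dependence on x is cut here.
loss = (x.detach() * 2).sum()
loss.requires_grad_()  # loss is now a fresh leaf; the path back to x is gone

# With allow_unused=False this raises:
# RuntimeError: One of the differentiated Tensors appears to not have been
# used in the graph. Set allow_unused=True if this is the desired behavior.
# torch.autograd.grad(loss, x, allow_unused=False)

# With allow_unused=True the gradient is None instead, which then fails
# with "bad operand type for unary -: 'NoneType'" once you negate it.
grads = torch.autograd.grad(loss, x, allow_unused=True)
print(grads)  # (None,)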

Hello @ptrblck ,
Thanks for replying.

What kind of error do you see if allow_unused is set to the default False value?

This gives:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

If an error is raised it would mean topK_loss was not created by test_image in a differentiable way and you would need to check if e.g. you have accidentally detached variables from the computation graph.

I am also using an external library called Captum, which takes test_image as input and calculates attribution, which is then used by topK_loss. Could that be the issue?

Otherwise the code is as-is. I don’t see where I’m detaching a tensor from the graph. Thanks.

I’m not familiar enough with Captum, but the original error explains why the gradient is None now.
You could check if the code works fine without Captum (if that’s possible), or you could try to isolate the line of code detaching the tensor from the computation graph by printing the .grad_fn of intermediate tensors and making sure they are set to a proper backward function.
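Something along these lines (a sketch reusing the variable names from your code):

attribution = sal.attribute(test_image, target=original_label)
print(attribution.grad_fn)   # None here would mean Captum returned a detached tensor

attribution = torch.sum(torch.abs(attribution[0]), dim=0)
print(attribution.grad_fn)   # should be a backward function, e.g. <SumBackward1>

topK_loss = torch.sum(attribution * elements1)
print(topK_loss.grad_fn)     # None means topK_loss is cut off from test_image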

Thank you for your reply.

You could check if the code works fine without Captum (if that’s possible)

I just checked by commenting out Captum’s usage, and the gradients are now being calculated. It is not returning None anymore. Further, topK_loss is no longer a leaf node.


model.eval()

#Captum Attribution
from captum.attr import Saliency
sal = Saliency(model)

# Calculate saliency
original_label = y[5]
test_image = X[5].reshape([1,3,227,227]).float()
print(test_image.shape)
test_image.requires_grad = True
test_image.retain_grad()

# attribution = sal.attribute(test_image, target=original_label)
attribution = torch.sum(torch.abs(test_image[0]), dim=0)
attribution = 227 * 227 * attribution / torch.sum(attribution)
attribution = attribution.view(-1)
print(attribution.shape)

elem1 = torch.argsort(attribution)[-1000:]
elements1 = torch.zeros(227 * 227)
elements1[elem1] = 1
topK_loss = torch.sum(attribution * elements1)
# topK_loss.requires_grad = True
# topK_loss.retain_grad()

# Calculate gradients
gradients = -torch.autograd.grad(outputs=topK_loss, inputs=test_image, allow_unused=False)[0]
print(gradients)

or you could try to isolate the line of code detaching the tensor from the computation graph by printing the .grad_fn of intermediate tensors making sure they are set to a proper backward function.

As soon as I use Captum, topK_loss gets detached and is treated as a leaf node. Further, topK_loss.grad_fn yields None. Indeed, upon including Captum’s attribution = sal.attribute(test_image, target=original_label), every intermediate tensor after that call shows None for grad_fn.

Or could the issue be not Captum, but loading the model from the hub instead of defining it in the same file?

model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_1', pretrained=True)
model.eval()

What could be a way to handle this? Thank you!

Loading the model from the hub should not make a difference unless the actual model implementation differs.
I’m not familiar enough with the Saliency class, but maybe the gradient calculation is explicitly disabled since the returned attributions object would already contain the interesting gradients?
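If that is the case, one workaround (a sketch, not Captum’s API; it reuses X, model, and original_label from your earlier snippet and assumes saliency means the gradient of the target logit w.r.t. the input) would be to compute the saliency manually with create_graph=True, which keeps it differentiable:

import torch

test_image = X.reshape([1, 3, 227, 227]).float()
test_image.requires_grad_()

output = model(test_image)
score = output[0, original_label]   # target logit

# First derivative: saliency = d(score)/d(test_image).
# create_graph=True records this gradient computation itself in the
# autograd graph, so the saliency stays differentiable.
saliency, = torch.autograd.grad(score, test_image, create_graph=True)

attribution = torch.sum(torch.abs(saliency[0]), dim=0)
attribution = (227 * 227 * attribution / torch.sum(attribution)).view(-1)
elem1 = torch.argsort(attribution)[-1000:]
elements1 = torch.zeros(227 * 227)
elements1[elem1] = 1
topK_loss = torch.sum(attribution * elements1)

# Second derivative: gradient of the saliency-based loss w.r.t. the input.
gradients = -torch.autograd.grad(topK_loss, test_image)[0]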

Here is the link to the source code:

It seems like Saliency returns gradients with respect to the input anyway. The saliency is the gradient of the target output (for a given label) with respect to the input. However, I want the gradient of this saliency output itself with respect to the input image. That basically makes it a double derivative, I guess?

Is there any way to enable gradients in the code somehow?

Thanks.

Any help would be appreciated!

    def attribute(
        self,
        inputs: TensorOrTupleOfTensorsGeneric,
        baselines: Union[
            TensorOrTupleOfTensorsGeneric, Callable[..., TensorOrTupleOfTensorsGeneric]
        ],
        n_samples: int = 5,
        stdevs: Union[float, Tuple[float, ...]] = 0.0,
        target: TargetType = None,
        additional_forward_args: Any = None,
        return_convergence_delta: bool = False,
    ) -> Union[
        TensorOrTupleOfTensorsGeneric, Tuple[TensorOrTupleOfTensorsGeneric, Tensor]
    ]:
        # since `baselines` is a distribution, we can generate it using a function
        # rather than passing it as an input argument
        print("HEEEERE")
        baselines = _format_callable_baseline(baselines, inputs)
        assert isinstance(baselines[0], torch.Tensor), (
            "Baselines distribution has to be provided in a form "
            "of a torch.Tensor {}.".format(baselines[0])
        )

        input_min_baseline_x_grad = InputBaselineXGradient(
            self.forward_func, self.multiplies_by_inputs
        )
        input_min_baseline_x_grad.gradient_func = self.gradient_func

        nt = NoiseTunnel(input_min_baseline_x_grad)

        # NOTE: using attribute.__wrapped__ to not log
        inputs.requires_grad_()  # in-place call; assigning True to requires_grad_ would only shadow the method
        attributions = nt.attribute.__wrapped__(
            nt,  # self
            inputs,
            nt_type="smoothgrad",
            nt_samples=n_samples,
            stdevs=stdevs,
            draw_baseline_from_distrib=True,
            baselines=baselines,
            target=target,
            additional_forward_args=additional_forward_args,
            return_convergence_delta=return_convergence_delta,
        ).requires_grad_()
        loss = torch.sum(attributions).requires_grad_()
        print(loss.grad_fn)
        print(torch.autograd.grad(loss, inputs))

        return attributions

I want to return the gradient of loss with respect to inputs. Right now it throws the following error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
If this is solved, then so is my issue.
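For context, the same behaviour reproduces outside Captum (a minimal sketch): a gradient computed without create_graph=True comes back detached, and calling .requires_grad_() on it only turns it into a new leaf instead of reconnecting it to the input:

import torch

x = torch.randn(4, requires_grad=True)
y = (x ** 2).sum()

g, = torch.autograd.grad(y, x)      # no create_graph: g is detached
print(g.grad_fn)                    # None
g.requires_grad_()                  # g becomes a NEW leaf, still unrelated to x
# torch.autograd.grad(g.sum(), x)   # raises the same "not ... used in the graph" error

g, = torch.autograd.grad(y, x, create_graph=True)  # keep the graph
print(g.grad_fn)                    # a proper backward function now
print(torch.autograd.grad(g.sum(), x))  # works: second derivative, tensor of 2s

So presumably the inner gradient call inside attribute would need create_graph=True for the attributions to stay attached to inputs.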

Thank you!