I am trying to calculate the TCAV vectors for my model, for which I need to do the following thing:

```
def compute_tcav():
    losses = [
        (ActivationMaximization(model.layers[layer_idx], filter_indices), -1)
    ]
    opt = Optimizer(input_tensor, losses, wrt_tensor=wrt_tensor, norm_grads=False)
    grads = opt.minimize(seed_input=seed_input, max_iter=1, grad_modifier=grad_modifier, verbose=False)[1]
    return utils.normalize(grads)[0]
```

I am using the amu package to compute activation maximization (GitHub - Nguyen-Hoa/Activation-Maximization: Python implementation of activation maximization with PyTorch.)

In this code, I need to get the gradient of the input with respect to the activation of a target unit in a particular layer, using the code below:

```
input.retain_grad()  # non-leaf tensor
# network.zero_grad()

# Propagate image through network,
# then access activation of target layer
network(input)
layer_out = layer_activation[layer_name]

# compute gradients w.r.t. target unit,
# then access the gradient of input (image) w.r.t. target unit (neuron)
layer_out[0][unit].backward(retain_graph=True)
img_grad = input.grad
```
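
For context, `layer_activation` in the snippet above is presumably populated by a forward hook that stores the target layer's output under its name. A minimal sketch of that setup (using a toy network and a hypothetical layer name, since the repository's exact hook code isn't shown here) could look like this:

```
import torch
import torch.nn as nn

# Toy network standing in for `network`; the hook pattern is what matters here
network = nn.Sequential(nn.Conv2d(1, 2, 3), nn.ReLU())

layer_activation = {}

def save_activation(name):
    # Forward hook that stores the layer's output under the given key
    def hook(module, inp, out):
        layer_activation[name] = out
    return hook

# Register the hook on the layer whose activation we want to differentiate
network[0].register_forward_hook(save_activation("conv1"))

input = torch.randn(1, 1, 8, 8, requires_grad=True)
network(input)
print(layer_activation["conv1"].shape)  # torch.Size([1, 2, 6, 6])
```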

However, my `layer_out` has a size of `torch.Size([1, 2, 97, 97, 97])`, and therefore I cannot call `layer_out[0][unit].backward(retain_graph=True)`.

How can I solve this?

`tensor.backward()` will populate the gradient with a scalar `1` value, if `tensor` is also a scalar tensor. If `tensor` contains more than a single element, you would either need to pass the gradient explicitly to `backward` (e.g. via `tensor.backward(gradient=torch.ones_like(tensor))`) or you could reduce the tensor first, e.g. via `tensor.mean().backward()`.
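
For illustration, both options on a small non-scalar tensor:

```
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2  # non-scalar tensor

# Option 1: pass an explicit gradient of the same shape as y
y.backward(gradient=torch.ones_like(y), retain_graph=True)
print(x.grad)  # tensor([2., 2., 2.])

# Option 2: reduce to a scalar first, then call backward()
x.grad = None
y.mean().backward()
print(x.grad)  # tensor([0.6667, 0.6667, 0.6667])
```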

Okay! That makes sense! I have now changed the code to the following:

```
tcav = {}
for ind, (img, label) in enumerate(loader):
    img.requires_grad = True
    img = img.to(device, dtype=torch.float)
    output = model(img)
    layer_activation = activation[layer_names[0]].cpu()
    # layer_activation.retain_grad()
    loss = torch.mean(layer_activation)
    loss.backward(retain_graph=True)
    grads = torch.nn.functional.normalize(img.grad)
    tcav[ind] = {}
    tcav[ind][layer_names[0]] = grads
```

This, I believe, still follows the activation maximisation code (GitHub - Nguyen-Hoa/Activation-Maximization: Python implementation of activation maximization with PyTorch).

However, when I run this, img.grad is None. I understand that it’s not a leaf tensor, but I’m not quite sure how to then compute the image gradient.

The `to()` operation is differentiable and thus creates a non-leaf tensor:

```
img = torch.randn(1, 1)
print(img.is_leaf)
# True
img.requires_grad=True
print(img.is_leaf)
# True
img = img.to('cuda', dtype=torch.float)
print(img.is_leaf)
# False
print(img.grad_fn)
# <ToCopyBackward0 object at 0x7f6d48d5d1c0>
```

Move the tensor to the `device` and `dtype` first before setting the tensor's `.requires_grad` attribute to `True`:

```
img = torch.randn(1, 1)
img = img.to('cuda', dtype=torch.float)
img.requires_grad_(True)
print(img.is_leaf)
# True
```
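
Applied to the loop above, the only change needed is to swap the order of the `to()` call and the `requires_grad` assignment. A sketch reusing the same names (`loader`, `model`, `device`, `activation`, `layer_names`) from the earlier snippet:

```
tcav = {}
for ind, (img, label) in enumerate(loader):
    # Move to the target device/dtype first ...
    img = img.to(device, dtype=torch.float)
    # ... then mark the (now leaf) tensor as requiring gradients
    img.requires_grad_(True)

    output = model(img)
    layer_activation = activation[layer_names[0]].cpu()
    loss = torch.mean(layer_activation)
    loss.backward(retain_graph=True)

    # img.grad is now populated and can be normalized
    grads = torch.nn.functional.normalize(img.grad)
    tcav[ind] = {layer_names[0]: grads}
```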