# Activation Maximization for TCAV in PyTorch

I am trying to calculate the TCAV vectors for my model, for which I need to do something like the following:

```python
def compute_tcav():
    losses = [
        (ActivationMaximization(model.layers[layer_idx], filter_indices), -1)
    ]
    opt = Optimizer(input_tensor, losses, wrt_tensor=wrt_tensor, norm_grads=False)
```

I am using the [Activation-Maximization](https://github.com/Nguyen-Hoa/Activation-Maximization) package, a Python implementation of activation maximization with PyTorch.

In this code, I need to get the gradient of the activation tensor of a particular layer with respect to the target unit, using the code below:

```python
# Propagate the image through the network,
# then access the activation of the target layer
network(input)
layer_out = layer_activation[layer_name]

# Compute gradients w.r.t. the target unit,
# then access the gradient of the input (image) w.r.t. the target unit (neuron)
layer_out[unit].backward(retain_graph=True)
```
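For context, this snippet assumes that `layer_activation` is a dictionary populated by forward hooks during the forward pass. A minimal sketch of how that setup might look (the `save_activation` helper is my own illustrative name, not the repository's API; `network` and `layer_name` are the names from the snippet above):

```python
layer_activation = {}

def save_activation(name):
    # Forward hook that stores the layer's output under the given name
    def hook(module, inputs, output):
        layer_activation[name] = output
    return hook

# Register the hook on the target layer, looked up by its module name
dict(network.named_modules())[layer_name].register_forward_hook(
    save_activation(layer_name)
)
```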

However, my `layer_out` has a size of `torch.Size([1, 2, 97, 97, 97])`, so `layer_out[unit]` is not a scalar and I cannot call `layer_out[unit].backward(retain_graph=True)`.

How can I solve this?

`tensor.backward()` will implicitly use a gradient of a scalar `1`, if `tensor` is a scalar tensor.
If `tensor` contains more than a single element, you would either need to pass the gradient explicitly to `backward` (e.g. via `tensor.backward(gradient=torch.ones_like(tensor))`), or reduce the tensor first, e.g. via `tensor.mean().backward()`.
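A minimal, self-contained example of both options (the shapes are arbitrary, chosen only for illustration):

```python
import torch

x = torch.randn(1, 2, 4, 4, requires_grad=True)
y = x * 2  # non-scalar tensor, so y.backward() alone would fail

# Option 1: pass an explicit gradient of the same shape as y
y.backward(gradient=torch.ones_like(y), retain_graph=True)
print(x.grad.shape)  # torch.Size([1, 2, 4, 4])

# Option 2: reduce to a scalar first, then call backward()
x.grad = None  # clear the accumulated gradient
y.mean().backward()
print(x.grad.shape)  # torch.Size([1, 2, 4, 4])
```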

Okay! That makes sense! I have now changed the code to the following:

```python
tcav = {}
for ind, (img, label) in enumerate(loader):
    img = img.to(device, dtype=torch.float)

    output = model(img)

    layer_activation = activation[layer_names].cpu()

    loss = torch.mean(layer_activation)
    loss.backward(retain_graph=True)

    tcav[ind] = {}
```

This, I believe, still follows the activation maximisation code from the same repository (https://github.com/Nguyen-Hoa/Activation-Maximization).

However, when I run this, `img.grad` is `None`. I understand that it’s not a leaf tensor, but I’m not quite sure how to then compute the image gradient.

The `to()` operation is differentiable and thus creates a non-leaf tensor:

```python
img = torch.randn(1, 1, requires_grad=True)
print(img.is_leaf)
# True

img = img.to('cuda', dtype=torch.float)
print(img.is_leaf)
# False
```

Move the tensor to the `device` and `dtype` first, before setting the tensor’s `.requires_grad` attribute to `True`:

```python
img = torch.randn(1, 1)
img = img.to('cuda', dtype=torch.float)
img.requires_grad_()
print(img.is_leaf)
# True
```
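Applied to the loop above, that would look something like the following sketch (`model`, `activation`, `layer_names`, and `loader` are the names from the earlier snippet; storing the image gradient in `tcav` is my assumption about what the loop is meant to collect):

```python
tcav = {}
for ind, (img, label) in enumerate(loader):
    # Move to device/dtype first, then mark the moved tensor as requiring grad,
    # so it stays a leaf tensor and backward() populates img.grad
    img = img.to(device, dtype=torch.float)
    img.requires_grad_()

    output = model(img)

    layer_activation = activation[layer_names]

    loss = torch.mean(layer_activation)
    loss.backward(retain_graph=True)

    # img.grad now holds the gradient of the mean activation w.r.t. the image
    tcav[ind] = img.grad.detach().clone()
```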