Hi, I’m trying to fine-tune a pretrained vision model for a downstream task using intermediate activations as features. Since it’s cumbersome to edit model definitions so that they return multiple tensors, I was hoping to save activations from the relevant layers by registering forward hooks instead.

As a litmus test, I wanted to see if I could train a model by grabbing the last layer’s output from a hook rather than directly from the model’s output:

```
def save_act(name, mod, inp, out):
    activations[name].append(out)  # activations is a dict of lists keyed by module name
```

and using the saved output from the linear layer to try and compute the gradients:

```
for name, m in model.named_modules():
    if isinstance(m, nn.Linear):
        m.register_forward_hook(partial(save_act, name))
...
model(images)
output = activations['module.fc'][0]
loss = criterion(output, target)
optimizer.zero_grad()
loss.backward()
```
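In case it helps, here is a minimal self-contained version of what I’m doing; the tiny model, optimizer, and random data are just placeholders for my actual setup:

```python
from collections import defaultdict
from functools import partial

import torch
import torch.nn as nn

# Toy stand-ins for my real model/data.
model = nn.Sequential()
model.add_module("fc", nn.Linear(8, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

activations = defaultdict(list)

def save_act(name, mod, inp, out):
    activations[name].append(out)

for name, m in model.named_modules():
    if isinstance(m, nn.Linear):
        m.register_forward_hook(partial(save_act, name))

err = None
for step in range(2):
    images = torch.randn(4, 8)
    target = torch.randint(0, 2, (4,))
    model(images)
    output = activations["fc"][0]   # index 0 is stale after the first step
    loss = criterion(output, target)
    optimizer.zero_grad()
    try:
        loss.backward()
    except RuntimeError as e:
        err = e                     # second step hits the "backward a second time" error
        break
    optimizer.step()

print(err)
```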

Trying this, I first get:

```
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
```

But after adding `retain_graph=True` to `loss.backward()`, I get:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 196]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```

which makes it seem like the saved tensor is stale, i.e. not the one produced by the current forward pass. Is there another way (or a fix) for computing gradients on tensors captured through hooks like this?
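One workaround I’m considering is clearing the saved activations at the start of every iteration, so that index 0 always refers to the tensor from the current forward pass (again, the tiny model and random data below are just placeholders for my real setup). Is this the intended pattern, or is there a better way?

```python
from collections import defaultdict
from functools import partial

import torch
import torch.nn as nn

# Toy placeholders for my real model/data.
model = nn.Sequential()
model.add_module("fc", nn.Linear(8, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

activations = defaultdict(list)

def save_act(name, mod, inp, out):
    activations[name].append(out)

for name, m in model.named_modules():
    if isinstance(m, nn.Linear):
        m.register_forward_hook(partial(save_act, name))

for step in range(3):
    images = torch.randn(4, 8)
    target = torch.randint(0, 2, (4,))

    activations.clear()            # drop stale tensors from the previous iteration
    model(images)
    output = activations["fc"][0]  # now always the current forward's output
    loss = criterion(output, target)
    optimizer.zero_grad()
    loss.backward()                # no retain_graph needed
    optimizer.step()
```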

Thanks!