Hi, I’m trying to see if I can fine-tune a pretrained vision model for a downstream task using intermediate activations as features. Since it’s cumbersome to save activations manually by editing model definitions to return multiple tensors, I tried adding forward hooks to save the activations from the relevant layers instead.
As a litmus test, I wanted to see if I could train a model by grabbing the last layer’s output from a hook rather than directly from the model’s output:
```python
def save_act(name, mod, inp, out):
    activations[name].append(out)
```
and then using the saved output from the linear layer to compute the loss and gradients:
```python
for name, m in model.named_modules():
    if type(m) == nn.Linear:
        m.register_forward_hook(partial(save_act, name))

...

model(images)
output = activations['module.fc']
loss = criterion(output, target)
optimizer.zero_grad()
loss.backward()
```
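For concreteness, here’s a minimal self-contained sketch of what I’m doing (a toy `nn.Sequential` model and random data standing in for my actual setup). Run over two training steps without ever clearing the `activations` buffer, it reproduces the first error I see:

```python
from collections import defaultdict
from functools import partial

import torch
import torch.nn as nn

activations = defaultdict(list)

def save_act(name, mod, inp, out):
    activations[name].append(out)

# Toy stand-in for the pretrained vision model; the Linear layer gets the
# name "1" inside this Sequential.
model = nn.Sequential(nn.Flatten(), nn.Linear(196, 10))
for name, m in model.named_modules():
    if isinstance(m, nn.Linear):
        m.register_forward_hook(partial(save_act, name))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

error = None
for step in range(2):
    images = torch.randn(4, 1, 14, 14)
    target = torch.randint(0, 10, (4,))
    model(images)
    # The buffer is never cleared, so on step 2 index 0 is the *stale*
    # tensor from step 1, whose graph was already freed by backward().
    output = activations["1"][0]
    loss = criterion(output, target)
    optimizer.zero_grad()
    try:
        loss.backward()
        optimizer.step()
    except RuntimeError as e:
        error = e
        break

print(error)  # the "backward through the graph a second time" RuntimeError
```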
When I first run this, I get:
```
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
```
But after adding `retain_graph=True` to the first backward call, i.e. `loss.backward(retain_graph=True)`, I get:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 196]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
which makes it seem like the saved tensor is somehow stale, i.e. different from the one produced by the current forward pass. Is there another way to compute gradients on tensors obtained from hooks like this, or a fix for this approach?