Hi,
I want to implement post-hoc auditing of LLMs (AutoModelForCausalLM, specifically Pythia-1b), and I need access to the gradients at each step (actually the gradient per sample, but the other methods I found here cause problems, e.g. with torch.all in the transformers library). However, while calculating the loss I run into the following problem:
AttributeError: 'Parameter' object has no attribute '_forward_counter'
input_ids = sample['input_ids'].unsqueeze(0).to(self.device)
attention_mask = sample['attention_mask'].unsqueeze(0).to(self.device)
loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids).loss
The last line is what raises the error: AttributeError: 'Parameter' object has no attribute '_forward_counter'.
The model is wrapped as GradSampleModule(gpt_neox).
The forward pass goes through when I call model.disable_hooks() first, but I don't think that is a correct workaround, since it presumably also disables the per-sample gradient computation.