I’m interested in injecting a prior tensor (already computed) into a transformer. I think the best way to do it is to register a forward hook on the last encoder layer, detach the prior tensor, and fuse both tensors. Is this the correct way to do it?
def hook(module, input, output):
    # `prior` is the precomputed tensor, captured from the enclosing scope;
    # detach it so no gradients flow back into the prior.
    # Returning a tensor from a forward hook replaces the layer's output;
    # reassigning `input` here has no effect, since forward has already run.
    return torch.add(output, prior.detach(), alpha=0.1)

last_layer.register_forward_hook(hook)  # last_layer: however you access the final encoder layer
# then train the model as usual…
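For reference, here is a minimal self-contained sketch of this pattern, assuming a stock nn.TransformerEncoder with batch_first inputs; the shapes, the 0.1 blending weight, and the hook name fuse_prior are all illustrative assumptions, not anything prescribed above:

import torch
import torch.nn as nn

# Toy encoder and a precomputed prior shaped like the layer output.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=3)
prior = torch.randn(8, 16, 64).detach()  # (batch, seq, d_model); no gradient flows into it

def fuse_prior(module, inputs, output):
    # Forward hooks run after the layer's forward pass; returning a
    # tensor here replaces that layer's output for everything downstream.
    return torch.add(output, prior, alpha=0.1)

# Hook only the last encoder layer.
handle = encoder.layers[-1].register_forward_hook(fuse_prior)

x = torch.randn(8, 16, 64)
out = encoder(x)  # the last layer's output now includes the prior
handle.remove()   # remove the hook when it is no longer needed

If the goal is instead to fuse the prior into the layer's *input*, register_forward_pre_hook is the right tool, since it runs before forward and can return modified inputs.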