Adding Prior Knowledge to a Transformer

I’m interested in injecting a precomputed prior tensor into a transformer. I think the best way to do it is to register a forward hook on the last encoder layer, detach the prior tensor so no gradients flow into it, and fuse it with that layer’s output. Is this the correct way to do it?

Something like (consider both tensors of the same size):

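import torch

def hook(module, inputs, output):
    # `prior` is the precomputed tensor, defined elsewhere; a returned value replaces the layer's output
    return torch.add(output, prior.detach(), alpha=0.1)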

Then train the model as usual…
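
To make the question concrete, here is a minimal runnable sketch of the setup I have in mind. Everything specific in it is an assumption made for illustration: the model is a small `nn.TransformerEncoder` with made-up sizes, the prior is random data, and `make_hook` is a hypothetical helper name. It relies on the documented behaviour that a forward hook which returns a value replaces the module's output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch, seq_len, d_model = 2, 5, 16

layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=32,
    dropout=0.0, batch_first=True,  # dropout off so two passes are comparable
)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# Stand-in for the precomputed prior; requires_grad=True only to show
# that detach() keeps gradients from reaching it.
prior = torch.randn(batch, seq_len, d_model, requires_grad=True)

def make_hook(prior, alpha=0.1):
    """Build a forward hook that adds alpha * prior to the layer output."""
    def hook(module, inputs, output):
        # Returning a tensor from a forward hook replaces the output.
        return torch.add(output, prior.detach(), alpha=alpha)
    return hook

x = torch.randn(batch, seq_len, d_model)

# Reference pass without the hook.
with torch.no_grad():
    baseline = encoder(x)

# Attach the hook to the last encoder layer and run a training-style pass.
handle = encoder.layers[-1].register_forward_hook(make_hook(prior))
fused = encoder(x)
loss = fused.pow(2).mean()  # placeholder loss, not a real objective
loss.backward()
handle.remove()  # remove the hook once it is no longer needed

print("prior received a gradient:", prior.grad is not None)
```

In this sketch the encoder has no final norm, so the hooked layer's output is also the encoder's output, and the fused result is simply the baseline plus `0.1 * prior`. The `detach()` call only matters when the prior is attached to an autograd graph; for a plain tensor without `requires_grad` it changes nothing.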