Masking/Setting Attention Maps in Vision Transformers


I’m looking for a generic way to set all the attention maps in a model to some constant matrix A (instead of computing them from the query/key inner products), throughout the entire model. Any suggestions?
One of the main issues is that I’m attempting to do this uniformly across many different pretrained models from different libraries: timm / Hugging Face / torchvision.