How do I implement Gaussian-weighted self-attention in PyTorch? I would like to follow the attention mechanism proposed in T-GSA.
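For concreteness, here is a minimal sketch of what such a layer might look like. It rests on my reading of the mechanism, which should be checked against the paper: the scaled dot-product scores are multiplied element-wise by a Gaussian weight `exp(-(i - j)^2 / sigma^2)` of the distance between frames `i` and `j`, the absolute value is taken, and the softmax is applied afterwards. The class name `GaussianWeightedSelfAttention`, the per-head trainable `log_sigma` parameter, and its initial value are hypothetical choices of mine, not something taken from T-GSA.

```python
import math

import torch
import torch.nn as nn


class GaussianWeightedSelfAttention(nn.Module):
    """Multi-head self-attention with Gaussian distance weighting (sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        if d_model % n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Assumption: one trainable width per head, stored as log(sigma)
        # so that sigma stays positive during training.
        self.log_sigma = nn.Parameter(torch.full((n_heads,), math.log(10.0)))

    def forward(self, x: torch.Tensor):
        # x: (batch, time, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (batch, time, d_model) -> (batch, heads, time, d_head)
        q, k, v = (
            z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
            for z in (q, k, v)
        )

        # Standard scaled dot-product scores: (batch, heads, time, time)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)

        # Gaussian weights from squared frame distance (i - j)^2
        pos = torch.arange(t, device=x.device, dtype=x.dtype)
        dist2 = (pos[:, None] - pos[None, :]) ** 2          # (time, time)
        sigma2 = torch.exp(2 * self.log_sigma).view(1, -1, 1, 1)
        gauss = torch.exp(-dist2 / sigma2)                  # (1, heads, time, time)

        # Assumption: weight the scores, take |.|, then softmax over keys
        attn = torch.softmax((gauss * scores).abs(), dim=-1)

        # Merge heads back: (batch, time, d_model)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(ctx), attn
```

Calling `GaussianWeightedSelfAttention(16, 4)` on a tensor of shape `(2, 5, 16)` should return an output of the same shape plus attention weights of shape `(2, 4, 5, 5)`. Because the Gaussian multiplies the scores before the softmax, distant frames are pulled toward a score of zero rather than masked out, so they still receive some attention. Is this roughly the right structure, or does T-GSA define the weighting differently?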