Gaussian-weighted self-attention implementation

How do I implement Gaussian-weighted self-attention in PyTorch? I would like to follow the attention mechanism proposed in the T-GSA paper (Kim et al., 2020, "T-GSA: Transformer with Gaussian-weighted Self-Attention for Speech Enhancement").
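
In case it helps frame answers, here is a minimal PyTorch sketch of one common reading of the paper: the scaled dot-product score matrix is element-wise weighted by a Gaussian matrix `G` with `g_ij = exp(-(i - j)^2 / sigma^2)` before the softmax, so attention decays with the distance between frames. This is not the authors' code; the module name `GaussianWeightedSelfAttention`, the per-head trainable `log_sigma` parameterization, and the initial width are my own choices.

```python
import torch
import torch.nn as nn


class GaussianWeightedSelfAttention(nn.Module):
    """Multi-head self-attention with a Gaussian distance weighting
    applied to the score matrix, in the spirit of T-GSA (sketch)."""

    def __init__(self, d_model: int, num_heads: int, init_sigma: float = 10.0):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One trainable Gaussian width per head (assumption: the paper learns
        # sigma; the per-head split and the log parameterization are mine).
        self.log_sigma = nn.Parameter(torch.log(torch.full((num_heads,), init_sigma)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t: torch.Tensor) -> torch.Tensor:
            # (B, T, d_model) -> (B, num_heads, T, d_head)
            return t.view(B, T, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)

        # Standard scaled dot-product scores: (B, H, T, T)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5

        # Gaussian weight matrix g_ij = exp(-(i - j)^2 / sigma^2), one sigma per head
        pos = torch.arange(T, device=x.device, dtype=x.dtype)
        dist2 = (pos[None, :] - pos[:, None]) ** 2          # (T, T)
        sigma = self.log_sigma.exp().view(1, -1, 1, 1)      # (1, H, 1, 1)
        g = torch.exp(-dist2[None, None] / sigma ** 2)      # (1, H, T, T)

        # Element-wise weighting of the scores before the softmax
        attn = (scores * g).softmax(dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(out)


# Usage example on dummy data: batch of 4, 100 frames, 256 features
attn = GaussianWeightedSelfAttention(d_model=256, num_heads=8)
y = attn(torch.randn(4, 100, 256))  # -> (4, 100, 256)
```

One design note on the multiplicative weighting: since raw scores can be negative, multiplying by `g < 1` pulls them toward zero rather than toward minus infinity, so distant frames are not strictly suppressed. If that matters for your use case, an additive log-domain variant, `scores - dist2[None, None] / sigma ** 2`, is a common alternative that always down-weights distant positions; whether that matches the paper exactly is something I'd verify against the original equations.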