Score of the TransformerEncoder layer?

gslaller · December 10, 2019, 1:15pm

Is there a way to get the attention scores of this layer? Something like F.softmax((Q @ K.t()) / math.sqrt(dim), -1), i.e. the weights before they are multiplied with V.

vainaijr · December 10, 2019, 5:08pm

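If we do:

import torch
import torch.nn as nn

# d_model=10, nhead=2
layer = nn.TransformerEncoderLayer(10, 2)
encoder = nn.TransformerEncoder(layer, 1)  # unused below
src = torch.randn(1, 1, 10)  # (seq_len, batch, d_model)
# Call the layer's MultiheadAttention directly with query = key = value
layer.self_attn(src, src, src)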

Then we get:

(tensor([[[-0.1861,  0.1664,  0.0857, -0.2807, -0.2680, -0.1627,  0.0585,
            0.1379, -0.0257, -0.0476]]], grad_fn=<AddBackward0>),
 tensor([[[1.1111]]], grad_fn=<DivBackward0>))

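The second output is the attention weights, averaged over the heads. With a single token the softmax gives 1.0, so the 1.1111 shown here is most likely because the layer is in training mode: attention dropout (default p=0.1) rescales the kept weight by 1/(1 - 0.1).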
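For a slightly longer sequence, a minimal sketch of the same idea might look like the following. It assumes a recent PyTorch with the default batch_first=False, and it puts the layer in eval mode so dropout does not rescale the weights. The sequence length of 4 and the variable names are only illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same sizes as in the post above: d_model=10, nhead=2
layer = nn.TransformerEncoderLayer(d_model=10, nhead=2)
layer.eval()  # turn off dropout so each row of attention weights sums to 1

# Input shape is (seq_len, batch, d_model) because batch_first defaults to False
src = torch.randn(4, 1, 10)

with torch.no_grad():
    # Self-attention: query, key and value are all the same tensor
    attn_out, attn_weights = layer.self_attn(src, src, src, need_weights=True)

print(attn_out.shape)          # (seq_len, batch, d_model)
print(attn_weights.shape)      # (batch, seq_len, seq_len), averaged over heads
print(attn_weights.sum(dim=-1))  # each row is a softmax distribution
```

Note that this calls the attention submodule directly and does not capture the weights computed during the layer's own forward pass. Newer PyTorch releases also accept average_attn_weights=False in this call to get per-head weights instead of the average; check that your version supports it before relying on it.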