Can somebody tell me how to implement stand-alone self-attention in PyTorch? Here is my attempt. I found one implementation on GitHub, but I think it is wrong.
import torch
import torch.nn as nn

q = torch.randn(1, 1, 3, 3)  # for simplicity assume one image, one channel, HxW = 3x3
k = torch.randn(1, 1, 3, 3)
v = torch.randn(1, 1, 3, 3)
l_q = nn.Linear(1, 20)
l_k = nn.Linear(1, 20)
l_v = nn.Linear(1, 20)  # originally l_v = l_k, which made values share the key projection
transformed_q = l_q(q.view(1, 1, 3, 3, 1))
transformed_k = l_k(k.view(1, 1, 3, 3, 1))
transformed_v = l_v(v.view(1, 1, 3, 3, 1))
transformed_k.shape  # each pixel is now a 20-dim embedding
# torch.Size([1, 1, 3, 3, 20])
t_q = transformed_q.reshape(1, 1, 9, 20)  # flatten the 3x3 grid into 9 pixels
t_k = transformed_k.reshape(1, 1, 9, 20)
t_v = transformed_v.reshape(1, 1, 9, 20)
emb = nn.Embedding(9, 20)  # one embedding per pixel position (absolute, not relative)
softmax = nn.Softmax(dim=-1)
attn_output_weights = softmax(t_q @ t_k.transpose(2, 3) +
                              t_q @ emb(torch.arange(9)).transpose(0, 1))
attn = attn_output_weights @ t_v
How do I use a relative position embedding, i.e. one indexed by the (row, column) offset between pixels? I think what I am doing above is wrong, since my `emb` table is indexed by absolute pixel position, not by offset.
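From my reading of the Stand-Alone Self-Attention paper (Ramachandran et al., 2019), the relative embedding is split into a row half and a column half, each indexed by the per-axis offset between the query pixel and the key pixel. Here is a minimal sketch of what I think that looks like for the 3x3 example above (all names are my own, not from any library, so treat this as an assumption rather than a reference implementation):

```python
import torch
import torch.nn as nn

H = W = 3
d = 20  # embedding dim, split into two halves for row/col offsets

# one embedding table per axis; offsets range over -(H-1) .. (H-1), i.e. 2H-1 values
row_emb = nn.Embedding(2 * H - 1, d // 2)
col_emb = nn.Embedding(2 * W - 1, d // 2)

# (row, col) coordinates of the 9 pixels in raster order -> shape (9, 2)
coords = torch.stack(
    torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij"), dim=-1
).reshape(-1, 2)

# pairwise offsets between every (query, key) pixel pair -> (9, 9, 2)
rel = coords[None, :, :] - coords[:, None, :]
# shift offsets to be non-negative indices into the embedding tables
rel_row = rel[..., 0] + (H - 1)  # values in [0, 2H-2]
rel_col = rel[..., 1] + (W - 1)

# concatenate row/col halves -> one d-dim embedding per (query, key) pair: (9, 9, d)
r = torch.cat([row_emb(rel_row), col_emb(rel_col)], dim=-1)

# position logits q_i . r_ij for each query pixel i and key pixel j -> (1, 1, 9, 9)
t_q = torch.randn(1, 1, 9, d)  # stand-in for the projected queries above
pos_logits = torch.einsum("bchd,hjd->bchj", t_q, r)
```

`pos_logits` would then replace the `t_q @ emb(torch.arange(9)).transpose(0, 1)` term in my attempt; the softmax over the key axis stays the same. (The paper applies this within a local k x k window around each pixel rather than globally, but for a 3x3 image the global version covers all offsets.)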