I want to make an attention matrix S from two input tensors (a, b) which have different lengths. I assume a is the context and b is the query in a Q&A task.
In forward() I was thinking about the code below. I have two questions:

- How can I make my code more efficient? (Using for loops looks bad.)
- Only W should be a trainable parameter. How do I tell my model this? I want to train W with loss.backward().
import torch

batch_size = 16
embd_dim = 10
a_len = 7
b_len = 4
a = torch.rand(batch_size, a_len, embd_dim).type(torch.DoubleTensor) # dummy input1
b = torch.rand(batch_size, b_len, embd_dim).type(torch.DoubleTensor) # dummy input2
# a_elmwise_mul_b: (N, a_len, b_len, embd_dim) dummy-code
a_elmwise_mul_b = torch.zeros(batch_size, a_len, b_len, embd_dim).type(torch.DoubleTensor)
S = torch.zeros(batch_size, a_len, b_len).type(torch.DoubleTensor)
W = torch.rand(3 * embd_dim).type(torch.DoubleTensor).view(1, -1) # must be trainable params
# I think there are better ways than below
for sample in range(batch_size):
    for ai in range(a_len):
        for bi in range(b_len):
            a_elmwise_mul_b[sample, ai, bi] = torch.mul(a[sample, ai], b[sample, bi])
            x = torch.cat((a[sample, ai], b[sample, bi], a_elmwise_mul_b[sample, ai, bi])) # (3*embd_dim,)
            S[sample, ai, bi] = torch.mm(W, x.unsqueeze(1))[0][0]
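For reference, here is one way the triple loop could be replaced with broadcasting, and W registered as an nn.Parameter so that loss.backward() populates W.grad (inside an nn.Module, an nn.Parameter attribute is picked up by model.parameters() automatically). This is a minimal sketch of the same computation, not the only possible vectorization:

```python
import torch
import torch.nn as nn

batch_size, embd_dim, a_len, b_len = 16, 10, 7, 4
a = torch.rand(batch_size, a_len, embd_dim, dtype=torch.double)
b = torch.rand(batch_size, b_len, embd_dim, dtype=torch.double)

# Wrapping W in nn.Parameter marks it as trainable.
W = nn.Parameter(torch.rand(3 * embd_dim, dtype=torch.double))

# Broadcast a to (N, a_len, 1, D) and b to (N, 1, b_len, D) so that
# every (ai, bi) pair is formed without Python loops.
a_exp = a.unsqueeze(2)                        # (N, a_len, 1, D)
b_exp = b.unsqueeze(1)                        # (N, 1, b_len, D)
a_elmwise_mul_b = a_exp * b_exp               # (N, a_len, b_len, D)

# Concatenate [a; b; a*b] along the feature axis, then contract with W.
cat = torch.cat(
    (a_exp.expand(-1, -1, b_len, -1),
     b_exp.expand(-1, a_len, -1, -1),
     a_elmwise_mul_b),
    dim=-1)                                   # (N, a_len, b_len, 3*D)
S = torch.matmul(cat, W)                      # (N, a_len, b_len)
```

Since W is the only nn.Parameter here, calling loss.backward() on anything computed from S will accumulate gradients into W.grad and nowhere else.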