# Make a two-dimensional attention matrix from two different vectors

I want to make an attention matrix `S` from two input vectors (`a`, `b`) which have different lengths. I assume `a` is the `context` and `b` is the `query` in a Q&A task.
In `forward()` I was thinking about the code below. I have two questions.

1. How can I modify my code to be more efficient? (Using nested `for` loops looks bad.)
2. Only `W` is a trainable parameter. How do I tell my model this? I want to train `W` with `loss.backward()`.
``````
import torch

batch_size = 16
embd_dim = 10
a_len = 7
b_len = 4

a = torch.rand(batch_size, a_len, embd_dim).type(torch.DoubleTensor)  # dummy input1
b = torch.rand(batch_size, b_len, embd_dim).type(torch.DoubleTensor)  # dummy input2
# a_elmwise_mul_b: (N, a_len, b_len, embd_dim)   dummy-code
a_elmwise_mul_b = torch.zeros(batch_size, a_len, b_len, embd_dim).type(torch.DoubleTensor)
S = torch.zeros(batch_size, a_len, b_len).type(torch.DoubleTensor)
W = torch.rand(3 * embd_dim).type(torch.DoubleTensor).view(1, -1)  # must be trainable params

# I think there are better ways than the loops below
for sample in range(batch_size):
    for ai in range(a_len):
        for bi in range(b_len):
            a_elmwise_mul_b[sample, ai, bi] = torch.mul(a[sample, ai], b[sample, bi])
            x = torch.cat((a[sample, ai], b[sample, bi], a_elmwise_mul_b[sample, ai, bi]))  # (3 * embd_dim,)
            S[sample, ai, bi] = torch.mm(W, x.unsqueeze(1))
``````
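For reference, here is one way the triple loop above might be vectorized, a sketch that relies on broadcasting: expand `a` and `b` to a shared `(N, a_len, b_len, embd_dim)` shape, concatenate along the last dimension, and contract with `W` in a single `matmul`.

```python
import torch

batch_size, embd_dim, a_len, b_len = 16, 10, 7, 4
a = torch.rand(batch_size, a_len, embd_dim, dtype=torch.double)
b = torch.rand(batch_size, b_len, embd_dim, dtype=torch.double)
W = torch.rand(3 * embd_dim, dtype=torch.double)  # kept 1-D here for the contraction

# Broadcast a to (N, a_len, b_len, D) and b likewise, without copying data yet
a_exp = a.unsqueeze(2).expand(-1, -1, b_len, -1)   # (N, a_len, b_len, D)
b_exp = b.unsqueeze(1).expand(-1, a_len, -1, -1)   # (N, a_len, b_len, D)

ab = a_exp * b_exp                                  # element-wise product, (N, a_len, b_len, D)
x = torch.cat((a_exp, b_exp, ab), dim=-1)           # (N, a_len, b_len, 3D)
S = torch.matmul(x, W)                              # contracts the last dim -> (N, a_len, b_len)
```

For each `(sample, ai, bi)` this computes the same dot product `W · [a; b; a*b]` as the loop, but in one batched operation.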

For training, is wrapping the tensor in `nn.Parameter()` the right way to make it trainable, like this?

``````
W = nn.Parameter(torch.rand(3 * embd_dim).type(torch.DoubleTensor).view(1, -1))  # must be trainable params
``````
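A minimal sketch of how `nn.Parameter` behaves when assigned as an attribute of an `nn.Module` (the class name `Attn` is just an illustration): the parameter is registered automatically, so it appears in `model.parameters()`, gets gradients from `loss.backward()`, and is updated by the optimizer.

```python
import torch
import torch.nn as nn

class Attn(nn.Module):
    def __init__(self, embd_dim):
        super().__init__()
        # Assigning an nn.Parameter as a module attribute registers it,
        # so optimizers constructed from self.parameters() will update it.
        self.W = nn.Parameter(torch.rand(1, 3 * embd_dim, dtype=torch.double))

model = Attn(10)
names = [n for n, _ in model.named_parameters()]
```

Plain tensors (even with `requires_grad=True`) are not registered this way; only `nn.Parameter` attributes (or tensors added via `register_parameter`) show up in `model.parameters()`.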