Make a two-dimensional attention matrix from two different vectors

I want to make an attention matrix S from two input vectors (a, b) which have different lengths. I assume a is the context and b is the query in a Q&A task.
In forward() I was thinking about the code below. I have two questions:

  1. How can I modify my code to be more efficient? (Using nested for loops looks bad.)
  2. Only W is a trainable parameter. How do I register it with my model so that its values are updated by loss.backward()?
import torch
import torch.nn as nn

batch_size = 16
embd_dim = 10
a_len = 7
b_len = 4

a = torch.rand(batch_size, a_len, embd_dim).type(torch.DoubleTensor)  # dummy input1
b = torch.rand(batch_size, b_len, embd_dim).type(torch.DoubleTensor)  # dummy input2
# a_elmwise_mul_b: (N, a_len, b_len, embd_dim)   dummy-code
a_elmwise_mul_b = torch.zeros(batch_size, a_len, b_len, embd_dim).type(torch.DoubleTensor)
S = torch.zeros(batch_size, a_len, b_len).type(torch.DoubleTensor)
W = torch.rand(3 * embd_dim).type(torch.DoubleTensor).view(1, -1) # must be trainable params
# I think there are better ways than below
for sample in range(batch_size):
    for ai in range(a_len):
        for bi in range(b_len):
            a_elmwise_mul_b[sample, ai, bi] = torch.mul(a[sample, ai], b[sample, bi])
            x =[sample, ai], b[sample, bi], a_elmwise_mul_b[sample, ai, bi]))  # (3*embd_dim,)
            S[sample, ai, bi] =, x.unsqueeze(1))[0][0]  # (1, 3*embd_dim) x (3*embd_dim, 1) -> scalar

For training, is wrapping W in nn.Parameter() the right way to get a trainable parameter, like this?

W = nn.Parameter(torch.rand(3 * embd_dim).type(torch.DoubleTensor).view(1, -1))  # must be trainable params