The shape of tensor `k` is `(seq_len, batch_size, embed_dim)`.
In `activation.py`:

```python
bias_k = Parameter(torch.empty((1, 1, embed_dim)))
```
In `functional.py`:

```python
# add bias along batch dimension (currently second)
k = torch.cat([k, bias_k.repeat(1, bsz, 1)])
```
However, `torch.cat` defaults to `dim=0`, so this call concatenates the two tensors along the `seq_len` dimension, not the batch dimension.
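Here is a small standalone reproduction (the sizes are made up purely for illustration) showing that only the `seq_len` dimension grows:

```python
import torch

# hypothetical sizes, just for illustration
seq_len, bsz, embed_dim = 5, 3, 8

k = torch.randn(seq_len, bsz, embed_dim)   # (5, 3, 8)
bias_k = torch.randn(1, 1, embed_dim)      # (1, 1, 8), same shape as the Parameter above

repeated = bias_k.repeat(1, bsz, 1)        # (1, 3, 8): repeat expands the batch dimension
out = torch.cat([k, repeated])             # default dim=0, i.e. the seq_len dimension
print(out.shape)                           # torch.Size([6, 3, 8]) -> seq_len went from 5 to 6
```

So the batch dimension only comes into play through `repeat(1, bsz, 1)`; the concatenation itself appends one extra position along the sequence.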
Is the annotation `# add bias along batch dimension (currently second)` wrong? I think it should read `# add bias along sequence length dimension`.