Cross-attention problem

Hello everyone,
Sorry to disturb you. When I used cross-attention, I found that all of the scores are 1. To be honest, I suspect the cross-attention layer I constructed is wrong. Here are the code, the scores, and the model summary. Could you please give me some advice on how to address this problem?
Best wishes

import math

import torch
from torch import nn


class DotProductAttention(nn.Module):
    def __init__(self, key_size, num_hiddens, dropout, **kwargs):
        super(DotProductAttention, self).__init__(**kwargs)

        self.key_size = key_size
        self.num_hiddens = num_hiddens
        self.dropout = nn.Dropout(dropout)
        # Projects the queries (key_size features) to num_hiddens features
        self.W_k = nn.Linear(key_size, num_hiddens, bias=False)

    def forward(self, queries, keys, values):
        d = queries.shape[-1]
        queries = self.W_k(queries)

        # Swap the last two dimensions of `keys` so that bmm computes Q K^T
        scores = torch.bmm(queries, keys.transpose(1, 2)) / math.sqrt(d)
        scores = nn.functional.softmax(scores, dim=-1)
        print('scores', scores.shape, scores)

        return torch.bmm(self.dropout(scores), values)

if __name__ == '__main__':
    X = torch.normal(0, 1, (14, 1, 384))
    Y = torch.normal(0, 1, (14, 1, 1024))

    # hyperparameters  # t_a_a
    key_size = 384
    num_hiddens = 1024
    dropout = 0.5
    model = DotProductAttention(key_size, num_hiddens, dropout)
    output = model(X, Y, Y)
    print('output shape: ', output.shape)

Model summary and scores:

(dropout): Dropout(p=0.5, inplace=False)
(W_k): Linear(in_features=384, out_features=1024, bias=False)
scores torch.Size([14, 1, 1]) tensor([[[1.]],
[[1.]]], grad_fn=)
output shape: torch.Size([14, 1, 1024])

By the way: in X = torch.normal(0, 1, (14, 1, 384)), 14 is the sequence length, 1 is the batch size, and 384 is the feature dimension. How can I fix it? Thanks, best wishes

The usual order is Batch x Sequence Length x FeatureDimension.

The way you are defining X and Y, it looks like you have 14 batches, each with only one element.

So the attention of one element with respect to only that one element will return 1, since there are no other elements to compare to. Softmax normalizes the scores so that they sum to 1, and with a single score that score becomes 1. Without the softmax you would get the raw dot-product score, but still only one value per batch.
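A minimal sketch of that effect, assuming the queries have already been projected to 1024 features (the tensor names here are only for illustration):

```python
import math
import torch

torch.manual_seed(0)

# Shapes from the original post: torch.bmm treats the first dimension as the
# batch, so this is 14 batches with one query and one key each.
queries = torch.normal(0, 1, (14, 1, 1024))
keys = torch.normal(0, 1, (14, 1, 1024))

scores = torch.bmm(queries, keys.transpose(1, 2)) / math.sqrt(queries.shape[-1])
weights = torch.softmax(scores, dim=-1)

print(scores.shape)       # one score per batch
print(weights.flatten())  # softmax over a single score is 1 every time
```

The raw scores differ from batch to batch, but the softmax is taken over a last dimension of size 1, so every weight is 1.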

I have not tried it but I think this should work:

#                       B, Seq, FeatDim
X = torch.normal(0, 1, (1,  14,      384))
Y = torch.normal(0, 1, (1,  14,    1024))
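A quick shape check for that layout, as a sketch only: `proj` is a hypothetical stand-in for the `W_k` layer in the question, and dropout is left out.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

#                       B, Seq, FeatDim
X = torch.normal(0, 1, (1, 14, 384))
Y = torch.normal(0, 1, (1, 14, 1024))

# Hypothetical stand-in for the W_k layer in the question (384 -> 1024).
proj = nn.Linear(384, 1024, bias=False)

queries = proj(X)                               # (1, 14, 1024)
scores = torch.bmm(queries, Y.transpose(1, 2))  # (1, 14, 14)
weights = torch.softmax(scores / math.sqrt(queries.shape[-1]), dim=-1)

print(weights.shape)        # one row of 14 weights per query position
print(weights.sum(dim=-1))  # each row sums to 1
```

With 14 keys per query, the softmax has something to distribute over, so the individual weights are no longer all 1.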

Hope this helps :smile:

Thanks. To be honest, my batch size is 1, sequence length is 14, and feature dimension is 384.
However, I tried your suggestion of swapping the dimensions, and it does print out different scores. So the fundamental question is this:
if the batch size is 1, the sequence length is 14, and the feature dimension is 384, how do I fix it? I am grateful for your help. Thanks, best wishes

# B, Seq, FeatDim
X = torch.normal(0, 1, (1, 14, 384))
Y = torch.normal(0, 1, (1, 14, 1024))

Sorry I do not understand the question.

Batch = 1
Sequence Length = 14
Feature Dimension = 384

Then the correct order is like the one you posted here.

What do you need to fix?

Sorry. I am grateful for your detailed guidance for a beginner like me.

You said the input has 14 batches, so I guess you have missed something. In fact, the input size (14, 1, 384) is (seq_length, batch_size, feature_dimension).
I just put in one sample to check that the cross-attention works.

By the way, the standard definition of cross-attention needs three different weight layers (query, key, value). However, in my model I only define the query weight layer and am missing the key and value weight layers. Is that right?

best wishes
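On the question of separate weight layers: the standard formulation does apply three separate learned projections, one each for queries, keys, and values. A minimal single-head sketch, assuming batch-first inputs; the class name `CrossAttention` and the size `num_hiddens=256` are illustrative choices, not taken from the original model:

```python
import math
import torch
from torch import nn


class CrossAttention(nn.Module):
    """Single-head scaled dot-product cross-attention (illustrative sketch)."""

    def __init__(self, query_size, key_size, value_size, num_hiddens, dropout=0.0):
        super().__init__()
        self.W_q = nn.Linear(query_size, num_hiddens, bias=False)
        self.W_k = nn.Linear(key_size, num_hiddens, bias=False)
        self.W_v = nn.Linear(value_size, num_hiddens, bias=False)
        self.dropout = nn.Dropout(dropout)

    def forward(self, queries, keys, values):
        # All inputs are batch-first: (B, Seq, FeatDim).
        q, k, v = self.W_q(queries), self.W_k(keys), self.W_v(values)
        scores = torch.bmm(q, k.transpose(1, 2)) / math.sqrt(q.shape[-1])
        weights = torch.softmax(scores, dim=-1)
        return torch.bmm(self.dropout(weights), v), weights


X = torch.normal(0, 1, (1, 14, 384))   # queries come from one input
Y = torch.normal(0, 1, (1, 14, 1024))  # keys and values come from the other
model = CrossAttention(384, 1024, 1024, num_hiddens=256)
model.eval()  # disable dropout for the shape check
output, weights = model(X, Y, Y)
print(output.shape)   # (1, 14, 256)
print(weights.shape)  # (1, 14, 14)
```

Because all three inputs are projected to `num_hiddens`, the query and key feature sizes no longer need to match each other.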

What I mean is that when you do this, as in your original post:

X = torch.normal(0, 1, (14, 1, 384))

you do NOT have 1 batch; you are saying there are 14 batches.

The first number is the number of batches, so this is wrong.

If you change it to this:

X = torch.normal(0, 1, (1, 14, 384))

then you have 1 batch. This is correct: 1 batch, 14 sequence elements, 384 feature dimensions.

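Using (seq_length, batch_size, feature_dimension) is not correct here: torch.bmm always treats the first dimension as the batch, so you cannot put the batch size in the middle.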

It should be

  1. Batch
  2. Sequence length
  3. Feature dimension
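If the data already arrives as (Seq, Batch, FeatDim), swapping the first two dimensions should be enough to get the order listed above. A small sketch, where the variable names are only for illustration:

```python
import torch

X_seq_first = torch.normal(0, 1, (14, 1, 384))  # (Seq, B, FeatDim)

# Swap dimensions 0 and 1 to get (B, Seq, FeatDim); no data is copied.
X_batch_first = X_seq_first.transpose(0, 1)

print(X_batch_first.shape)  # torch.Size([1, 14, 384])
```

The same `transpose(0, 1)` can be applied to the attention output if the rest of the model expects the sequence dimension first.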

I am really grateful for your detailed guidance. Thanks, best wishes.

Thank you for the PyTorch forum. It is such a nice platform.
best wishes
