RuntimeError: mat1 and mat2 shapes cannot be multiplied (16880x128 and 1x256), unable to fix this for an attention-based layer

I have an Attention-based feature fusion layer for two input feature embeddings. The model is defined here:

import torch
import torch.nn as nn

class AFF(nn.Module):
    def __init__(self, in_features, out_features):
        super(AFF, self).__init__()
        self.in_features = in_features
        self.out_features = out_features

        # Attention mechanism
        self.attention = nn.Linear(in_features * 2, 1)
        self.attention.weight = nn.Parameter(torch.Tensor(self.in_features * 2, 1))
        nn.init.xavier_uniform_(self.attention.weight)
        self.softmax = nn.Softmax(dim=1)

        # Fusion layer
        self.fc = nn.Linear(in_features * 2, out_features)
        self.relu = nn.ReLU()

    def forward(self, x1, x2):
        print("Shape of x1:", x1.shape)
        print("Shape of x2:", x2.shape)

        # Pad the smaller input along dim 0 so both have the same number of rows
        if x1.size(0) != x2.size(0):
            if x1.size(0) < x2.size(0):
                padding = torch.nn.functional.pad(x1, (0, 0, 0, x2.size(0) - x1.size(0)), mode='constant', value=0)
                x_concat = torch.cat((padding, x2), dim=1)
            else:
                padding = torch.nn.functional.pad(x2, (0, 0, 0, x1.size(0) - x2.size(0)), mode='constant', value=0)
                x_concat = torch.cat((x1, padding), dim=1)
        else:
            x_concat = torch.cat((x1, x2), dim=1)

        print("Shape of x_concat:", x_concat.shape)

        # Calculate attention weights
        attn_weights = self.softmax(self.attention(x_concat))

        # Apply attention to feature embeddings
        x1_weighted = torch.mul(x1, attn_weights[:, :1])
        x2_weighted = torch.mul(x2, attn_weights[:, 1:])

        # Concatenate weighted feature embeddings
        x_weighted_concat = torch.cat((x1_weighted, x2_weighted), dim=1)

        # Fusion layer
        fused_features = self.fc(x_weighted_concat)
        fused_features = self.relu(fused_features)

        return fused_features

How can I fix this error when I run it in my training loop?
The shapes of x1, x2 and x_concat are:
Shape of x1: torch.Size([16880, 64])
Shape of x2: torch.Size([2956, 64])
Shape of x_concat: torch.Size([16880, 128])

Your custom weight initialization is failing because the weight matrix inside an nn.Linear module has the shape [out_features, in_features], while you are assigning a parameter of the transposed shape [in_features * 2, 1].
Use:

self.attention.weight = nn.Parameter(torch.Tensor(1, self.in_features * 2))
nn.init.xavier_uniform_(self.attention.weight) 

and this issue should be gone.
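You can verify the expected layout directly; a minimal check, using in_features * 2 = 128 as an illustrative size (matching the printed shape of x_concat):

    import torch.nn as nn

    attention = nn.Linear(128, 1)
    # nn.Linear stores its weight as [out_features, in_features]
    print(attention.weight.shape)  # torch.Size([1, 128])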
Afterwards you will see a new failure:

x2_weighted = torch.mul(x2, attn_weights[:, 1:])

RuntimeError: The size of tensor a (64) must match the size of tensor b (0) at non-singleton dimension 1

since you are again indexing in the wrong dimension: attn_weights has shape [batch_size, 1], so attn_weights[:, 1:] selects zero columns and produces an empty tensor.
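A quick way to see this (a minimal illustration, using the batch size from your print output):

    import torch

    attn_weights = torch.rand(16880, 1)
    print(attn_weights[:, :1].shape)  # torch.Size([16880, 1])
    print(attn_weights[:, 1:].shape)  # torch.Size([16880, 0]) -- empty, nothing to multiply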
Fixing the indexing dimensions via:

        # Apply attention to feature embeddings
        x1_weighted = torch.mul(x1, attn_weights[:1, :])
        x2_weighted = torch.mul(x2, attn_weights[1:, :])

won't resolve the error either, since the shapes still don't match:

RuntimeError: The size of tensor a (2956) must match the size of tensor b (16879) at non-singleton dimension 0

so you would need to check how exactly these weights are supposed to be multiplied with x2, since the shapes are simply not compatible.
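One common pattern, if the intent is one scalar weight per input embedding, is to let the attention layer produce two scores per sample and softmax across them. This is only a sketch of that idea, not a drop-in fix, and it assumes x1 and x2 already have the same number of rows:

    import torch
    import torch.nn as nn

    class AFF(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            # one attention score per input, softmaxed against each other per sample
            self.attention = nn.Linear(in_features * 2, 2)
            self.softmax = nn.Softmax(dim=1)
            self.fc = nn.Linear(in_features * 2, out_features)
            self.relu = nn.ReLU()

        def forward(self, x1, x2):
            # requires x1 and x2 to have the same number of rows
            x_concat = torch.cat((x1, x2), dim=1)                   # [N, 2 * in_features]
            attn_weights = self.softmax(self.attention(x_concat))  # [N, 2]
            x1_weighted = x1 * attn_weights[:, 0:1]                 # broadcast [N, 1] over [N, in_features]
            x2_weighted = x2 * attn_weights[:, 1:2]
            x_weighted_concat = torch.cat((x1_weighted, x2_weighted), dim=1)
            return self.relu(self.fc(x_weighted_concat))

With in_features=64 this takes two [N, 64] inputs and returns [N, out_features]; the mismatched batch sizes (16880 vs. 2956) would still need to be reconciled first.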

The changed weight initialization worked, and the matrix multiplication now succeeds. For the later elementwise multiplication of the weights with the inputs, I padded the smaller input tensor so that its size matches the attention weights.
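That padding step can be done with torch.nn.functional.pad; a minimal sketch, assuming x2 is the smaller tensor and zero rows are an acceptable filler:

    import torch.nn.functional as F

    if x2.size(0) < x1.size(0):
        # append zero rows to x2 so both inputs have the same batch size
        x2 = F.pad(x2, (0, 0, 0, x1.size(0) - x2.size(0)), mode='constant', value=0)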