Model Problems: Linear layers?

bruhnugget-nice · December 29, 2024, 7:54pm

Alright, so I have this hypothetical model that takes in 2 inputs called it and vim, which are models that both output a tensor from range [[0]] or [[1]]. When I try training this though, there is an error:

  warnings.warn(
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-104-f92cb761d1cf> in <cell line: 1>()
----> 1 train_combined(done_pneumnist, train_pneumnist, loss_fn, optimizer_done)

12 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py in forward(self, input)
    123 
    124     def forward(self, input: Tensor) -> Tensor:
--> 125         return F.linear(input, self.weight, self.bias)
    126 
    127     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x64 and 2x224)

import torch.nn.functional as F


# Create the combination of ViMamba and the ViT
class CombinedViTM(nn.Module):
  def __init__(self, vit, vim):
    super(CombinedViTM, self).__init__()

    # Convolutional layers for efficient feature extraction.
    self.conv1 = nn.Conv2d(3, 16, 3, padding = 1)
    self.conv2 = nn.Conv2d(16, 16, 3, padding = 1)
    self.conv3 = nn.Conv2d(16, 32, 3, padding = 1)
    self.conv4 = nn.Conv2d(32, 3, 3, padding = 1)

    # Establish our ViMamba and ViT INSIDE the combined model
    self.vit=vit
    self.vim=vim

    self.hidden_layers=[]
    for i in range(16):
      self.hidden_layers.append(nn.Linear(2, 224))
      self.hidden_layers.append(nn.GELU())
      self.hidden_layers.append(nn.Linear(224, 64))
      self.hidden_layers.append(nn.GELU())
      self.hidden_layers.append(nn.Linear(64, 16))
      self.hidden_layers.append(nn.GELU())
      self.hidden_layers.append(nn.Linear(16, 4))
      self.hidden_layers.append(nn.GELU())
      self.hidden_layers.append(nn.Linear(4, 2))
      self.hidden_layers.append(nn.GELU())
    # Create our MLP Head
    self.head=nn.Sequential(
        *self.hidden_layers,
        nn.Linear(2, 224),
        nn.ReLU(),
        nn.Linear(224, 64),
        nn.ReLU(),
        nn.Linear(64, 16),
        nn.ReLU(),
        nn.Linear(16, 2),
        nn.ReLU(),
        nn.Linear(2,1),
        nn.Sigmoid(),
    )


  def forward(self, x):
    # Run the input through the convolutional layers.
    x=self.conv1(x)
    x=F.relu(x)
    x=self.conv2(x)
    x=F.relu(x)
    x=self.conv3(x)
    x=F.relu(x)
    x=self.conv4(x)
    x=F.relu(x)

    # Process the results from both the ViMamba and the ViT
    vit_result=self.vit(x)
    vim_result=self.vim(x)

    # Add both inputs (one may correct the other)
    new_result=torch.flatten(torch.cat((vit_result, vim_result), dim=0))

    # Process through the MLP head
    result=self.head(new_result)

    return result

What is the problem?

soulitzer · December 29, 2024, 11:45pm

The linear layer is expecting the embedding dim of the input to be 2, but the first part of the model computed a tensor with embedding dim of 64.
Without knowing any extra context on the specifics of the model, updating the nn.Linear(2, 224) to nn.Linear(64, 224) should fix this.

QLYYLQ · December 30, 2024, 5:30am

You can see the document: Linear — PyTorch 2.5 documentation, i guess you want to fuse the vit and vmamba’s results. The new result maybe have the shape: [Batch,2,H,W], you should transfom it into [batch,h,w,2] for self.head() before nn.Linear().

bruhnugget-nice · December 30, 2024, 3:22pm

This should fix the issue, I’ll try it out