PyTorch multimodal model

I’m trying to implement a multimodal model for an image regression task using MONAI, a PyTorch-based framework. It is adapted from this approach:
MONAI_multimodal/demo_monai_multimodal_crisismmd.ipynb at main · nvahmadi/MONAI_multimodal

However, I’m getting this error.

RuntimeError: Function AddmmBackward returned an invalid gradient at index 1 - got [32, 272] but expected shape compatible with [32, 1040]

Here’s the approach I used.

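import torch
import torch.nn as nn
import torch.nn.functional as F
from monai.networks.nets import TorchVisionFCModel

class MultimodalNet(nn.Module):
    def __init__(self, n_classes, hidden_size_fc_image=256, hidden_size=256):
        # nn.Module must be initialised before submodules can be assigned
        super().__init__()
        # same vision model as above, again implicitly containing an embedding layer fc1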
        self.cnn = TorchVisionFCModel("resnet50", num_classes=hidden_size_fc_image, use_conv=True, pretrained=True)
        self.cnn.features[0]=nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

        for param in self.cnn.parameters():
            param.requires_grad = False 
        for param in self.cnn.fc.parameters():
            param.requires_grad = True
        nr_genders = 2
        # Fully Connected layer for gender
        self.gen_fc_1 = nn.Linear(1,16)
        # Feature Concatenation Layer
        self.fc1 = nn.Linear(16+hidden_size_fc_image,hidden_size)
        # Final Fully Connected Layer
        self.fc2 = nn.Linear(hidden_size,n_classes)
    def forward(self, images, gender):
        x = F.relu(self.cnn(images))
        x = x.view(x.size(0), -1)
        print(x.shape)   #torch.Size([32, 1024])
        # =============================================================================
        #       Gender Fully Connected Layer
        # =============================================================================
        y = F.relu(self.gen_fc_1(gender))
        y = y.view(y.size(0), -1)
        print(y.shape)   #torch.Size([32, 16])

        # =============================================================================
        #       Feature Concatenation Layer
        # =============================================================================
        z = torch.cat((x, y), dim=1)
        x = F.relu(self.fc1(z))
        x = self.fc2(x)
        return x

I’m using L1Loss(). I’m not sure where I’m going wrong.

Could you post the missing definitions (e.g. TorchVisionFCModel) and make this code executable so that we could try to reproduce the issue, please?

TorchVisionFCModel is used to customize the fully connected layer of a (pretrained) TorchVision model, or to replace it with a convolutional layer provided by MONAI. Here’s the documentation.

from monai.networks.nets import resnet50, TorchVisionFCModel

Thanks for providing the information.
I’m unable to reproduce the issue using:

model = MultimodalNet(10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.L1Loss()

data = torch.randn(2, 1, 224, 224)
gender = torch.randn(2, 1)
target = torch.randn(2, 10)

for _ in range(10):
    output = model(data, gender)
    loss = criterion(output, target)
    print("loss {:.3f}".format(loss.item()))
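One possible explanation for why the snippet above runs cleanly is the input size. The sketch below is plain shape arithmetic, not MONAI code. It assumes the TorchVisionFCModel default pooling (an average pool with kernel_size=7 and stride=1), torchvision's standard ResNet-50 strides, and a 256x256 image size in the original training run, which the thread never states. The helper names conv_out and resnet50_flat_features are made up for illustration.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length of a conv/pool layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_flat_features(image_size, out_channels=256,
                           pool_kernel=7, pool_stride=1):
    """Flattened feature count per sample for a square input image.

    Assumes: ResNet-50 trunk, then average pooling, then a 1x1 conv
    producing `out_channels` maps (the use_conv=True case).
    """
    s = conv_out(image_size, kernel=7, stride=2, padding=3)  # stem conv
    s = conv_out(s, kernel=3, stride=2, padding=1)           # stem max pool
    for _ in range(3):                                       # layer2..layer4 halve the size
        s = conv_out(s, kernel=3, stride=2, padding=1)
    s = conv_out(s, kernel=pool_kernel, stride=pool_stride)  # average pool
    return out_channels * s * s                              # 1x1 conv keeps s x s


print(resnet50_flat_features(224))  # 256  -> matches nn.Linear(16 + 256, ...)
print(resnet50_flat_features(256))  # 1024 -> matches the printed [32, 1024]
```

Under these assumptions a 224x224 input flattens to exactly hidden_size_fc_image features, so fc1 fits, while a 256x256 input leaves a 2x2 spatial map and four times as many features.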

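There’s a simple mistake in your definition of fc1: after flattening, the CNN branch yields 1024 features (see your printed shape), not hidden_size_fc_image = 256.
Try this instead: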

self.fc1 = nn.Linear(16+1024,hidden_size)

Alternatively (NOT both), you could add a max_pool2d layer here:

    def forward(self, images, gender):
        x = F.relu(self.cnn(images))
        x = F.max_pool2d(x, 2, 2)  # <---- new line
        x = x.view(x.size(0), -1)

That will reduce the flattened dim=1 size of x by a factor of 4 (from 1024 to 256), so the original fc1 definition fits.
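To avoid hard-coding either number, the flattened size can be measured once with a dummy batch. This is a minimal sketch, not the model from the question: TinyBranch is a hypothetical stand-in that only mimics the (N, 256, 2, 2) output shape implied by the printed [32, 1024], and flat_features is an illustrative helper name.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBranch(nn.Module):
    """Hypothetical stand-in for the CNN branch; returns a (N, 256, 2, 2) map."""

    def __init__(self, out_channels=256):
        super().__init__()
        self.conv = nn.Conv2d(1, out_channels, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(2)

    def forward(self, x):
        return self.pool(self.conv(x))


def flat_features(branch, example, pool=False):
    """Run one dummy batch and return the per-sample size after flattening."""
    with torch.no_grad():
        out = F.relu(branch(example))
        if pool:
            out = F.max_pool2d(out, 2, 2)  # the alternative fix
    return out.view(out.size(0), -1).size(1)


branch = TinyBranch()
example = torch.zeros(2, 1, 32, 32)  # dummy single-channel batch

n_flat = flat_features(branch, example)               # without pooling
n_pooled = flat_features(branch, example, pool=True)  # with max_pool2d

# Size fc1 from the measured value instead of a hand-written constant.
fc1_option_a = nn.Linear(16 + n_flat, 256)    # first fix: wider fc1
fc1_option_b = nn.Linear(16 + n_pooled, 256)  # second fix: pool, keep fc1
print(n_flat, n_pooled)
```

With the real model, the same measurement would use self.cnn and a dummy tensor of the actual training image size.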

Modifying the definition of fc1 worked. Thank you!
