RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x13056 and 153600x2048)

Your code generally works:

num_classes = 2
model = CNN()

x = torch.randn(2, 3, 192, 192)
out = model(x)
print(out.shape)
# torch.Size([2, 2])

and it seems the shape mismatch is raised in the loss calculation, so check the shapes of the model output as well as the target.

loss = loss_func(output, b_y)

As @ptrblck has mentioned, check the shapes using print statements.
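
For reference, nn.CrossEntropyLoss expects the model output to have the shape [batch_size, num_classes] and the target to have the shape [batch_size] containing class indices. A minimal sketch of those shapes, assuming that loss is used with a batch size of 64 and 2 classes:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

batch_size, num_classes = 64, 2
output = torch.randn(batch_size, num_classes)          # model output: [64, 2]
target = torch.randint(0, num_classes, (batch_size,))  # target: [64] with class indices
loss = criterion(output, target)
print(output.shape, target.shape, loss.item())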

And why did you index with [0] here:

The same training code works on 1x28x28 MNIST data, but when I try to run it on 3x192x192 data it shows me an error.


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 32, 192, 192]             896
       BatchNorm2d-2         [-1, 32, 192, 192]              64
              ReLU-3         [-1, 32, 192, 192]               0
         MaxPool2d-4           [-1, 32, 96, 96]               0
            Conv2d-5           [-1, 64, 96, 96]          18,496
       BatchNorm2d-6           [-1, 64, 96, 96]             128
              ReLU-7           [-1, 64, 96, 96]               0
         MaxPool2d-8           [-1, 64, 48, 48]               0
            Conv2d-9          [-1, 128, 48, 48]          73,856
      BatchNorm2d-10          [-1, 128, 48, 48]             256
             ReLU-11          [-1, 128, 48, 48]               0
        MaxPool2d-12          [-1, 128, 24, 24]               0
           Conv2d-13          [-1, 256, 24, 24]         295,168
      BatchNorm2d-14          [-1, 256, 24, 24]             512
             ReLU-15          [-1, 256, 24, 24]               0
        MaxPool2d-16          [-1, 256, 12, 12]               0
           Conv2d-17          [-1, 512, 12, 12]       1,180,160
      BatchNorm2d-18          [-1, 512, 12, 12]           1,024
             ReLU-19          [-1, 512, 12, 12]               0
        MaxPool2d-20            [-1, 512, 6, 6]               0
           Linear-21                  [-1, 256]       4,718,848
             ReLU-22                  [-1, 256]               0
      BatchNorm1d-23                  [-1, 256]             512
          Dropout-24                  [-1, 256]               0
           Linear-25                    [-1, 2]             514
================================================================
Total params: 6,290,434
Trainable params: 6,290,434
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.42
Forward/backward pass size (MB): 56.68
Params size (MB): 24.00
Estimated Total Size (MB): 81.10
----------------------------------------------------------------

Replace this with the code below -

print(output.shape)
print(b_y.shape)

torch.Size([2])
torch.Size([128])

train_losses = []
val_losses = []
test_losses =[]
train_auc = []
val_auc = []
train_auc_epoch = []
val_auc_epoch = []
best_auc = 0.0
min_loss = np.Inf

since = time.time()

for e in range(num_epochs):

    train_loss = 0.0
    val_loss = 0.0

    # Train the model
    model.train()
    for i, (images, labels) in enumerate(tqdm(train_dataloader, total=int(len(train_dataloader)))):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Loss and accuracy
        train_loss += loss.item()
        y_actual = labels.data.cpu().numpy()
        y_pred = outputs[:, -1].detach().cpu().numpy()
        train_auc.append(roc_auc_score(y_actual, y_pred))

    # Evaluate the model
    model.eval()
    for i, (images, labels) in enumerate(tqdm(val_dataloader, total=int(len(val_dataloader)))):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Loss and AUC
        val_loss += loss.item()
        y_actual = labels.data.cpu().numpy()
        y_pred = outputs[:, -1].detach().cpu().numpy()
        val_auc.append(roc_auc_score(y_actual, y_pred))

    # Average losses and accuracies
    train_loss = train_loss / len(train_dataloader)
    val_loss = val_loss / len(val_dataloader)
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    training_auc = np.mean(train_auc)
    validation_auc = np.mean(val_auc)
    train_auc_epoch.append(training_auc)
    val_auc_epoch.append(validation_auc)

    # Updating best validation AUC
    if best_auc < validation_auc:
        best_auc = validation_auc

    # Saving best model
    if min_loss >= val_loss:
        torch.save(model.state_dict(), 'best_model.pt')
        min_loss = val_loss

    print('EPOCH {}/{}'.format(e+1, num_epochs))
    print('-' * 10)
    print("Train loss: {:.6f}, Train AUC: {:.4f}".format(train_loss, training_auc))
    print("Validation loss: {:.6f}, Validation AUC: {:.4f}\n".format(val_loss, validation_auc))

time_elapsed = time.time() - since
print('Training completed in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
print('Best validation AUC: {:.4f}'.format(best_auc))

100% 2048/2048 [19:19<00:00, 1.77it/s]
  6% 16/256 [00:08<01:52, 2.13it/s]

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
     52     y_actual = labels.data.cpu().numpy()
     53     y_pred = outputs[:,-1].detach().cpu().numpy()
---> 54     val_auc.append(roc_auc_score(y_actual, y_pred))
     55
     56

2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_ranking.py in _binary_roc_auc_score(y_true, y_score, sample_weight, max_fpr)
    336     if len(np.unique(y_true)) != 2:
    337         raise ValueError(
--> 338             "Only one class present in y_true. ROC AUC score "
    339             "is not defined in that case."
    340         )

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

Hi,
Can someone help me to get rid of this error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (96x4096 and 12288x200)

class Net(nn.Module):
    def __init__(self, ):
        super(Net, self).__init__()
        # two hidden layers with 200 and 50 units
        # input size is 64 * 64
        self.fc1 = nn.Linear( 3 * IMAGE_SIZE * IMAGE_SIZE, 200)
        self.fc2 = nn.Linear(200, 50)
        self.fc3 = nn.Linear(50, 3) # output with NUM_CLASSES classes 
        
    def forward(self, x):
        # using relu activation function for each layer except output layer because for output we are using cross entropy loss
        print('x.shape', x.shape)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        # no activation is needed at the end
        return x

@ptrblck, can you please suggest?

Based on your code it seems your input has a shape of [batch_size, 64*64=4096]. In this case the in_features of the first linear layer is expected to be 4096 as well, while it is currently set to a different value, which raises the error.
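
A minimal sketch of that fix, assuming the inputs really arrive as [batch_size, 4096] (e.g. single-channel 64x64 images that are flattened before the first linear layer):

import torch
import torch.nn as nn

IMAGE_SIZE = 64

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # in_features must match the flattened input size (64 * 64 = 4096 here)
        self.fc1 = nn.Linear(IMAGE_SIZE * IMAGE_SIZE, 200)
        self.fc2 = nn.Linear(200, 50)
        self.fc3 = nn.Linear(50, 3)

    def forward(self, x):
        x = x.view(x.size(0), -1)        # flatten while keeping the batch dimension
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

net = Net()
out = net(torch.randn(96, IMAGE_SIZE * IMAGE_SIZE))  # matches the 96x4096 input from the error
print(out.shape)  # torch.Size([96, 3])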

Can anyone help me, please?

I’m having this issue with my chatbot:

You can see the code here:
https://github.com/GusdPaula/chatbot_pytorch

The error points towards a shape mismatch in a matmul operation (so I guess it’s an nn.Linear layer). Make sure the feature dimension of the input to the linear layer matches the specified in_features. Currently your input tensor seems to have a shape of [8, 8] while the linear layer expects 53 input features.
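
As a quick sanity check (a hypothetical sketch, not the chatbot's actual code), the last dimension of the tensor entering the linear layer has to equal in_features:

import torch
import torch.nn as nn

x = torch.randn(8, 53)        # hypothetical input batch: 53 features per sample
layer = nn.Linear(in_features=53, out_features=128)   # 128 is an arbitrary example size
print(layer(x).shape)         # torch.Size([8, 128])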

Thank you a lot. It worked!! :slight_smile:

Hello, hope you are well!
Can someone help me to solve this error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x6 and 256x2)

I am trying to create an ensemble model classification for 2 classes - my models are vgg19, resnet50 and densenet161. My input image is resized to 224x224.

class EnsembleModel(nn.Module):   
    def __init__(self, modelA, modelB, modelC):
        super().__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.modelC = modelC
        self.classifier = nn.Linear(256, 2)
        
        
    def forward(self, x):
        x1 = self.modelA(x)
        x2 = self.modelB(x)
        x3 = self.modelC(x)
        x = torch.cat((x1, x2, x3), dim=1)
        out = self.classifier(x)
        return out
    
ensemble_model = EnsembleModel(model_densenet161, model_resnet50, model_vgg19_bn)

for param in ensemble_model.parameters():
    param.requires_grad = False

for param in ensemble_model.classifier.parameters():
    param.requires_grad = True    

ensemble_model = ensemble_model.to(DEVICE)

The input activation x to the self.classifier seems to contain 6 features based on the error message while 256 are expected.
You could change the in_features argument of self.classifier to 6 and it should work.
However, based on your description it seems you are trying to use the final logits of all models to create an ensemble so you might also want to consider using the penultimate feature output.
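
A minimal sketch of the quick fix, assuming each of the three sub-models outputs 2 logits, so the concatenated tensor has 3 * 2 = 6 features:

import torch
import torch.nn as nn

# each sub-model returns [batch_size, 2] logits
x1, x2, x3 = torch.randn(64, 2), torch.randn(64, 2), torch.randn(64, 2)
x = torch.cat((x1, x2, x3), dim=1)   # [64, 6]

classifier = nn.Linear(6, 2)         # in_features matches the 6 concatenated logits
print(classifier(x).shape)           # torch.Size([64, 2])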


Yes! Thank you! You are right, I ran the ensemble and it worked, but it gives me way too low validation accuracy values (20-30%, while the single models are around 80-85%). Considering the penultimate feature output, do you mean something like this, changing the classifier?

self.classifier = nn.Sequential(
            nn.Linear(512, 6), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(6, 2), nn.LogSoftmax(dim=1))

Hi,

I am also getting the same error… it works fine when batch_size=1 in the train_loader, but with a batch size of 10 I get this:

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, bias=False) 
        self.conv2 = nn.Conv2d(in_channels=10, out_channels=10, kernel_size=3, bias=False)
        self.fc1 = nn.Linear(in_features=10 * 24 * 24, out_features=10, bias=False)
        self.fc2 = nn.Linear(in_features=10, out_features=10, bias=False)
        self.out = nn.Linear(in_features=10, out_features=10, bias=False)
    
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = x.reshape(1,-1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.out(x)
        x = F.softmax(x, dim=1)
        return x

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x57600 and 5760x10)

Since I have set bias=False as well, the calculations here should be straightforward. Still, I don't understand why the line x = self.fc1(x) raises the error.

Could you please explain the flow? What exactly am I missing?

train_set = torchvision.datasets.FashionMNIST(
    root="./data",
    train=True,
    download=True,
    transform=transforms.Compose([
        transforms.ToTensor()
    ])
)


train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=10,
    shuffle=True
)

batch = next(iter(train_loader))
images, labels = batch

preds = net(images)

Thanks

Model ensembles often use the features of pretrained models instead of the final classifier output, so yes you could check the classifier and drop the final output layer(s).
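
For example, a rough sketch of that idea (not tested against your exact checkpoints; the feature sizes below are the standard torchvision ones: 2208 for densenet161, 2048 for resnet50, and 4096 for the penultimate layer of vgg19_bn's classifier):

import torch
import torch.nn as nn
import torchvision.models as models

model_densenet161 = models.densenet161(pretrained=True)
model_resnet50 = models.resnet50(pretrained=True)
model_vgg19_bn = models.vgg19_bn(pretrained=True)

# drop the final classification layers so each model returns its features
model_densenet161.classifier = nn.Identity()   # -> [batch, 2208]
model_resnet50.fc = nn.Identity()              # -> [batch, 2048]
model_vgg19_bn.classifier[-1] = nn.Identity()  # -> [batch, 4096]

# the ensemble classifier now takes the concatenated features
classifier = nn.Linear(2208 + 2048 + 4096, 2)

x = torch.randn(2, 3, 224, 224)
features = torch.cat((model_densenet161(x), model_resnet50(x), model_vgg19_bn(x)), dim=1)
print(classifier(features).shape)  # torch.Size([2, 2])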

@Manu-Chauhan
This line of code is wrong:

x = x.reshape(1,-1)

as it hard-codes the batch size to 1.
Use x = x.view(x.size(0), -1) and it should work.
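
With that change the forward pass would look like this (same structure as in your post, just slightly condensed, with only the reshape line changed):

def forward(self, x):
    x = F.relu(self.conv1(x))
    x = F.relu(self.conv2(x))
    x = x.view(x.size(0), -1)   # [batch_size, 10 * 24 * 24] instead of [1, batch_size * 5760]
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.out(x)
    return F.softmax(x, dim=1)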


I’m running a DQN class for my classification model. Below is a snapshot of my layers. I got RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x31 and 30x16). Can anyone help with it?

class DQN(nn.Module):

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(30, 16)
        self.fc2 = nn.Linear(16, 18)
        self.fc3 = nn.Linear(18, 20)
        self.fc4 = nn.Linear(20, 24)
        self.fc5 = nn.Linear(24, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, p=0.25)
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = torch.sigmoid(self.fc5(x))
        return x