RuntimeError: mat1 dim 1 must match mat2 dim 0 in a cat vs. dog classifier

Hello, I need help and have not found a solution yet. Here is my code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torchvision import transforms
from PIL import Image
from os import listdir
import random

normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406], 
    std=[0.229, 0.224, 0.225]
)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    normalize,
])
train_data_list = []
target_list = []
train_data = []
files = listdir('catdog/train/')
random.shuffle(files)  # visit the files in random order
for f in files:
    img = Image.open('catdog/train/' + f)
    img_tensor = transform(img)
    train_data_list.append(img_tensor)
    isCat = 1 if 'cat' in f else 0
    isDog = 1 if 'dog' in f else 0
    target = [isCat, isDog]
    target_list.append(target)
    if len(train_data_list) >= 64:
        train_data.append((torch.stack(train_data_list), target_list))
        train_data_list = []
        total_batches = len(files) // 64
        print('Loaded batch', len(train_data), 'of', total_batches)
        print('Percentage Done:', int(100 * len(train_data) / total_batches), '%')

class Netz(nn.Module):
    def __init__(self):
        super(Netz, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5, 1)
        self.conv2 = nn.Conv2d(6, 12, 5, 1)
        self.conv3 = nn.Conv2d(12, 18, 5, 1)
        self.conv4 = nn.Conv2d(18, 24, 5, 1)
        self.fc1 = nn.Linear(3456, 1000)
        self.fc2 = nn.Linear(1000, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.max_pool2d(x,2)        
        x = F.relu(x)
        x = self.conv3(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.max_pool2d(x,2)
        x = F.relu(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        output = F.log_softmax(x)
        return output

model = Netz()
model.cuda()

optimizer = optim.Adam(model.parameters(), lr=0.01)
def train():
    model.train()
    batch_idx = 0
    for data, target in train_data:
        data = data.cuda()
        target = torch.Tensor(target).cuda()
        data = Variable(data)
        target = Variable(target)
        optimizer.zero_grad()
        out = model(data)
        criterion = F.nll_loss
        loss = criterion(out, target)
        loss.backward()
        optimizer.step()
        print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
            epoch, batch_idx * len(data), len(train_data),
            100. * batch_idx / len(train_data), loss.item()))
        batch_idx += 1

for epoch in range(1,10):
    train()

The error is:

Traceback (most recent call last):
  File "c:/Users/Richard/Desktop/python/AI/catordog.py", line 111, in <module>
    train()
  File "c:/Users/Richard/Desktop/python/AI/catordog.py", line 85, in train
    out = model(data)
  File "C:\Users\Richard\AppData\Local\Programs\DeepLearningStudio\user_data\data\1\user_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:/Users/Richard/Desktop/python/AI/catordog.py", line 66, in forward
    x = F.relu(self.fc1(x))
  File "C:\Users\Richard\AppData\Local\Programs\DeepLearningStudio\user_data\data\1\user_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Richard\AppData\Local\Programs\DeepLearningStudio\user_data\data\1\user_conda\lib\site-packages\torch\nn\modules\linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\Richard\AppData\Local\Programs\DeepLearningStudio\user_data\data\1\user_conda\lib\site-packages\torch\nn\functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

You are not flattening the activation output of self.conv4 before feeding it to self.fc1, so add

x = F.relu(x)
x = x.view(x.size(0), -1)
x = F.relu(self.fc1(x))

to avoid this error.

Also, F.nll_loss expects log probabilities, so use F.log_softmax instead of F.softmax.
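For example, the end of forward could look like this (passing dim=1 explicitly also avoids the deprecation warning about an implicit dimension choice):

x = self.fc2(x)
# x has shape [batch_size, nb_classes]; normalize over the class dimension
output = F.log_softmax(x, dim=1)
return output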

Thanks for your answer, but now there is a new error. I have updated the code above accordingly.

c:/Users/Richard/Desktop/python/AI/catordog.py:68: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  output = F.log_softmax(x)
Traceback (most recent call last):
  File "c:/Users/Richard/Desktop/python/AI/catordog.py", line 110, in <module>
    train()
  File "c:/Users/Richard/Desktop/python/AI/catordog.py", line 86, in train
    loss = criterion(out, target)
  File "C:\Users\Richard\AppData\Local\Programs\DeepLearningStudio\user_data\data\1\user_conda\lib\site-packages\torch\nn\functional.py", line 2264, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target' in call to _thnn_nll_loss_forward

F.nll_loss expects a target in the shape [batch_size] containing class indices in the range [0, nb_classes-1] for a multi-class classification use case.
It seems you are passing the target as a FloatTensor, so transform it via target = target.long() and make sure the shape and values are as described before.
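A minimal sketch of the expected format, assuming two classes and a batch of four samples:

# nll_loss expects log probabilities and LongTensor class indices of shape [batch_size]
output = torch.randn(4, 2).log_softmax(dim=1)  # [batch_size, nb_classes]
target = torch.Tensor([0., 1., 1., 0.])        # a FloatTensor raises the dtype error
target = target.long()                         # cast to int64 class indices
loss = F.nll_loss(output, target)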

Thanks for your answer, but the error is still not fixed:

c:/Users/Richard/Desktop/python/AI/catordog.py:68: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  output = F.log_softmax(x)
Traceback (most recent call last):
  File "c:/Users/Richard/Desktop/python/AI/catordog.py", line 96, in <module>
    train()
  File "c:/Users/Richard/Desktop/python/AI/catordog.py", line 87, in train
    loss = criterion(out, target)
  File "C:\Users\Richard\AppData\Local\Programs\DeepLearningStudio\user_data\data\1\user_conda\lib\site-packages\torch\nn\functional.py", line 2264, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: multi-target not supported at C:/cb/pytorch_1000000000000/work/aten/src\THCUNN/generic/ClassNLLCriterion.cu:15

This error is raised if the target has too many dimensions.
Could you check its shape via print(target.shape) and make sure it’s [batch_size] for a multi-class classification?
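For reference, a quick sanity check right before the loss calculation could look like this:

print(target.shape)  # should be torch.Size([batch_size]), e.g. torch.Size([64])
print(target[:5])    # should contain class indices in [0, nb_classes-1]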

def train():
    model.train()
    batch_idx = 0
    for data, target in train_data:
        data = data.cuda()
        target = torch.Tensor(target).cuda()
        data = Variable(data)
        target = Variable(target)
        target = target.long()
        print(target.shape)
        optimizer.zero_grad()
        out = model(data)
        criterion = F.nll_loss
        loss = criterion(out, target)
        loss.backward()
        optimizer.step()

Is that right?
But I still get the same error.

What shape does the print statement show?

torch.Size([64, 2])

That’s wrong as explained before.
It should have the shape [batch_size] and contain values in the range [0, nb_classes-1].
If you are dealing with two classes and are using a one-hot encoded tensor, use target = torch.argmax(target, dim=1) before passing it to the loss function.
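A minimal sketch with three samples:

# one-hot targets of shape [3, 2] -> class indices of shape [3]
target = torch.tensor([[1, 0], [0, 1], [0, 1]])  # cat, dog, dog
target = torch.argmax(target, dim=1)             # tensor([0, 1, 1])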

OK, sorry, now it works.
Thank you very much.

I have one more question: if I load all the pictures, the batch size gets too big. How do I reduce it
without using fewer pictures?

  File "C:\Users\Richard\AppData\Local\Programs\DeepLearningStudio\user_data\data\1\user_conda\lib\site-packages\torch\nn\functional.py", line 2262, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (64) to match target batch_size (25000).

The batch size is the number of data points you forward pass at once and compute the gradients for. I am not sure I understand your question; could you rephrase it? According to the error, the number of images and target labels must be the same when calculating the loss.
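Looking at the posted data loading code, target_list is never cleared after a batch is stored (only train_data_list is), so every stored batch ends up referencing the full list of 25000 targets, which would explain the mismatch. A cleaner approach is to drop the manual 64-sample batching, collect all samples once, and let a DataLoader create the batches. A minimal sketch, assuming train_data_list and target_list hold all images and one-hot targets:

from torch.utils.data import TensorDataset, DataLoader

all_images = torch.stack(train_data_list)                     # [N, 3, 256, 256]
all_targets = torch.argmax(torch.tensor(target_list), dim=1)  # [N] class indices

dataset = TensorDataset(all_images, all_targets)
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

for data, target in train_loader:
    data, target = data.cuda(), target.cuda()
    # forward pass, loss, backward, and optimizer step as before

This way the batch size is set in a single place (batch_size=64), and the number of images and targets always match.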

How can I fix the problem so that it works?
Is that a better question?