RuntimeError: size mismatch, m1: [8 x 11552], m2: [1800 x 128]

Hi Community,

I am running into the error below:

ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [8 x 11552], m2: [1800 x 128]

I read similar topics that were posted, but they didn't lead me to a proper conclusion on how to resolve my error.

This is the size of the input x that is forwarded into the model and processed:
[screenshot: shape of the input x]
I am using AlexNet and the raw image size is 224x224.
I flatten the input before forwarding it, using the code below:
x = data.view(data.shape[0], -1)

Thank you in advance :pray:

While digging into this error further, I found out that I am specifying the following in the AlexNet settings:
hiddens = [8,8]
flatten = 1800
while the flattened conv output for a 224x224 input actually has 11552 features (8 channels x 38 x 38 after the two conv/pool stages), which is why it throws this error.
I resized the image to 32x32 and updated the AlexNet settings to:
hiddens = [32,32]
flatten = 1152
and that solved the error.
However, passing the input as 32x32 affects my AlexNet network, and I would then have to customise the network, which I don't want to do at this stage.
I am using this transformation with the same 224 image input:
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
the same as in this blog post: AlexNet | PyTorch.
Is there any way around this, so I can pass the image at this size without resizing it?
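
One way to avoid hand-tuning flatten is to compute it from a dummy forward pass through the conv stack. A minimal sketch; the stack below is a hypothetical stand-in for the actual model, built from the kernel sizes discussed in this thread:

import torch
import torch.nn as nn

# Hypothetical stand-in for the conv stack (hiddens = [8, 8], 224x224 input).
conv = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=224 // 8),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 8, kernel_size=224 // 10),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
)

# Forward a dummy input once and read off the flattened feature size,
# instead of hard-coding flatten = 1800.
with torch.no_grad():
    flatten = conv(torch.zeros(1, 3, 224, 224)).flatten(1).size(1)
print(flatten)  # 11552 for this stack with a 224x224 input

Computed this way, flatten always matches whatever input resolution is passed in.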

I'm not sure which implementation you are using or where you are passing hiddens and flatten, as the torchvision AlexNet implementation doesn't support these arguments, if I'm not mistaken, and expects the inputs to have the spatial shape 224x224, as seen here:

model = models.alexnet()
x = torch.randn(1, 3, 224, 224)
out = model(x)
print(out.shape)
> torch.Size([1, 1000])

You are right, ptrblck. I used a simpler implementation before: I was testing the same dataset in a notebook and was able to train with AlexNet without any flatten or hiddens settings, and with just minor changes. However, the current project I am working on uses a modified version of AlexNet, which is more complex and includes Shared and Private classes to support continual multitask learning.

This is the code I am working with. I am using the two classes Shared and Private, whose values I was modifying to be able to process the forwarded inputs. I appreciate your time and assistance.

import torch
import utils  # the project's helper module providing compute_conv_output_size


class Shared(torch.nn.Module):

    def __init__(self, args):
        super(Shared, self).__init__()

        # self.ncha, size and self.latent_dim are used below but were not
        # defined in the posted snippet; assumed to come from args with
        # these (guessed) field names.
        self.ncha, size, _ = args.inputsize
        self.latent_dim = args.latent_dim

        # added a new dataset
        if args.experiment == 'mydataset':
            hiddens = [64, 128, 256, 512, 512, 512]
        else:
            raise NotImplementedError

        self.conv1 = torch.nn.Conv2d(self.ncha, hiddens[0], kernel_size=size // 8)
        s = utils.compute_conv_output_size(size, size // 8)
        s = s // 2
        self.conv2 = torch.nn.Conv2d(hiddens[0], hiddens[1], kernel_size=size // 10)
        s = utils.compute_conv_output_size(s, size // 10)
        s = s // 2
        self.conv3 = torch.nn.Conv2d(hiddens[1], hiddens[2], kernel_size=2)
        s = utils.compute_conv_output_size(s, 2)
        s = s // 2
        self.maxpool = torch.nn.MaxPool2d(2)
        self.relu = torch.nn.ReLU()

        self.drop1 = torch.nn.Dropout(0.2)
        self.drop2 = torch.nn.Dropout(0.5)
        self.fc1 = torch.nn.Linear(hiddens[2] * s * s, hiddens[3])
        self.fc2 = torch.nn.Linear(hiddens[3], hiddens[4])
        self.fc3 = torch.nn.Linear(hiddens[4], hiddens[5])
        self.fc4 = torch.nn.Linear(hiddens[5], self.latent_dim)

    def forward(self, x_s):
        x_s = x_s.view_as(x_s)  # no-op, kept from the original code
        h = self.maxpool(self.drop1(self.relu(self.conv1(x_s))))
        h = self.maxpool(self.drop1(self.relu(self.conv2(h))))
        h = self.maxpool(self.drop2(self.relu(self.conv3(h))))
        h = h.view(x_s.size(0), -1)  # flatten the conv features per sample
        h = self.drop2(self.relu(self.fc1(h)))
        h = self.drop2(self.relu(self.fc2(h)))
        h = self.drop2(self.relu(self.fc3(h)))
        h = self.drop2(self.relu(self.fc4(h)))
        return h

class Private(torch.nn.Module):

    def __init__(self, args):
        super(Private, self).__init__()

        # self.ncha, self.size, self.num_tasks and self.latent_dim are used
        # below but were not defined in the posted snippet; assumed to come
        # from args with these (guessed) field names.
        self.ncha, self.size, _ = args.inputsize
        self.num_tasks = args.ntasks
        self.latent_dim = args.latent_dim

        if args.experiment == 'mydataset':
            hiddens = [32, 32]
            flatten = 1152  # must equal the flattened conv output size
        else:
            raise NotImplementedError

        # each task contributes its own conv block and linear block
        self.task_out = torch.nn.ModuleList()
        for _ in range(self.num_tasks):
            self.conv = torch.nn.Sequential()
            self.conv.add_module('conv1', torch.nn.Conv2d(self.ncha, hiddens[0], kernel_size=self.size // 8))
            self.conv.add_module('relu1', torch.nn.ReLU(inplace=True))
            self.conv.add_module('drop1', torch.nn.Dropout(0.2))
            self.conv.add_module('maxpool1', torch.nn.MaxPool2d(2))
            self.conv.add_module('conv2', torch.nn.Conv2d(hiddens[0], hiddens[1], kernel_size=self.size // 10))
            self.conv.add_module('relu2', torch.nn.ReLU(inplace=True))
            self.conv.add_module('dropout2', torch.nn.Dropout(0.5))
            self.conv.add_module('maxpool2', torch.nn.MaxPool2d(2))
            self.task_out.append(self.conv)
            self.linear = torch.nn.Sequential()
            self.linear.add_module('linear1', torch.nn.Linear(flatten, self.latent_dim))
            self.linear.add_module('relu3', torch.nn.ReLU(inplace=True))
            self.task_out.append(self.linear)

class Net(torch.nn.Module):

    def __init__(self, args):
        super(Net, self).__init__()

        # num_tasks, latent_dim and taskcla are used below but not set in
        # the posted snippet; assumed to come from args (guessed names).
        self.num_tasks = args.ntasks
        self.latent_dim = args.latent_dim
        self.taskcla = args.taskcla

        self.hidden1 = args.head_units
        self.hidden2 = args.head_units // 2

        self.shared = Shared(args)
        self.private = Private(args)

        # one head per task; the 2 * latent_dim input suggests the shared
        # and private features are concatenated before each head
        self.head = torch.nn.ModuleList()
        for i in range(self.num_tasks):
            self.head.append(
                torch.nn.Sequential(
                    torch.nn.Linear(2 * self.latent_dim, self.hidden1),
                    torch.nn.ReLU(inplace=True),
                    torch.nn.Dropout(),
                    torch.nn.Linear(self.hidden1, self.hidden2),
                    torch.nn.ReLU(inplace=True),
                    torch.nn.Linear(self.hidden2, self.taskcla[i][1])
                ))
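
For reference, a quick shape sanity check of Shared with hypothetical args values (this assumes the attribute names guessed in the comments above, and that utils.compute_conv_output_size is importable from the project):

from types import SimpleNamespace
import torch

# Hypothetical configuration values, just to exercise the module.
args = SimpleNamespace(experiment='mydataset',
                       inputsize=(3, 224, 224),
                       latent_dim=128)
shared = Shared(args)
x = torch.randn(8, 3, 224, 224)
print(shared(x).shape)  # expected: torch.Size([8, 128])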

Based on the error message and the fact that you were using flatten = 1800 before, I guess the shape mismatch is created in the self.linear module of the Private class.
Assuming the [8 x 11552] shape represents the activations for the 224x224 inputs, set flatten to 11552 and rerun your script.
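
If you want to confirm that shape directly, a forward hook will print what actually reaches the linear layer. A minimal sketch with a stand-in layer; the sizes are the ones from the error message:

import torch
import torch.nn as nn

# Stand-in for the 'linear1' module inside Private.
linear = nn.Linear(11552, 128)
linear.register_forward_hook(
    lambda module, inputs, output: print('linear input:', inputs[0].shape))

h = torch.randn(8, 11552)  # flattened conv activations
out = linear(h)            # prints: linear input: torch.Size([8, 11552])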


Hi ptrblck, it is working :partying_face: Thank you for your suggestions and reply.
As usual, solving one issue means starting work on another error :crazy_face:
After running the script I am now getting a target-out-of-bounds error:
line 2218, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target 16 is out of bounds.
I haven't debugged it properly yet, but I thought I'd update you that the size mismatch issue is sorted out. Thank you heaps.

Good to hear you’ve solved the first issue! :slight_smile:

For the second one: check the shape of the model output and make sure it contains logits for at least 17 classes, since the target tries to index it with a value of 16 ([0, 16] makes 17 valid classes).
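
The failure is easy to reproduce in isolation; a minimal sketch with illustrative shapes:

import torch
import torch.nn.functional as F

target = torch.tensor([16] * 8)          # class index 16 needs >= 17 logits

logits = torch.randn(8, 16)              # only 16 classes: indices [0, 15]
# F.cross_entropy(logits, target)        # -> IndexError: Target 16 is out of bounds.

logits = torch.randn(8, 17)              # 17 classes: indices [0, 16]
print(F.cross_entropy(logits, target))   # works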


Hi ptrblck,

Sorry for the late reply; I was debugging the code and digging into the issue further. I tested a couple of things, but it is still not resolved.

1- The number of classes is 25:
[screenshot: number_of_classes = 25]
2- The classes of the dataset are numeric from 0 to 24, and there are also 25 string labels for the dataset.
In the code, this is how I am getting the value of y:
[screenshot: code extracting y]
3- While debugging, this is where it throws the error, on task_loss:
[screenshot: traceback at task_loss]
y:
I can see the index 16 is already there :frowning:
[screenshot: tensor y containing the value 16]

Thank you in advance for your time and guidance.

Since you are working with 25 different classes, your model output should also contain logits for these 25 classes and thus should have the shape [batch_size, 25]. Could you check the model output shape before you pass it to the criterion?


Hi ptrblck,
Thank you for your reply.
I updated the config to ntasks: 1 (number of tasks), which gave me the below:
print(output.shape)
[screenshot: printed output shape]

After I did that, I can see that it starts forwarding. I was also mistaken about the number of classes, because after that it crashed on class 25 and complained that it was out of bounds.
However, once I updated num_classes: 26, the output shape became (8, 26) and I was able to pass all the tensors; now I am waiting for it to display the epoch count :slight_smile: fingers crossed.
It looks promising. Training takes time, but at least it started to work, even if the accuracy is not good yet.
output shape torch.Size([8, 26])
I still want to figure out why reducing the number of tasks affected the shape of the model output.
Thank you ptrblck, without your assistance I couldn't have got this far. Cheers.

Hi all,

I had the time to go ahead and continue my customisation of this project. Once I set the task count in the conf to ntasks: 1, the program completes training and validation successfully. However, the learning rate is still very low, as shown above.
If I increase it (ntasks: 5), I get the below error while training task 1:

in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target 17 is out of bounds.

Thank you all in advance.

The error is raised in nn.CrossEntropyLoss or nn.NLLLoss if you pass a target tensor containing an invalid index. The offending target has a value of 17, which would indicate that you are working on a multi-class classification use case with at least 18 classes. If that's correct, your model output should have the shape [batch_size, nb_classes=18] for a multi-class classification, while the target would have the shape [batch_size] and contain class indices in the range [0, 17].
Since this error is raised, your model output seems to have a wrong shape, and I don't know how the learning rate could change this.
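
A defensive check along these lines can catch the mismatch with a readable message before the loss call; a minimal sketch, with nb_classes = 18 as an illustrative value:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
nb_classes = 18
output = torch.randn(8, nb_classes)          # [batch_size, nb_classes]
target = torch.randint(0, nb_classes, (8,))  # class indices in [0, 17]

# Fail fast instead of hitting the nll_loss IndexError deep in the stack.
assert target.max() < output.size(1), (
    f'target {target.max().item()} is out of bounds for '
    f'{output.size(1)} logits')
print(criterion(output, target))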

Hi Piotr,

Thank you for your reply.

That’s true, I am working on a multitask case that has
[batch_size, nb_classes=26]

my model output is:
output shape torch.Size([8, 26])

I defined my class ranges as:

np.random.seed(self.seed)
self.task_ids = [[0,1], [2,3], [4,5], [6,7], [8,9], [10,11], [12,13], [14,15], [16,17], [18,19], [20,21], [22,23], [24,25]]
self.train_set = {}
self.test_set = {}

These are my target values:
targets[index] ranges from 1 to 26

How do I proceed to check whether my model output shape is right or wrong?
I am still not sure why it fails once I increase the number of tasks (not the learning rate). Could it be that, with more tasks, it just happens to randomly select more targets, while with ntasks = 1 it uses fewer targets, with indices below 17?

The provided output shape of [8, 26] doesn't match the raised error, since that error claims the class index 17 is already invalid.

Also, targets from 1 to 26 would indicate that you are working with 27 classes, since the target should contain values in the range [0, nb_classes-1]; so even if your model output is [8, 26], you will run into a new error pointing towards index 26 being out of range.
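
If the targets really run from 1 to 26, one common fix is to shift them into the 0-based range PyTorch expects; a minimal sketch, assuming that labeling:

import torch

targets = torch.tensor([1, 5, 17, 26])  # hypothetical 1-based labels
targets = targets - 1                   # now in [0, 25], matching 26 logits
print(targets)                          # tensor([ 0,  4, 16, 25])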
