RuntimeError: size mismatch, m1: [8 x 73728], m2: [512 x 227] at C:/w/1/s/windows/pytorch/aten/src\THC/generic/THCTensorMathBlas.cu:290

Yaqi_CAI · April 7, 2020, 6:59am

I am using SE-ResNet50 for multi-label annotation, and the dataset is Food-101.
But when I run the following code:

model = senet.SENet50()

num_ftrs = model.linear.in_features # revising the categories classes
model.linear = nn.Linear(num_ftrs, len(targets))

ct = 0
for name, child in model.named_children():
    ct += 1
    if ct < 8:
        for name2, params in child.named_parameters():
            params.requires_grad = False
output = model(img_data)

It raised the error shown in the title.

Here is the model:

Can anyone tell me how to fix the problem?

Yaqi_CAI · April 7, 2020, 7:20am

Also, the original SE-ResNet code is here: https://github.com/kuangliu/pytorch-cifar/blob/master/models/senet.py

ptrblck · April 7, 2020, 8:24am

The code works fine for me:

import torch
import torch.nn as nn
from senet import SENet, PreActBlock

model = SENet(PreActBlock, [2,2,2,2])

num_ftrs = model.linear.in_features # revising the categories classes
model.linear = nn.Linear(num_ftrs, 10)

out = model(torch.randn(1, 3, 32, 32))
print(out.shape)
> torch.Size([1, 10])

PS: It’s always better to post code by wrapping it into three backticks ```, as it’s easier to debug and also the pictures are really hard to read.
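A minimal sketch of why the 32 x 32 input in this snippet matches the 512-feature linear layer, assuming the forward pass follows the linked senet.py (conv1, layer1 to layer4, `F.avg_pool2d(out, 4)`, then a flatten). The helper names `conv_out` and `flattened_features` are hypothetical and only illustrate the shape arithmetic; they are not part of the model code.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def flattened_features(height, width, channels=512, pool=4):
    """Trace an H x W input through the size-changing steps of the network.

    Assumption: conv1 and layer1 keep the spatial size (3x3, stride 1,
    padding 1), while layer2, layer3 and layer4 each start with one
    stride-2 convolution, as in the linked senet.py.
    """
    h, w = height, width
    for stride in (1, 1, 2, 2, 2):
        h = conv_out(h, 3, stride, 1)
        w = conv_out(w, 3, stride, 1)
    # avg_pool2d with kernel 4: the stride defaults to the kernel size.
    h, w = h // pool, w // pool
    return channels * h * w


print(flattened_features(32, 32))  # prints 512
```

With a 32 x 32 input the feature map before pooling is 4 x 4, so the pooled output is 1 x 1 and the flattened vector has exactly 512 entries, which is what `model.linear.in_features` expects.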

Yaqi_CAI · April 8, 2020, 8:44am

Thanks for the reminder.
But after I changed the code, the problem still exists:

RuntimeError Traceback (most recent call last)
in
25 img_data, target = img_data.to(device), target.to(device)
26
---> 27 output = model(img_data) #FWD prop
28
29 loss = criterion(output, target) #Cross entropy loss

D:\pythonana\envs\pytorch\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)

C:\yaqi_anaconda\senet.py in forward(self, x)
102 out = F.avg_pool2d(out, 4)
103 out = out.view(out.size(0), -1)
--> 104 out = self.linear(out)
105 return out
106

D:\pythonana\envs\pytorch\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)

D:\pythonana\envs\pytorch\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
85
86 def forward(self, input):
---> 87 return F.linear(input, self.weight, self.bias)
88
89 def extra_repr(self):

D:\pythonana\envs\pytorch\lib\site-packages\torch\nn\functional.py in linear(input, weight, bias)
1368 if input.dim() == 2 and bias is not None:
1369 # fused op is marginally faster
-> 1370 ret = torch.addmm(bias, input, weight.t())
1371 else:
1372 output = input.matmul(weight.t())

RuntimeError: size mismatch, m1: [8 x 73728], m2: [512 x 10] at C:/w/1/s/windows/pytorch/aten/src\THC/generic/THCTensorMathBlas.cu:290
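One way to read the numbers in this error, sketched under two assumptions that the thread does not confirm: the images in `train_loader` are square, and the forward pass is the one shown in the traceback (three stride-2 stages followed by `F.avg_pool2d(out, 4)`). The 8 in `m1` would then be the batch size, and 73728 the flattened size per image. The helper `after_three_stride2` is hypothetical.

```python
import math

flat_features = 73728   # second dimension of m1 in the error message
channels = 512          # first dimension of m2, i.e. linear.in_features

# Spatial positions left after pooling, and the side length if square.
positions = flat_features // channels
side = math.isqrt(positions)
assert side * side == positions

# avg_pool2d(out, 4) floors, so several pre-pool sizes give the same result.
pre_pool_sides = range(side * 4, side * 4 + 4)


def after_three_stride2(n):
    """Side length after three 3x3 convolutions with stride 2, padding 1."""
    for _ in range(3):
        n = (n + 2 - 3) // 2 + 1
    return n


candidates = [n for n in range(1, 1025) if after_three_stride2(n) in pre_pool_sides]
print(positions, side, candidates[0], candidates[-1])  # prints 144 12 377 408
```

Under these assumptions the error is consistent with inputs of roughly 384 x 384 pixels (384 is in the candidate range) rather than the 32 x 32 used in the snippet above, which would leave a 12 x 12 map after pooling instead of 1 x 1.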

ptrblck · April 8, 2020, 8:45am

Are you getting the error with my code snippet?
If not, could you compare your current script with my examples and post the differences, so that we can have another look, please?

Yaqi_CAI · April 8, 2020, 8:48am

Yes, I get the same error with your code snippet.
And here is the model now:

SENet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): PreActBlock(
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (fc1): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
      (fc2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): PreActBlock(
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (fc1): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
      (fc2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (layer2): Sequential(
    (0): PreActBlock(
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (shortcut): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
      )
      (fc1): Conv2d(128, 8, kernel_size=(1, 1), stride=(1, 1))
      (fc2): Conv2d(8, 128, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): PreActBlock(
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (fc1): Conv2d(128, 8, kernel_size=(1, 1), stride=(1, 1))
      (fc2): Conv2d(8, 128, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (layer3): Sequential(
    (0): PreActBlock(
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (shortcut): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
      )
      (fc1): Conv2d(256, 16, kernel_size=(1, 1), stride=(1, 1))
      (fc2): Conv2d(16, 256, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): PreActBlock(
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (fc1): Conv2d(256, 16, kernel_size=(1, 1), stride=(1, 1))
      (fc2): Conv2d(16, 256, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (layer4): Sequential(
    (0): PreActBlock(
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (shortcut): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
      )
      (fc1): Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
      (fc2): Conv2d(32, 512, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): PreActBlock(
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (fc1): Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
      (fc2): Conv2d(32, 512, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (linear): Linear(in_features=512, out_features=10, bias=True)
)

Yaqi_CAI · April 8, 2020, 8:55am

Here is my code from before:

model = senet.SENet50()

num_ftrs = model.linear.in_features # revising the categories classes
model.linear = nn.Linear(num_ftrs, len(targets))

ct = 0
for name, child in model.named_children():
    ct += 1
    if ct < 8:
        for name2, params in child.named_parameters():
            params.requires_grad = False
output = model(img_data)

It raised this error: RuntimeError: size mismatch, m1: [8 x 73728], m2: [512 x 227]

But now, after I changed it to your code:

from senet import SENet, PreActBlock
model = SENet(PreActBlock, [2,2,2,2])

num_ftrs = model.linear.in_features # revising the categories classes
model.linear = nn.Linear(num_ftrs, 10)

out = model(torch.randn(1, 3, 32, 32))

ct = 0
for name, child in model.named_children(): # freeze the first 7 children
    ct += 1
    if ct < 8:
        for name2, params in child.named_parameters():
            params.requires_grad = False

for img_data, target in tqdm_notebook(train_loader, desc='Training'):
    img_data, target = img_data.to(device), target.to(device)

    output = model(img_data) #FWD prop

It raised the same error that I showed in my last reply:
RuntimeError: size mismatch, m1: [8 x 73728], m2: [512 x 10]

Also, my dataset is Food-101, which has 101 classes.
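A possible direction, offered as a sketch rather than a confirmed fix: if the mismatch comes from feeding images larger than 32 x 32 into a model whose forward pass ends in `F.avg_pool2d(out, 4)`, then replacing that fixed pooling with global average pooling makes the flattened size independent of the input resolution. `TinyHead` below is a hypothetical stand-in for the last three lines of `SENet.forward`, not the real model; the 101 outputs match the Food-101 class count mentioned above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyHead(nn.Module):
    """Hypothetical stand-in for the pooling + linear tail of the network."""

    def __init__(self, channels=512, num_classes=101):
        super().__init__()
        self.linear = nn.Linear(channels, num_classes)

    def forward(self, feature_map):
        # Global average pooling: the result is (N, C, 1, 1) for any H x W,
        # whereas avg_pool2d(out, 4) only gives 1 x 1 for a 4 x 4 map.
        out = F.adaptive_avg_pool2d(feature_map, 1)
        out = out.view(out.size(0), -1)
        return self.linear(out)


head = TinyHead()
small = head(torch.randn(8, 512, 4, 4))    # map size expected for 32 x 32 images
large = head(torch.randn(8, 512, 48, 48))  # map size expected for 384 x 384 images
print(small.shape, large.shape)
```

Both calls return a tensor of shape [8, 101]. The alternative that leaves senet.py untouched would be to resize the images to 32 x 32 in the data pipeline, at the cost of discarding most of the image detail.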

ptrblck · April 9, 2020, 4:58am

Sorry, I cannot reproduce this error even with the parameter freezing.

Yaqi_CAI · April 9, 2020, 5:51am

OK, thank you. Maybe I should try another method to improve my results.