RuntimeError: mat1 dim 1 must match mat2 dim 0

Oscar_Rangel · August 6, 2020, 2:06pm

Hi there, cant figure out this error, help please.

model = densenet161(pretrained = True)
model.features[0] = nn.Conv2d(1, 96, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
model_ft.classifier = nn.Linear(1024, 22)

I guess I am not doing something correctly, then when I begin training I get this:

~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
    115     def forward(self, input):
    116         for module in self:
--> 117             input = module(input)
    118         return input
    119 

~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
    115     def forward(self, input):
    116         for module in self:
--> 117             input = module(input)
    118         return input
    119 

~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/linear.py in forward(self, input)
     89 
     90     def forward(self, input: Tensor) -> Tensor:
---> 91         return F.linear(input, self.weight, self.bias)
     92 
     93     def extra_repr(self) -> str:

~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1672     if input.dim() == 2 and bias is not None:
   1673         # fused op is marginally faster
-> 1674         ret = torch.addmm(bias, input, weight.t())
   1675     else:
   1676         output = input.matmul(weight.t())

RuntimeError: mat1 dim 1 must match mat2 dim 0

RaLo4 · August 6, 2020, 2:43pm

I am guessing you want to make densnet161 take greyscale images and predict 22 classes right?
Why does it say

and not model.classifier ? Do you have a second model called model_ft or is this a typo?

The original densenet161.classifier layer looked like this (classifier): Linear(in_features=2208, out_features=1000, bias=True).
It had 2208 in_features which you changed to 1024 (assuming model_ft is a typo or model_ft is also densenet161). Is this intentional?
You might have a size mismatch here.
You can probably debug your issue by printing your model using print(model) or alternatively print the passed tensors shapes on different locations using print(x.shape).

Oscar_Rangel · August 6, 2020, 3:08pm

Hi @RaLo4 ,

Thanks a lot for the answer, sometimes I get blinded looking at to much code and cant see the little things…

it works with 161

The thing is that I had several model sizes I was working on and i got mes up wit the features…

No is working !!!

Oscar_Rangel · August 6, 2020, 3:14pm

but when I do

model = pretrainedmodels.densenet201(pretrained='imagenet')
   )
    (norm5): BatchNorm2d(1920, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (last_linear): Linear(in_features=1920, out_features=1000, bias=True)
)

model.classifier = nn.Linear(1920, dls.c)
model.classifier
Linear(in_features=1920, out_features=22, bias=True)

is when I get the error, I like this pretrained models

Maybe you can shed some light what is that doing wrong.

Thanks a lot!

RaLo4 · August 6, 2020, 3:44pm

if I run

model = torchvision.models.densenet201(pretrained=True)

model.classifier = torch.nn.Linear(1920, 22, bias=True)

x = torch.ones((1, 3, 224, 224))

out = model(x)

everything goes well. It’s torchvisions densnet and not the one you linked, but that should not matter.
I am not sure if the problem is really on the forward pass. You should run your code in sections and check where exactly the error happens. Are you maybe doing other operations on your tensors before or after passing through your model?
Try to first identify where exactly the mismatch is and then try to look at the shapes of the tensors involved (using something like print(x.shape) maybe)

Oscar_Rangel · August 6, 2020, 4:30pm

@RaLo4 Thank you for taking the time to help,

######### Instanciate model ##################
m = pretrainedmodels.densenet201(pretrained='imagenet')

children = list(m.children())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), 
                    Flatten(), 
                    nn.Linear(children[-1].in_features, 22))
model = nn.Sequential(nn.Sequential(*children[:-2]), head)

but did not worked either.

let me dig a little bit more, keep you posted

Oscar_Rangel · August 6, 2020, 4:46pm

@RaLo4

Getting this error

RuntimeError: Error(s) in loading state_dict for DenseNet:
Missing key(s) in state_dict: “features.denseblock3.denselayer25.norm1.weight”, “features.denseblock3.denselayer25.norm1.bias”

neda_vida · October 9, 2020, 12:34pm

Hi,
i have a problem. My code is a cnn with pretrained vgg19. when i want to run its train i have a error of
RuntimeError: mat1 dim 1 must match mat2 dim 0

i tried what friends said here. but it didnt work. Could u please help me?
my cnn takes a RGB image 224224 and the first layer is 323*3.
Please help me.

ptrblck · October 11, 2020, 4:46am

Most likely the feature dimension of an intermediate activation doesn’t match the in_features of a linear layer.
Could you post your model definition and if possible an executable code snippet so that we could debug the issue?

neda_vida · October 12, 2020, 3:00pm

Thank you dear ptrblck. I found my big mistake. I didnt know how to use pretrained model, and i was trying to move all weights of pretrained model to my cnn. and it couldnt do it. i found from your froum that i just can change some layers like fc, dropout, … i could correct my mistake. Thank you for your reply.

neda_vida · October 14, 2020, 9:13am

Hi friends,
when i change fc layer in resnet50 in pictures fc1 i can run my code easily, but when i move this change in a class, it doesnt work in train. Could u please help me? fc1 fc1_print fc2 fc2_print

ptrblck · October 15, 2020, 3:34am

Could you post an executable code snippet by wrapping in into three backticks ```, which would make debugging easier, please?

neda_vida · October 15, 2020, 9:14am

i could solve it dear ptrblck. I should just add children(-1) to remove fc and then add new fc in class. Thank you very much for your reply.

num · October 20, 2020, 2:22pm

@ptrblck I have similar problem. Can you please help me to figure out. I have pasted my code here below.

import torch.nn as nn

class MLP1(nn.Module):

  def __init__(self):

    super(MLP1, self).__init__()

    # TODO: define your MLP1

    self.fc1 = nn.Linear(2048, 1024)

    self.fc2 = nn.Linear(1024, 512)

    self.output = nn.Linear(512, 10)  

    # Dropout for all hidden layers: 30% 

    self.drouput1 = nn.Dropout2d(p=0.3)

    self.drouput2 = nn.Dropout2d(p=0.3)

    # Activation Function and Softmax for Output layer

    self.relu = nn.ReLU()

    #self.softmax = nn.Softmax(dim=1)

  def forward(self, y):

    # TODO: define your forward function

    y = torch.flatten(y, start_dim = 1)

    y = self.drouput1(self.relu(self.fc1(y)))

    y = self.drouput2(self.relu(self.fc2(y)))

    return y

mlp1 = MLP1().to(device)  # operate on GPU

print(mlp1)

params = list(mlp1.parameters())

print(len(params))

print(params[0].size())

print(params[1].size())

print(params[2].size())

print(params[3].size())

print(params[4].size())

print(params[5].size())

-----------------------------------------------------------------------------------
MLP1(
  (fc1): Linear(in_features=2048, out_features=1024, bias=True)
  (fc2): Linear(in_features=1024, out_features=512, bias=True)
  (output): Linear(in_features=512, out_features=10, bias=True)
  (drouput1): Dropout2d(p=0.3, inplace=False)
  (drouput2): Dropout2d(p=0.3, inplace=False)
  (relu): ReLU()
)
6
torch.Size([1024, 2048])
torch.Size([1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([10, 512])
torch.Size([10]) 
--------------------------------------------------------------------------------

correct1 = 0

total1 = 0

with torch.no_grad():

  for data in cifar_testloader:

    # TODO: write testing code

    images, labels = data

    images = images.to(device)

    labels = labels.to(device)

    output1 = mlp1(images)

    _, predicted1 = torch.max(output1.data, 1)

    total1 += labels.size(0)

    correct1 += (predicted1 == labels).sum().item()

print('Accuracy of the MLnetwork on the 10000 test images: %d %%' % (100 * correct1 / total1))

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-64656ef79eea> in <module>()
      8     labels = labels.to(device)
      9 
---> 10     output1 = mlp1(images)
     11     _, predicted1 = torch.max(output1.data, 1)
     12     total1 += labels.size(0)

4 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1672     if input.dim() == 2 and bias is not None:
   1673         # fused op is marginally faster
-> 1674         ret = torch.addmm(bias, input, weight.t())
   1675     else:
   1676         output = input.matmul(weight.t())

RuntimeError: mat1 dim 1 must match mat2 dim 0
----------------------------------------------------------------------------------

ptrblck · October 20, 2020, 7:07pm

Could you check the shape of your input tensor?
Your model is working fine for the expected input shape of [batch_size, 2048]:

mlp1 = MLP1()
x = torch.randn(8, 2048)
out = mlp1(x)
print(out.shape)
> torch.Size([8, 512])

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier. I’ve formatted your code so that I could copy-paste it.

yAya-yns · November 18, 2020, 5:12pm

Thank you! That solved my issue!

Xiaolin_Li · January 14, 2021, 8:04am

I also have a error like this:

Test_Quantization Epoch #0:   0% 0/129 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/drive/My Drive/ECG_quantization/main.py", line 89, in <module>
    trainer.testQuant(loader_test, loss_fn, info='Test_Quantization ',quant=True, stats=stats)
  File "/content/drive/My Drive/ECG_quantization/trainer.py", line 202, in testQuant
    preds = quantForward(self.model, data, stats)
  File "/content/drive/My Drive/ECG_quantization/quantized.py", line 192, in quantForward
    x, scale_next, zero_point_next = quantizeLayer(x, model.fc1, stats['fc2'], scale_next, zero_point_next)
  File "/content/drive/My Drive/ECG_quantization/quantized.py", line 79, in quantizeLayer
    x = (layer(X)/ scale_next) + zero_point_next 
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

when I run this

stats = gatherStats(model, loader_test)
trainer.testQuant(loader_test, loss_fn, info='Test_Quantization ',quant=True, stats=stats)

And here are my functions:

def quantForward(model, x, stats):
  
  # Quantise before inputting into incoming layers
  x = quantize_tensor(x, min_val=stats['conv1']['min'], max_val=stats['conv1']['max'])
  # print(type(x))

  x, scale_next, zero_point_next = quantizeLayer(x.tensor, model.conv1, stats['conv2'], x.scale, x.zero_point)

  x = model.bn1(x)

  x = F.max_pool1d(x, 2, stride=3)

  x, scale_next, zero_point_next = quantizeLayer(x, model.conv2, stats['conv3'], scale_next, zero_point_next)

  x = model.dropout(x)

  x = model.bn2(x)

  x = F.max_pool1d(x, 2, stride=2)
  
  x, scale_next, zero_point_next = quantizeLayer(x, model.conv3, stats['fc1'], scale_next, zero_point_next)

  x = model.dropout(x)

  x, scale_next, zero_point_next = quantizeLayer(x, model.fc1, stats['fc2'], scale_next, zero_point_next)

  x = model.faltten(x)

  x = model.dropout(x)

  # Back to dequant for final layer
  x = dequantize_tensor(QTensor(tensor=x, scale=scale_next, zero_point=zero_point_next))
   
  x = model.fc2(x)

  return F.log_softmax(x, dim=1)

Could you please help me have a look at it?

ptrblck · January 14, 2021, 8:25am

Based on the error message the feature dimension of the input doesn’t match the num_features from a linear layer. I don’t know which linear layer it is, but you could add print statements into the forward method and make sure that the activations have the right shape.

Xiaolin_Li · January 14, 2021, 5:11pm

Thank you! I figure it out.

Xiaolin_Li · January 14, 2021, 5:11pm

I am doing CNN quantization into different bits, like 16,8,4,2 bits.
Here are the functions I used in the quantization process.

from collections import namedtuple
import torch
import torch.nn as nn
import torch.nn.functional as F


QTensor = namedtuple('QTensor', ['tensor', 'scale', 'zero_point'])

## Quantisation Functions
def calcScaleZeroPoint(min_val, max_val,num_bits=16):
  # Calc Scale and zero point of next 
  qmin = 0.
  qmax = 2.**num_bits - 1.

  scale = (max_val - min_val) / (qmax - qmin)

  initial_zero_point = qmin - min_val / scale
  
  zero_point = 0
  if initial_zero_point < qmin:
      zero_point = qmin
  elif initial_zero_point > qmax:
      zero_point = qmax
  else:
      zero_point = initial_zero_point

  zero_point = int(zero_point)

  return scale, zero_point

def quantize_tensor(x, num_bits=16, min_val=None, max_val=None):
    
    if not min_val and not max_val: 
      min_val, max_val = x.min(), x.max()

    qmin = 0.
    qmax = 2.**num_bits - 1.

    scale, zero_point = calcScaleZeroPoint(min_val, max_val, num_bits)
    q_x = zero_point + x / scale
    q_x.clamp_(qmin, qmax).round_()
    q_x = q_x.round().byte()
    
    return QTensor(tensor=q_x, scale=scale, zero_point=zero_point)

def dequantize_tensor(q_x):
    return q_x.scale * (q_x.tensor.float() - q_x.zero_point)


## Rework Forward pass of Linear and Conv Layers to support Quantisation
def quantizeLayer(x, layer, stat, scale_x, zp_x):
  # for both conv and linear layers

  # cache old values
  W = layer.weight.data
  B = layer.bias.data

  # quantise weights, activations are already quantised
  w = quantize_tensor(layer.weight.data) 
  b = quantize_tensor(layer.bias.data)

  layer.weight.data = w.tensor.float()
  layer.bias.data = b.tensor.float()

  # This is Quantisation Artihmetic
  scale_w = w.scale
  zp_w = w.zero_point
  scale_b = b.scale
  zp_b = b.zero_point
  
  scale_next, zero_point_next = calcScaleZeroPoint(min_val=stat['min'], max_val=stat['max'])

  # Preparing input by shifting
  X = x.float() - zp_x
  layer.weight.data = scale_x * scale_w*(layer.weight.data - zp_w)
  layer.bias.data = scale_b*(layer.bias.data + zp_b)

  # All int computation
  x = (layer(X)/ scale_next) + zero_point_next 
  
  # Perform relu too
  x = F.relu(x)

  # Reset weights for next forward pass
  layer.weight.data = W
  layer.bias.data = B
  
  return x, scale_next, zero_point_next

## Get Max and Min Stats for Quantising Activations of Network.
# This is done by running the network with around 1000 examples and getting the 
# average min and max activation values before and after each layer.
# Get Min and max of x tensor, and store it
def updateStats(x, stats, key):
  max_val, _ = torch.max(x, dim=1)
  min_val, _ = torch.min(x, dim=1)
  
  
  if key not in stats:
    stats[key] = {"max": max_val.sum(), "min": min_val.sum(), "total": 1}
  else:
    stats[key]['max'] += max_val.sum().item()
    stats[key]['min'] += min_val.sum().item()
    stats[key]['total'] += 1
  
  return stats

# Reworked Forward Pass to access activation Stats through updateStats function
def gatherActivationStats(model, x, stats):

  stats = updateStats(x.clone().view(x.shape[0], -1), stats, 'conv1')
  
  x = F.relu(model.conv1(x))

  x = model.bn1(x)

  x = F.max_pool1d(x, 2, stride=3)

  stats = updateStats(x.clone().view(x.shape[0], -1), stats, 'conv2')
  
  x = model.dropout(x)

  x = F.relu(model.conv2(x))

  x = model.bn2(x)

  x = F.max_pool1d(x, 2, stride=2)

  stats = updateStats(x.clone().view(x.shape[0], -1), stats, 'conv3')
  
  x = model.dropout(x)

  x = F.relu(model.conv3(x))
  
  stats = updateStats(x, stats, 'fc1')

  x = model.flatten(x)

  x = model.dropout(x)

  x = F.relu(model.fc1(x))
  
  stats = updateStats(x, stats, 'fc2')

  x = model.fc2(x)

  return stats

# Entry function to get stats of all functions.
def gatherStats(model, test_loader):
    device = 'cuda'
    
    model.eval()
    test_loss = 0
    correct = 0
    stats = {}
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            stats = gatherActivationStats(model, data, stats)
    
    final_stats = {}
    for key, value in stats.items():
      final_stats[key] = { "max" : value["max"] / value["total"], "min" : value["min"] / value["total"] }
    return final_stats

## Forward Pass for Quantised Inference
def quantForward(model, x, stats):
  
  # Quantise before inputting into incoming layers
  x = quantize_tensor(x, min_val=stats['conv1']['min'], max_val=stats['conv1']['max'])

  x, scale_next, zero_point_next = quantizeLayer(x.tensor, model.conv1, stats['conv2'], x.scale, x.zero_point)

  x = model.bn1(x)

  x = F.max_pool1d(x, 2, stride=3)

  x, scale_next, zero_point_next = quantizeLayer(x, model.conv2, stats['conv3'], scale_next, zero_point_next)

  x = model.dropout(x)

  x = model.bn2(x)

  x = F.max_pool1d(x, 2, stride=2)
  
  x, scale_next, zero_point_next = quantizeLayer(x, model.conv3, stats['fc1'], scale_next, zero_point_next)
  
  x = model.dropout(x)

  x = x.view(-1, 32)

  x, scale_next, zero_point_next = quantizeLayer(x, model.fc1, stats['fc2'], scale_next, zero_point_next)

  x = model.flatten(x)

  x = model.dropout(x)

  # Back to dequant for final layer
  x = dequantize_tensor(QTensor(tensor=x, scale=scale_next, zero_point=zero_point_next))
   
  x = model.fc2(x)

  return F.log_softmax(x, dim=1)

My test function is as follow

    def testQuant(self, dataloader, loss_fn, epoch=-1, info='', quant=False, stats=None):
        self.model.eval()
        nb_classes = 5
        confusion_matrix = torch.zeros(nb_classes, nb_classes)
        Sen = torch.zeros(1, nb_classes)
        pre = torch.zeros(1, nb_classes)
        Spe = torch.zeros(1, nb_classes)
        F1 = torch.zeros(1, nb_classes)
        desc = f'{info}Epoch #{epoch + 1}'
        with torch.no_grad():
            with tqdm(total=len(dataloader), desc=desc) as progress_bar:
                info_avg = {}
                for batch_idx, (data, labels) in enumerate(dataloader):
                    if quant:
                        preds = quantForward(self.model, data, stats)
                    else:
                        preds = self.model(data)
                    loss = loss_fn(preds, labels)
                    info_show = self._update_info(preds, labels, loss, info_avg)
                    progress_bar.set_postfix(**info_show)
                    progress_bar.update(1)
                    _, preds = torch.max(preds, 1)
                    for t, p in zip(labels.view(-1), preds.view(-1)):
                        confusion_matrix[t.long(), p.long()] += 1
        FP = confusion_matrix.sum(axis=0) - np.diag(confusion_matrix) 
        FN = confusion_matrix.sum(axis=1)  - np.diag(confusion_matrix)
        TP = np.diag(confusion_matrix)
        TN = confusion_matrix.sum() - (FP + FN + TP)
        
        FP = torch.Tensor(FP)
        FN = torch.Tensor(FN)
        TP = torch.Tensor(TP)
        TN = torch.Tensor(TN)

        # Sensitivity, hit rate, recall, or true positive rate
        TPR = TP/(TP+FN)
        # Specificity or true negative rate
        TNR = TN/(TN+FP) 
        # Precision or positive predictive value
        PPV = TP/(TP+FP)
        # # Negative predictive value
        # NPV = TN/(TN+FN)
        # # Fall out or false positive rate
        # FPR = FP/(FP+TN)
        # # False negative rate
        # FNR = FN/(TP+FN)
        # # False discovery rate
        # FDR = FP/(TP+FP)
        F1 = 2 * (TPR * PPV) / (TPR+PPV)

        # Overall accuracy
        ACC = (TP+TN)/(TP+FP+FN+TN)
        # print('Sensitivity for each class',confusion_matrix.diag()/confusion_matrix.sum(1))
        print('Acc',ACC)
        print('Sensitivity',TPR)
        print('Specificity',TNR)
        print('Precision',PPV)
        print('F1 score',F1)
        plt.figure(figsize=(10,10))
        plot_confusion_matrix(confusion_matrix.numpy(), ['N','SVEB','VEB','F','Q'])
        return info_show

The problem is I get many ‘nan’ in my metrics, when I run

stats = gatherStats(model, loader_test)
trainer.testQuant(loader_test, loss_fn, info='Test_Quantization ',quant=True, stats=stats)

I get the following results

!python "/content/drive/My Drive/ECG_quantization/main.py"
Test_Quantization Epoch #0: 100% 129/129 [00:02<00:00, 58.98it/s, Acc=0.8231, F1 score=nan, Loss=0.8739, Sensitivity(recall)=0.0000, Specificity=1.0000, precision=nan]
/content/drive/My Drive/ECG_quantization/trainer.py:219: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  TP = torch.Tensor(TP)
Acc tensor([0.8257, 0.9743, 0.9350, 0.9924, 0.9240])
Sensitivity tensor([1., 0., 0., 0., 0.])
Specificity tensor([0., 1., 1., 1., 1.])
Precision tensor([0.8257,    nan,    nan,    nan,    nan])
F1 score tensor([0.9046,    nan,    nan,    nan,    nan])
Normalized confusion matrix

But when I run

trainer.testQuant(loader_test, loss_fn, info='Test_Quantization ',quant=False)

I can get correct results without quantization.
I want to ask what’s wrong with my quantization code?