RuntimeError: mat1 and mat2 shapes cannot be multiplied (12x22 and 264x128)

Hello, I am working on a 1D CNN multi-classification project. I am using Optuna to implement tuning for various hyperparameters, including the number of conv layers, kernel and stride sizes, pooling, channels, etc. As a result, the final output size from the convolutional layers is always different. As I am required to specify the input size of the first fully connected layer based on the output from the conv layers, I am calculating the output of each conv/pooling layer, using the following formula:

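W_out = floor((W - F + 2P) / S) + 1, where W is the input size, F the kernel size, P the padding, and S the stride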

I then use the final output size as the input size of the fully connected layers. However, I am getting the error stated in the title. I am assuming that “mat2” refers to the size of the calculated input, and “mat1” refers to the flattened vector, produced in the forward pass using:

x = x.view(x.size(0),-1)

Note that the flattened size of x should equal the calculated size, but this does not seem to be the case. Also note that the product of the dimensions of the first matrix (12x22) equals the first dimension of the second matrix (264), while 128 is the number of neurons in the first FC layer. I am implementing the same thing with a 2D CNN, and there I do not get this error. The following is my forward pass:

    def forward(self,x):
        x = self.conv(x)
        x = x.view(x.size(0),-1)
        x = self.fc(x)
        return x

I feel that the error might be in x = x.view(x.size(0),-1). Is this true? If not, where am I going wrong?
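
The size bookkeeping described above can be sketched as a small helper and checked against what PyTorch actually produces. This is only a sketch: the function name conv1d_out_len and the layer settings are made up for illustration, and it assumes a dilation of 1 and a pooling layer whose stride equals its kernel size.

```python
import torch
import torch.nn as nn


def conv1d_out_len(w, kernel_size, stride=1, padding=0):
    """Output length of Conv1d/MaxPool1d (dilation=1): floor((W - F + 2P) / S) + 1."""
    return (w - kernel_size + 2 * padding) // stride + 1


# Hypothetical settings, chosen only for illustration.
w = 119
conv = nn.Conv1d(in_channels=1, out_channels=6, kernel_size=5, stride=2, padding=1)
pool = nn.MaxPool1d(kernel_size=2)  # stride defaults to kernel_size

w = conv1d_out_len(w, kernel_size=5, stride=2, padding=1)  # after the conv layer
w = conv1d_out_len(w, kernel_size=2, stride=2)             # after the pooling layer

x = torch.randn(4, 1, 119)  # (batch, channels, length)
out = pool(conv(x))
print(w, tuple(out.shape))
# 29 (4, 6, 29)
```

If the computed length and the last dimension of the real output agree, the formula itself is not the source of a shape mismatch.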

Your code to flatten the activation is correct; however, a setup in which “the final output size from the convolutional layers is always different” will break, since the in_features of the first linear layer are defined once for a specific shape and won’t accept a different number of features.
A common approach would be to use adaptive pooling layers and to specify the desired output shape instead. torchvision models use this approach to allow different input shapes (assuming they are not too small).
Here is a small example:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, 3, 1, 1)
        self.lin = nn.Linear(16 * 24 * 24, 10)
        
    def forward(self, x):
        out = self.conv(x)
        out = out.view(x.size(0), -1)
        out = self.lin(out)
        return out


model = MyModel()

# works
x = torch.randn(2, 1, 24, 24)        
out = model(x)
print(out.shape)
# torch.Size([2, 10])

# breaks
x = torch.randn(2, 1, 48, 48)  
out = model(x)
# RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x36864 and 9216x10)


# use adaptive pooling layer
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, 3, 1, 1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.lin = nn.Linear(16, 10)
        
    def forward(self, x):
        out = self.conv(x)
        out = self.avgpool(out)
        out = out.view(x.size(0), -1)
        out = self.lin(out)
        return out

model = MyModel()

# works
x = torch.randn(2, 1, 24, 24)        
out = model(x)
print(out.shape)
# torch.Size([2, 10])

x = torch.randn(2, 1, 48, 48)  
out = model(x)
print(out.shape)
# torch.Size([2, 10])
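
For the 1D case from the question, the same idea might look as follows. This is a minimal sketch, assuming inputs of shape (batch, 1, length); MyModel1D and the layer sizes are illustrative and are not the original poster's model.

```python
import torch
import torch.nn as nn


class MyModel1D(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 16, 3, 1, 1)
        # Pool every channel down to a fixed length of 1, whatever the input length.
        self.avgpool = nn.AdaptiveAvgPool1d(1)
        self.lin = nn.Linear(16, 10)

    def forward(self, x):
        out = self.conv(x)
        out = self.avgpool(out)
        out = out.view(out.size(0), -1)
        return self.lin(out)


model = MyModel1D()
out_a = model(torch.randn(2, 1, 119))
out_b = model(torch.randn(2, 1, 60))
print(out_a.shape, out_b.shape)
# torch.Size([2, 10]) torch.Size([2, 10])
```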

Thank you for your reply.

Actually, I have tried implementing this for 2D CNN and it works. The code automatically adjusts the input size of the first FC Layer based on the calculated output from the convolutional layers (shown in the aforementioned equation). I define it as follows:

        in_size = in_size*W
        fclayers = []
        for fc in range(params['No. of FC Layers']):
            fclayers.append(nn.Linear(in_size,params['Neurons'][fc]))
            fclayers.append(nn.ReLU())
            in_size = params['Neurons'][fc]

where W*in_size represents the calculated output from the convolutional layers multiplied by the number of channels of the last layer. It is then fed to the first FC layer through

fclayers.append(nn.Linear(in_size,params['Neurons'][fc]))

Again, this works absolutely fine for 2D CNN, but does not seem to apply for 1D CNN.

Does this mean that adaptive pooling is only applied at the last layer? i.e. for Conv-Pool-Conv-Pool, adaptive pooling is applied only to the 2nd pool? But even then, my hyperparameter search space includes pooling layers, so in some trials pooling is not applied after each conv layer. Therefore, I am trying to write flexible code that creates the FC layer based solely on the output from the conv layers.
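
One way to size the first FC layer from the conv output alone, without tracking a formula by hand, is to pass a dummy tensor through the conv stack once at construction time. The following is a sketch under assumptions: the helper name flattened_size and the layer settings are hypothetical, and the input is taken to be (batch, 1, 119).

```python
import torch
import torch.nn as nn


def flattened_size(conv_stack, in_channels, length):
    """Run a dummy batch through the conv stack and return features per sample."""
    was_training = conv_stack.training
    conv_stack.eval()  # avoid touching dropout/batch-norm state
    with torch.no_grad():
        dummy = torch.zeros(1, in_channels, length)
        n = conv_stack(dummy).flatten(1).size(1)
    conv_stack.train(was_training)
    return n


# Hypothetical conv stack where pooling follows only the first conv layer.
conv = nn.Sequential(
    nn.Conv1d(1, 6, kernel_size=5, stride=2, padding=1),
    nn.MaxPool1d(kernel_size=2),
    nn.ReLU(),
    nn.Conv1d(6, 9, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
)
in_features = flattened_size(conv, in_channels=1, length=119)
fc = nn.Linear(in_features, 5)

x = torch.randn(4, 1, 119)
out = fc(conv(x).flatten(1))
print(in_features, out.shape)
# 261 torch.Size([4, 5])
```

PyTorch also provides nn.LazyLinear, which infers in_features on the first forward call; the dummy-pass version is shown here because it keeps the size explicit.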

I don’t know how it could work for different input shapes as it seems you would be using a newly initialized linear layer for each new shape?
While it might work, you would need to add this newly added linear layer to the optimizer to train it and would also increase the parameter size of your model for each new shape.
Is this really your use case?
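
To illustrate the point about the optimizer: a layer created after the optimizer was built is not updated unless its parameters are registered. A minimal sketch, with made-up layer sizes and SGD chosen arbitrarily:

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(1, 6, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

# A linear layer created after the optimizer was built is not trained
# unless its parameters are registered with the optimizer explicitly.
new_fc = nn.Linear(6 * 119, 5)
optimizer.add_param_group({"params": new_fc.parameters()})

n_groups = len(optimizer.param_groups)
n_tensors = sum(len(g["params"]) for g in optimizer.param_groups)
print(n_groups, n_tensors)
# 2 4
```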

Yes, this is exactly what I’m doing. The input size to the FC layers changes at each iteration and is defined on the spot depending on the output from the conv layers. It works fine for the 2D CNN but fails to operate correctly for the 1D CNN.

I don’t quite understand what you mean here. However, the hyperparameters that are used for optimization indeed include the number of FC layers, convolutional layers, kernel and stride sizes, etc.

I might have misunderstood your use case and it seems you are not trying to add new layers for each input shape but each input shape defines a completely new use case and model?
If so, then you should be able to reuse the same approach that was working with Conv2d. Could you share the error you are seeing when Conv1d is used?

Yes, this is it exactly. Here is the complete error I am getting, including the hyperparameter optimization part.

RuntimeError                              Traceback (most recent call last)
Cell In[33], line 4
      2 Rec = record()
      3 study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler(), pruner=optuna.pruners.MedianPruner())
----> 4 study.optimize(objective, n_trials=150)
      6 best_trial = study.best_trial
      8 optuna.visualization.matplotlib.plot_param_importances(study)

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\optuna\study\study.py:400, in Study.optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
    392 if n_jobs != 1:
    393     warnings.warn(
    394         "`n_jobs` argument has been deprecated in v2.7.0. "
    395         "This feature will be removed in v4.0.0. "
    396         "See https://github.com/optuna/optuna/releases/tag/v2.7.0.",
    397         FutureWarning,
    398     )
--> 400 _optimize(
    401     study=self,
    402     func=func,
    403     n_trials=n_trials,
    404     timeout=timeout,
    405     n_jobs=n_jobs,
    406     catch=catch,
    407     callbacks=callbacks,
    408     gc_after_trial=gc_after_trial,
    409     show_progress_bar=show_progress_bar,
    410 )

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\optuna\study\_optimize.py:66, in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
     64 try:
     65     if n_jobs == 1:
---> 66         _optimize_sequential(
     67             study,
     68             func,
     69             n_trials,
     70             timeout,
     71             catch,
     72             callbacks,
     73             gc_after_trial,
     74             reseed_sampler_rng=False,
     75             time_start=None,
     76             progress_bar=progress_bar,
     77         )
     78     else:
     79         if show_progress_bar:

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\optuna\study\_optimize.py:163, in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
    160         break
    162 try:
--> 163     trial = _run_trial(study, func, catch)
    164 except Exception:
    165     raise

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\optuna\study\_optimize.py:264, in _run_trial(study, func, catch)
    261         assert False, "Should not reach."
    263 if state == TrialState.FAIL and func_err is not None and not isinstance(func_err, catch):
--> 264     raise func_err
    265 return trial

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\optuna\study\_optimize.py:213, in _run_trial(study, func, catch)
    210     thread.start()
    212 try:
--> 213     value_or_values = func(trial)
    214 except exceptions.TrialPruned as e:
    215     # TODO(mamu): Handle multi-objective cases.
    216     state = TrialState.PRUNED

Cell In[30], line 32, in objective(trial)
     29     model = ConvNet2D(params)
     30     model.to(device)
---> 32 accuracy,metric,Confusion = KFold(params,model,num_epochs,device,trial)
     34 Rec.update(accuracy,metric,Confusion)
     36 return accuracy

Cell In[29], line 23, in KFold(params, model, num_epochs, device, trial)
     21 while epoch<num_epochs and not done:
     22     epoch+=1
---> 23     train_loss,train_correct = train_epoch(model,train_loader,criterion,optimizer,device)
     24     test_loss,test_correct, Confusion = test_epoch(model,test_loader,criterion,device)
     26     train_loss = train_loss/len(train_loader.sampler)

Cell In[23], line 13, in train_epoch(model, dataloader, loss_fn, optimizer, device)
     10 features,labels = features.to(device),labels.to(device)
     12 #Forward Pass
---> 13 output=model(features)
     14 loss=loss_fn(output,labels)
     16 #Backward Pass

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\torch\nn\modules\module.py:1480, in Module._call_impl(self, *args, **kwargs)
   1475 # If we don't have any hooks, we want to skip the rest of the logic in
   1476 # this function, and just call forward.
   1477 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1478         or _global_backward_pre_hooks or _global_backward_hooks
   1479         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1480     return forward_call(*args, **kwargs)
   1481 # Do not call functions when jit is used
   1482 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[22], line 53, in ConvNet1D.forward(self, x)
     51         x = x.view(x.size(0),-1)
     52 #         print(x.size(0))
---> 53         x = self.fc(x)
     54         return x

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\torch\nn\modules\module.py:1480, in Module._call_impl(self, *args, **kwargs)
   1475 # If we don't have any hooks, we want to skip the rest of the logic in
   1476 # this function, and just call forward.
   1477 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1478         or _global_backward_pre_hooks or _global_backward_hooks
   1479         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1480     return forward_call(*args, **kwargs)
   1481 # Do not call functions when jit is used
   1482 full_backward_hooks, non_full_backward_hooks = [], []

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\torch\nn\modules\container.py:204, in Sequential.forward(self, input)
    202 def forward(self, input):
    203     for module in self:
--> 204         input = module(input)
    205     return input

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\torch\nn\modules\module.py:1480, in Module._call_impl(self, *args, **kwargs)
   1475 # If we don't have any hooks, we want to skip the rest of the logic in
   1476 # this function, and just call forward.
   1477 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1478         or _global_backward_pre_hooks or _global_backward_hooks
   1479         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1480     return forward_call(*args, **kwargs)
   1481 # Do not call functions when jit is used
   1482 full_backward_hooks, non_full_backward_hooks = [], []

File ~\Anaconda\Installation\envs\FYP\lib\site-packages\torch\nn\modules\linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (9x14 and 126x4)
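
The last line of the trace can be reproduced in isolation, which helps when reading the two shapes. In this sketch (sizes taken from the message, everything else assumed), “mat1” is the tensor handed to the linear layer and “mat2” is the transposed weight, i.e. in_features x out_features:

```python
import torch
import torch.nn as nn

fc = nn.Linear(126, 4)      # weight has shape (4, 126); "mat2" is its transpose, 126x4
flat = torch.randn(9, 14)   # "mat1": 9 rows of 14 features each

try:
    fc(flat)
    message = ""
except RuntimeError as err:
    message = str(err)
print(message)

# The same 126 values arranged as a single row of 126 features are accepted.
ok = fc(flat.reshape(1, -1))
print(ok.shape)
```

So the linear layer was built for 126 features, but it received 9 rows of 14 features each.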

Thanks for the stacktrace! Could you also post the logic used to create the model and the difference between the Conv2d and Conv1d use case? The forward method looks alright (at least the part shown in the stacktrace) as you are properly flattening the activation before passing it to the linear layer.

Sure! Here it is:

class ConvNet1D(nn.Module):
    def __init__(self, params, num_classes=5):
        super(ConvNet1D,self).__init__()
        
        in_size = 1
        convlayers = []
        out_size = [6,9,12]
        W = 119  # original sample size (1x119)
        for conv in range(params['No. of Conv Layers']):
            convlayers.append(nn.Conv1d(in_channels=in_size, 
                                        out_channels=out_size[conv], 
                                        kernel_size=params['Kernel Size'][conv],
                                        stride=params['Stride Size'][conv],
                                        padding=params['Padding']))
            
            # Output Dimension Calculation
            F = params['Kernel Size'][conv]
            S = params['Stride Size'][conv]
            P = params['Padding']
            W = int(((W-F+2*P)/S)+1) #Output of Convolution Layer
            
            if params['Pooling'][conv] == 'Yes':
                convlayers.append(nn.MaxPool1d(kernel_size=2))
                W = int(W/2) #Output of Pooling Layer
        
            convlayers.append(nn.ReLU())
            
            in_size = out_size[conv]
        
        convlayers.append(nn.Dropout(p=0.4))
        
        self.conv = nn.Sequential(*convlayers)
        
        in_size = in_size*W
        fclayers = []
        for fc in range(params['No. of FC Layers']):
            fclayers.append(nn.Linear(in_size,params['Neurons'][fc]))
            fclayers.append(nn.ReLU())
            fclayers.append(nn.Dropout(p=0.5))
            in_size = params['Neurons'][fc]
            
        fclayers.append(nn.Linear(in_size,5))
        self.fc = nn.Sequential(*fclayers)
        
    def forward(self,x):
        x = self.conv(x)
        x = x.view(x.size(0),-1)
        x = self.fc(x)
        return x

It takes the hyperparameters set by the optimizer (all the “params” arguments) and builds the model accordingly. It also calculates the output size (W) after each layer is applied, according to the following equation:

W_out = floor((W - F + 2P) / S) + 1, where W is the input size, F the kernel size, P the padding, and S the stride

The only differences between the 2D CNN class and the 1D CNN class are shown below, namely the input channel size, the input size to the FC layers, and the use of Conv1d/MaxPool1d versus Conv2d/MaxPool2d:

For 1D CNN:
in_size = 1
convlayers.append(nn.Conv1d(in_channels=in_size,
                            out_channels=out_size[conv],
                            kernel_size=params['Kernel Size'][conv],
                            stride=params['Stride Size'][conv],
                            padding=params['Padding']))
convlayers.append(nn.MaxPool1d(kernel_size=2))
# ...
in_size = in_size*W  # before feeding the output to the FC layers

For 2D CNN:
in_size = 3
convlayers.append(nn.Conv2d(in_channels=in_size,
                            out_channels=out_size[conv],
                            kernel_size=params['Kernel Size'][conv],
                            stride=params['Stride Size'][conv],
                            padding=params['Padding']))
convlayers.append(nn.MaxPool2d(kernel_size=2))
# ...
in_size = in_size*W*W  # before feeding the output to the FC layers
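
The two flattened sizes (channels * W in 1D, channels * W * W in 2D) can be checked directly against the modules. A sketch with assumed settings: a single conv/pool block, kernel_size=3, stride=1, padding=1, and a hypothetical square 32x32 input for the 2D case.

```python
import torch
import torch.nn as nn

# 1D: the flattened size is channels * W.
conv1d = nn.Sequential(nn.Conv1d(1, 6, kernel_size=3, stride=1, padding=1),
                       nn.MaxPool1d(kernel_size=2))
x1 = torch.randn(4, 1, 119)       # (batch, channels, length)
f1 = conv1d(x1).flatten(1)

# 2D: the flattened size is channels * W * W (square input assumed).
conv2d = nn.Sequential(nn.Conv2d(3, 6, kernel_size=3, stride=1, padding=1),
                       nn.MaxPool2d(kernel_size=2))
x2 = torch.randn(4, 3, 32, 32)    # (batch, channels, height, width)
f2 = conv2d(x2).flatten(1)

print(f1.shape, f2.shape)
# torch.Size([4, 354]) torch.Size([4, 1536])
```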

Thanks for the code. I don’t know which params you are using but using some standard attributes such as kernel_size=3, stride=1, padding=1 for conv layers and using 3 conv/linear blocks works for me:

import torch
import torch.nn as nn

class ConvNet1D(nn.Module):
    def __init__(self, num_classes=5):
        super(ConvNet1D,self).__init__()
        
        in_size = 1
        convlayers = []
        out_size = [6,9,12]
        W=119 
        for conv in range(3):
            convlayers.append(nn.Conv1d(in_channels=in_size, 
                                        out_channels=out_size[conv], 
                                        kernel_size=3,#params['Kernel Size'][conv],
                                        stride=1,#params['Stride Size'][conv],
                                        padding=1))#params['Padding']))
            
            # Output Dimension Calculation
            F = 3#params['Kernel Size'][conv]
            S = 1#params['Stride Size'][conv]
            P = 1#params['Padding']
            W = int(((W-F+2*P)/S)+1) #Output of Convolution Layer
            
            if True:
                convlayers.append(nn.MaxPool1d(kernel_size=2))
                W = int(W/2) #Output of Pooling Layer
        
            convlayers.append(nn.ReLU())
            
            in_size = out_size[conv]
        
        convlayers.append(nn.Dropout(p=0.4))
        
        self.conv = nn.Sequential(*convlayers)
        
        in_size = in_size*W
        fclayers = []
        for fc in range(3):
            fclayers.append(nn.Linear(in_size,10))
            fclayers.append(nn.ReLU())
            fclayers.append(nn.Dropout(p=0.5))
            in_size = 10
            
        fclayers.append(nn.Linear(in_size,5))
        self.fc = nn.Sequential(*fclayers)
        
    def forward(self,x):
        x = self.conv(x)
        x = x.view(x.size(0),-1)
        x = self.fc(x)
        return x
    
model = ConvNet1D()
x = torch.randn(2, 1, 119)
out = model(x)
print(out.shape)
# torch.Size([2, 5])
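
One thing that may be worth ruling out, given the shapes in the two error messages (12x22 with 264 = 12*22, and 9x14 with 126 = 9*14, where 12 and 9 both appear in out_size), is that the tensor reaching the model has no separate batch dimension. This is only a guess based on those numbers and is not confirmed anywhere in the thread. The sketch below assumes a PyTorch version in which Conv1d and MaxPool1d accept unbatched 2D input, and uses made-up layer settings:

```python
import torch
import torch.nn as nn

conv = nn.Sequential(nn.Conv1d(1, 9, kernel_size=3, stride=1, padding=1),
                     nn.MaxPool1d(kernel_size=2))

batched = torch.randn(2, 1, 119)   # (batch, channels, length)
no_batch = torch.randn(1, 119)     # 2D input: interpreted as (channels, length)

a = conv(batched)
b = conv(no_batch)
print(a.shape)   # 3D output
print(b.shape)   # 2D output: (channels, length)

# view(x.size(0), -1) only flattens correctly when dim 0 is the batch dim.
flat_a = a.view(a.size(0), -1)
flat_b = b.view(b.size(0), -1)
print(flat_a.shape, flat_b.shape)
# torch.Size([2, 531]) torch.Size([9, 59])
```

In the second case the rows of the "flattened" tensor are channels rather than samples, which would produce a mat1 of the form channels x length. Printing features.shape just before model(features) would confirm or rule this out.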