Flattening before the first fully-connected layer

Hi,

I'm studying image classification models and trying to build my own model. All layers in my model seem to have matching sizes between connected layers.
Below is the summary of my model:


    Layer (type)               Output Shape         Param #
================================================================
        Conv2d-1          [-1, 9, 110, 110]             684
     AvgPool2d-2          [-1, 9, 106, 106]               0
        Conv2d-3            [-1, 3, 52, 52]             246
     AvgPool2d-4            [-1, 3, 50, 50]               0
        Conv2d-5            [-1, 3, 25, 25]              12
     AvgPool2d-6            [-1, 3, 23, 23]               0
        Linear-7                  [-1, 529]         840,052
        Linear-8                   [-1, 59]          31,270
        Linear-9                    [-1, 5]             300
================================================================
Total params: 872,564
Trainable params: 872,564
Non-trainable params: 0

Input size (MB): 0.57
Forward/backward pass size (MB): 1.75
Params size (MB): 3.33
Estimated Total Size (MB): 5.66


The batch size is currently 32, and training fails at Linear-7 with:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x19950 and 1587x529)

mat2 is (3x23x23=1587)x529, so PyTorch generated the parameters correctly. I use torch.flatten(start_dim=1); my understanding is that dim 0 is the batch dimension, so the flattening should be done correctly.
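
To illustrate the shapes involved (a standalone sketch with random tensors, not my actual data):

import torch
import torch.nn as nn

fc1 = nn.Linear(in_features=3 * 23 * 23, out_features=529)   # weight.t() is the 1587x529 mat2 from the error
ok  = torch.randn(32, 1587)     # 1587 features per sample -> works
out = fc1(ok)                   # torch.Size([32, 529])
bad = torch.randn(32, 19950)    # 19950 features per sample, like my training batch
#fc1(bad)   # RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x19950 and 1587x529)
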

Does anyone know where “19950” comes from?
My understanding is that it comes from the previous layer and is then multiply-added with mat1, mat2, and the biases. The previous layer is:

nn.AvgPool2d(kernel_size=(3, 3), stride=1)


IAMAl

It seems the number of features in the incoming activation tensor doesn’t match the expected in_features in the linear layer.
Could you post the code for the model definition as well as the input shapes?

I use an output size calculator:

def Out_Size(kernel_size=1, in_size=1, padding_size=0, stride_factor=1, dilation_factor=1):
    # Standard Conv2d/Pool2d output-size formula: floor((in + 2*pad - dilation*(k-1) - 1)/stride) + 1
    Length = int((in_size + 2*padding_size - dilation_factor*(kernel_size - 1) - 1)/stride_factor) + 1

    return Length
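
As a quick sanity check (a standalone snippet, assuming the 224x224 input implied by the summary above), chaining this calculator reproduces the spatial sizes from the summary:

cv1 = Out_Size(kernel_size=5, in_size=224, stride_factor=2)   # 110
pl1 = Out_Size(kernel_size=5, in_size=cv1)                    # 106
cv2 = Out_Size(kernel_size=3, in_size=pl1, stride_factor=2)   # 52
pl2 = Out_Size(kernel_size=3, in_size=cv2)                    # 50
cv3 = Out_Size(kernel_size=1, in_size=pl2, stride_factor=2)   # 25
pl3 = Out_Size(kernel_size=3, in_size=cv3)                    # 23
print(3 * pl3 * pl3)   # 1587 features expected by the first linear layer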

The architectural parameters are:

##Layer Block-3
#2D Convolution-3
#Kernel Size
CV3_K_H         = 1
CV3_K_W         = 1
#Number of Channels
CV3_K_C         = 3
#Stride Factor
CV3_STRIDE      = 2
CV3_DILATION    = 1
CV3_BIAS        = True
#Output Size (No-Padding)
CV3_H           = Out_Size(kernel_size=CV3_K_H, in_size=PL2_H, padding_size=0, stride_factor=CV3_STRIDE, dilation_factor=CV3_DILATION)
CV3_W           = Out_Size(kernel_size=CV3_K_W, in_size=PL2_W, padding_size=0, stride_factor=CV3_STRIDE, dilation_factor=CV3_DILATION)

#2D Pooling-3
#Kernel Size
PL3_K_H         = 3
PL3_K_W         = 3
#Stride Factor
PL3_STRIDE      = 1
#Output Size (No-Padding)
PL3_H           = Out_Size(kernel_size=PL3_K_H, in_size=CV3_H, padding_size=0, stride_factor=PL3_STRIDE)
PL3_W           = Out_Size(kernel_size=PL3_K_W, in_size=CV3_W, padding_size=0, stride_factor=PL3_STRIDE)


##Layer Block-4
#Fully-Connection-1
#Input Size
FC1_SIZE_IN     = (PL3_H * PL3_W * CV3_K_C)
#Output Size
FC1_SIZE_OUT    = (PL3_H * PL3_W)
#Bias
FC1_BIAS        = True
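
With the sizes above (and the 224x224 input from the summary), this works out to:

#PL3_H = PL3_W = 23 and CV3_K_C = 3, so:
#FC1_SIZE_IN  = 23 * 23 * 3 = 1587   (matches the 1587x529 weight in the error)
#FC1_SIZE_OUT = 23 * 23     = 529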

The part of my model that triggers the error is:

        #Layer Block-3
        self.cv3    = nn.Conv2d(kernel_size=(CV3_K_H, CV3_K_W), in_channels=CV3_K_C, out_channels=CV3_K_C, stride=CV3_STRIDE, dilation=CV3_DILATION, bias=CV3_BIAS)
        self.pl3    = nn.AvgPool2d(kernel_size=(PL3_K_H, PL3_K_W), stride=PL3_STRIDE)

        #Layer Block-4
        self.fc1    = nn.Linear(in_features=FC1_SIZE_IN, out_features=FC1_SIZE_OUT, bias=FC1_BIAS)

@ptrblck
I pasted a part of the code.

Unfortunately, you haven’t posted the model definition, so I cannot debug it.
Since the number of features creates the mismatch, I guess the feature calculation or its usage might be wrong.
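A quick way to narrow it down (just a sketch with a placeholder variable name) would be to print the activation shape right before the flatten and compare it to the expected in_features:

# inside forward(), right before torch.flatten
print(activation.shape)         # e.g. torch.Size([batch, 3, 23, 23]) -> 3*23*23 = 1587 features per sample
print(self.fc1.in_features)     # what the first linear layer expects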

Network Configuration

#Input Image Size
IMG_H           = 224
IMG_W           = 224

##Layer Block-1
#2D Convolution-1
#Kernel Size
CV1_K_H         = 5
CV1_K_W         = 5
#Number of Channels
CV1_K_C         = 3
#Stride Factor
CV1_STRIDE      = 2
CV1_DILATION    = 1
CV1_BIAS        = True
#Output Size (No-Padding)
CV1_H           = Out_Size(kernel_size=CV1_K_H, in_size=IMG_H, padding_size=0, stride_factor=CV1_STRIDE, dilation_factor=CV1_DILATION)
CV1_W           = Out_Size(kernel_size=CV1_K_W, in_size=IMG_W, padding_size=0, stride_factor=CV1_STRIDE, dilation_factor=CV1_DILATION)

#2D Pooling-1
#Kernel Size
PL1_K_H         = 5
PL1_K_W         = 5
#Stride Factor
PL1_STRIDE      = 1
#Output Size (No-Padding)
PL1_H           = Out_Size(kernel_size=PL1_K_H, in_size=CV1_H, padding_size=0, stride_factor=PL1_STRIDE)
PL1_W           = Out_Size(kernel_size=PL1_K_W, in_size=CV1_W, padding_size=0, stride_factor=PL1_STRIDE)


##Layer Block-2
#2D Convolution-2
#Kernel Size
CV2_K_H         = 3
CV2_K_W         = 3
#Number of Channels
CV2_K_C         = 9
#Stride Factor
CV2_STRIDE      = 2
CV2_DILATION    = 1
CV2_BIAS        = True
#Output Size (No-Padding)
CV2_H           = Out_Size(kernel_size=CV2_K_H, in_size=PL1_H, padding_size=0, stride_factor=CV2_STRIDE, dilation_factor=CV2_DILATION)
CV2_W           = Out_Size(kernel_size=CV2_K_W, in_size=PL1_W, padding_size=0, stride_factor=CV2_STRIDE, dilation_factor=CV2_DILATION)

#2D Pooling-2
#Kernel Size
PL2_K_H         = 3
PL2_K_W         = 3
#Stride Factor
PL2_STRIDE      = 1
#Output Size (No-Padding)
PL2_H           = Out_Size(kernel_size=PL2_K_H, in_size=CV2_H, padding_size=0, stride_factor=PL2_STRIDE)
PL2_W           = Out_Size(kernel_size=PL2_K_W, in_size=CV2_W, padding_size=0, stride_factor=PL2_STRIDE)


##Layer Block-3
#2D Convolution-3
#Kernel Size
CV3_K_H         = 1
CV3_K_W         = 1
#Number of Channels
CV3_K_C         = 3
#Stride Factor
CV3_STRIDE      = 2
CV3_DILATION    = 1
CV3_BIAS        = True
#Output Size (No-Padding)
CV3_H           = Out_Size(kernel_size=CV3_K_H, in_size=PL2_H, padding_size=0, stride_factor=CV3_STRIDE, dilation_factor=CV3_DILATION)
CV3_W           = Out_Size(kernel_size=CV3_K_W, in_size=PL2_W, padding_size=0, stride_factor=CV3_STRIDE, dilation_factor=CV3_DILATION)

#2D Pooling-3
#Kernel Size
PL3_K_H         = 3
PL3_K_W         = 3
#Stride Factor
PL3_STRIDE      = 1
#Output Size (No-Padding)
PL3_H           = Out_Size(kernel_size=PL3_K_H, in_size=CV3_H, padding_size=0, stride_factor=PL3_STRIDE)
PL3_W           = Out_Size(kernel_size=PL3_K_W, in_size=CV3_W, padding_size=0, stride_factor=PL3_STRIDE)


##Layer Block-4
#Fully-Connection-1
#Input Size
FC1_SIZE_IN     = (PL3_H * PL3_W * CV3_K_C)
#Output Size
FC1_SIZE_OUT    = (PL3_H * PL3_W)
#Bias
FC1_BIAS        = True

#Fully-Connection-2
#Input Size
FC2_SIZE_IN     = (PL3_H * PL3_W)
#Output Size
FC2_SIZE_OUT    = int(np.ceil(PL3_H * PL3_W / (CV3_K_C * CV3_K_C)))
#Bias
FC2_BIAS        = True

#Fully-Connection-3
#Input Size
FC3_SIZE_IN     = FC2_SIZE_OUT
#Output Size
FC3_SIZE_OUT    = NUM_CLASS
#Bias
FC3_BIAS        = True

class Model(nn.Module):
    ##Layer Composition
    def __init__(self):
        super(Model, self).__init__()

        #Layer Block-1
        self.cv1    = nn.Conv2d(kernel_size=(CV1_K_H, CV1_K_W), in_channels=CV1_K_C, out_channels=CV2_K_C, stride=CV1_STRIDE, dilation=CV1_DILATION, bias=CV1_BIAS)
        self.pl1    = nn.AvgPool2d(kernel_size=(PL1_K_H, PL1_K_W), stride=PL1_STRIDE)

        #Layer Block-2
        self.cv2    = nn.Conv2d(kernel_size=(CV2_K_H, CV2_K_W), in_channels=CV2_K_C, out_channels=CV3_K_C, stride=CV2_STRIDE, dilation=CV2_DILATION, bias=CV2_BIAS)
        self.pl2    = nn.AvgPool2d(kernel_size=(PL2_K_H, PL2_K_W), stride=PL2_STRIDE)

        #Layer Block-3
        self.cv3    = nn.Conv2d(kernel_size=(CV3_K_H, CV3_K_W), in_channels=CV3_K_C, out_channels=CV3_K_C, stride=CV3_STRIDE, dilation=CV3_DILATION, bias=CV3_BIAS)
        self.pl3    = nn.AvgPool2d(kernel_size=(PL3_K_H, PL3_K_W), stride=PL3_STRIDE)

        #Layer Block-4
        self.fc1    = nn.Linear(in_features=FC1_SIZE_IN, out_features=FC1_SIZE_OUT, bias=FC1_BIAS)
        self.fc2    = nn.Linear(in_features=FC2_SIZE_IN, out_features=FC2_SIZE_OUT, bias=FC2_BIAS)
        self.fc3    = nn.Linear(in_features=FC3_SIZE_IN, out_features=FC3_SIZE_OUT, bias=FC3_BIAS)


    def forward(self, image):
        #Layer Block-1
        b1_cv       = torch.relu(self.cv1(image))
        b1_pl       = self.pl1(b1_cv)

        #Layer Block-2
        b2_cv       = torch.relu(self.cv2(b1_pl))
        b2_pl       = self.pl2(b2_cv)    

        #Layer Block-3
        b3_cv       = torch.relu(self.cv3(b2_pl))
        b3_pl       = self.pl3(b3_cv)    

        #Flattening to feed into fully connection
        flattened   = torch.flatten(b3_pl, start_dim=1)

        b4_fc1      = torch.relu(self.fc1(flattened))
        b4_fc2      = torch.relu(self.fc2(b4_fc1))
        b4_fc3      = torch.relu(self.fc3(b4_fc2))

        return b4_fc3

@ptrblck this is the full code.

Thanks for the code.
Your model works fine using your code snippets:

def Out_Size(kernel_size=1, in_size=1, padding_size=0, stride_factor=1, dilation_factor=1):
    Length = int((in_size + 2*padding_size - dilation_factor*(kernel_size - 1) - 1)/stride_factor) + 1

    return Length

#Input Image Size
IMG_H           = 224
IMG_W           = 224

##Layer Block-1
#2D Convolution-1
#Kernel Size
CV1_K_H         = 5
CV1_K_W         = 5
#Number of Channels
CV1_K_C         = 3
#Stride Factor
CV1_STRIDE      = 2
CV1_DILATION    = 1
CV1_BIAS        = True
#Output Size (No-Padding)
CV1_H           = Out_Size(kernel_size=CV1_K_H, in_size=IMG_H, padding_size=0, stride_factor=CV1_STRIDE, dilation_factor=CV1_DILATION)
CV1_W           = Out_Size(kernel_size=CV1_K_W, in_size=IMG_W, padding_size=0, stride_factor=CV1_STRIDE, dilation_factor=CV1_DILATION)

#2D Pooling-1
#Kernel Size
PL1_K_H         = 5
PL1_K_W         = 5
#Stride Factor
PL1_STRIDE      = 1
#Output Size (No-Padding)
PL1_H           = Out_Size(kernel_size=PL1_K_H, in_size=CV1_H, padding_size=0, stride_factor=PL1_STRIDE)
PL1_W           = Out_Size(kernel_size=PL1_K_W, in_size=CV1_W, padding_size=0, stride_factor=PL1_STRIDE)


##Layer Block-2
#2D Convolution-2
#Kernel Size
CV2_K_H         = 3
CV2_K_W         = 3
#Number of Channels
CV2_K_C         = 9
#Stride Factor
CV2_STRIDE      = 2
CV2_DILATION    = 1
CV2_BIAS        = True
#Output Size (No-Padding)
CV2_H           = Out_Size(kernel_size=CV2_K_H, in_size=PL1_H, padding_size=0, stride_factor=CV2_STRIDE, dilation_factor=CV2_DILATION)
CV2_W           = Out_Size(kernel_size=CV2_K_W, in_size=PL1_W, padding_size=0, stride_factor=CV2_STRIDE, dilation_factor=CV2_DILATION)

#2D Pooling-2
#Kernel Size
PL2_K_H         = 3
PL2_K_W         = 3
#Stride Factor
PL2_STRIDE      = 1
#Output Size (No-Padding)
PL2_H           = Out_Size(kernel_size=PL2_K_H, in_size=CV2_H, padding_size=0, stride_factor=PL2_STRIDE)
PL2_W           = Out_Size(kernel_size=PL2_K_W, in_size=CV2_W, padding_size=0, stride_factor=PL2_STRIDE)


##Layer Block-3
#2D Convolution-3
#Kernel Size
CV3_K_H         = 1
CV3_K_W         = 1
#Number of Channels
CV3_K_C         = 3
#Stride Factor
CV3_STRIDE      = 2
CV3_DILATION    = 1
CV3_BIAS        = True
#Output Size (No-Padding)
CV3_H           = Out_Size(kernel_size=CV3_K_H, in_size=PL2_H, padding_size=0, stride_factor=CV3_STRIDE, dilation_factor=CV3_DILATION)
CV3_W           = Out_Size(kernel_size=CV3_K_W, in_size=PL2_W, padding_size=0, stride_factor=CV3_STRIDE, dilation_factor=CV3_DILATION)

#2D Pooling-3
#Kernel Size
PL3_K_H         = 3
PL3_K_W         = 3
#Stride Factor
PL3_STRIDE      = 1
#Output Size (No-Padding)
PL3_H           = Out_Size(kernel_size=PL3_K_H, in_size=CV3_H, padding_size=0, stride_factor=PL3_STRIDE)
PL3_W           = Out_Size(kernel_size=PL3_K_W, in_size=CV3_W, padding_size=0, stride_factor=PL3_STRIDE)


##Layer Block-4
#Fully-Connection-1
#Input Size
FC1_SIZE_IN     = (PL3_H * PL3_W * CV3_K_C)
#Output Size
FC1_SIZE_OUT    = (PL3_H * PL3_W)
#Bias
FC1_BIAS        = True

#Fully-Connection-2
#Input Size
FC2_SIZE_IN     = (PL3_H * PL3_W)
#Output Size
FC2_SIZE_OUT    = int(np.ceil(PL3_H * PL3_W / (CV3_K_C * CV3_K_C)))
#Bias
FC2_BIAS        = True

#Fully-Connection-3
#Input Size
FC3_SIZE_IN     = FC2_SIZE_OUT
#Output Size
FC3_SIZE_OUT    = 10
#Bias
FC3_BIAS        = True

class Model(nn.Module):
    ##Layer Composition
    def __init__(self):
        super(Model, self).__init__()

        #Layer Block-1
        self.cv1    = nn.Conv2d(kernel_size=(CV1_K_H, CV1_K_W), in_channels=CV1_K_C, out_channels=CV2_K_C, stride=CV1_STRIDE, dilation=CV1_DILATION, bias=CV1_BIAS)
        self.pl1    = nn.AvgPool2d(kernel_size=(PL1_K_H, PL1_K_W), stride=PL1_STRIDE)

        #Layer Block-2
        self.cv2    = nn.Conv2d(kernel_size=(CV2_K_H, CV2_K_W), in_channels=CV2_K_C, out_channels=CV3_K_C, stride=CV2_STRIDE, dilation=CV2_DILATION, bias=CV2_BIAS)
        self.pl2    = nn.AvgPool2d(kernel_size=(PL2_K_H, PL2_K_W), stride=PL2_STRIDE)

        #Layer Block-3
        self.cv3    = nn.Conv2d(kernel_size=(CV3_K_H, CV3_K_W), in_channels=CV3_K_C, out_channels=CV3_K_C, stride=CV3_STRIDE, dilation=CV3_DILATION, bias=CV3_BIAS)
        self.pl3    = nn.AvgPool2d(kernel_size=(PL3_K_H, PL3_K_W), stride=PL3_STRIDE)

        #Layer Block-4
        self.fc1    = nn.Linear(in_features=FC1_SIZE_IN, out_features=FC1_SIZE_OUT, bias=FC1_BIAS)
        self.fc2    = nn.Linear(in_features=FC2_SIZE_IN, out_features=FC2_SIZE_OUT, bias=FC2_BIAS)
        self.fc3    = nn.Linear(in_features=FC3_SIZE_IN, out_features=FC3_SIZE_OUT, bias=FC3_BIAS)


    def forward(self, image):
        #Layer Block-1
        b1_cv       = torch.relu(self.cv1(image))
        b1_pl       = self.pl1(b1_cv)

        #Layer Block-2
        b2_cv       = torch.relu(self.cv2(b1_pl))
        b2_pl       = self.pl2(b2_cv)    

        #Layer Block-3
        b3_cv       = torch.relu(self.cv3(b2_pl))
        b3_pl       = self.pl3(b3_cv)    

        #Flattening to feed into fully connection
        flattened   = torch.flatten(b3_pl, start_dim=1)

        b4_fc1      = torch.relu(self.fc1(flattened))
        b4_fc2      = torch.relu(self.fc2(b4_fc1))
        b4_fc3      = torch.relu(self.fc3(b4_fc2))

        return b4_fc3


model = Model()
x = torch.randn(2, 3, IMG_H, IMG_W)
out = model(x)
print(out.shape)
> torch.Size([2, 10])

@ptrblck Yes, your check passes (torchsummary does not raise a shape error either), but my training code is:

def train(model, device, loader, optimizer):
    model.train()
    total_loss = 0
    correct = 0
    for x, y in loader:
        x = x.permute(0,3,1,2)
        #x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        output = model(x)
        loss = nn.functional.nll_loss(output, y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        y_pred = output.argmax(dim=1, keepdim=True)
        correct += y_pred.eq(y.view_as(y_pred)).sum().item()
    return {
        "loss": total_loss / len(loader.dataset),
        "accuracy": correct / len(loader.dataset),
    }

and

optimizer = torch.optim.Adadelta(model.parameters(), lr=LEARNING_RATE)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.7)
with tqdm(range(1, EPOCH_SIZE)) as progress:
    for epoch in progress:
        result = train(model, device, dataloader_train, optimizer)
        scheduler.step()
        progress.write(str(result))

These produce the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-36-71a7a55dd184> in <module>
      3 with tqdm(range(1, EPOCH_SIZE)) as progress:
      4     for epoch in progress:
----> 5         result = train(model, device, dataloader_train, optimizer)
      6         scheduler.step()
      7         progress.write(str(result))

<ipython-input-35-b7caa5450bda> in train(model, device, loader, optimizer)
      7         #x, y = x.to(device), y.to(device)
      8         optimizer.zero_grad()
----> 9         output = model(x)
     10         loss = nn.functional.nll_loss(output, y)
     11         loss.backward()

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

<ipython-input-24-893b6a73f9d4> in forward(self, image)
     38         flattened   = torch.flatten(b3_pl, start_dim=1)
     39 
---> 40         b4_fc1      = torch.relu(self.fc1(flattened))
     41         b4_fc2      = torch.relu(self.fc2(b4_fc1))
     42         b4_fc3      = torch.relu(self.fc3(b4_fc2))

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/torch/nn/modules/linear.py in forward(self, input)
     91 
     92     def forward(self, input: Tensor) -> Tensor:
---> 93         return F.linear(input, self.weight, self.bias)
     94 
     95     def extra_repr(self) -> str:

~/.local/lib/python3.8/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1688     if input.dim() == 2 and bias is not None:
   1689         # fused op is marginally faster
-> 1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
   1692         output = input.matmul(weight.t())

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x19950 and 1587x529)

Maybe I have some confusion or misunderstanding here…

Could you check if all input tensors have the same shape?
I’m unsure how to debug the issue further, since your code snippet works fine.
Assuming you are not changing the model, I think the input data shape is (accidentally) changing.
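E.g. something like this inside the training loop (just a sketch, reusing the names from your train() function and config):

# inside "for x, y in loader:", right after the permute
print(x.shape)   # should stay [batch_size, 3, 224, 224] for every batch
assert x.shape[1:] == (3, IMG_H, IMG_W), f"unexpected input shape: {x.shape}"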