I have a training dataset of melgrams, each with shape [21,128]. The training set contains 3200 melgrams. The dataloader that I create with a batch size of 16 yields batches of shape [16,21,128], i.e. 16 melgrams per batch.

I have created the following neural network with PyTorch (`torch` module). Briefly, the network has 4 convolutional layers and 4 linear layers in sequential order. The model structure is as follows:

```
import torch
import torch.nn as nn


class ConvolutionalNeuralNetwork_pooling(nn.Module):
    def __init__(self):
        super(ConvolutionalNeuralNetwork_pooling, self).__init__()
        # initialize features
        self.input_units = 1
        self.output_units = 4
        self.kernel_size = 5
        self.pool_kernel_size = 2
        # convolutional layers
        # (in_channels, out_channels, kernel_size)
        self.conv1 = nn.Conv2d(self.input_units, 16, self.kernel_size, padding=2)
        self.conv2 = nn.Conv2d(16, 32, self.kernel_size, padding=2)
        self.conv3 = nn.Conv2d(32, 64, self.kernel_size, padding=2)
        self.conv4 = nn.Conv2d(64, 128, self.kernel_size, padding=2)
        # linear layers
        self.fc1 = nn.Linear(128*2*4, 1024)  # here is the tricky part
        self.fc2 = nn.Linear(1024, 256)
        self.fc3 = nn.Linear(256, 32)
        self.fc4 = nn.Linear(32, self.output_units)
        # initialize max pooling layer
        self.max_pool = nn.MaxPool2d(kernel_size=self.pool_kernel_size)
        # initialize non-linear activation function
        self.activation = nn.ReLU()
        # initialize weights
        self.apply(self._init_weights)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=1.0)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Conv2d):
            nn.init.uniform_(module.weight)
            if module.bias is not None:
                module.bias.data.zero_()

    def forward(self, x):
        # add the channel dimension: [batch, 21, 128] -> [batch, 1, 21, 128]
        x = x.unsqueeze(1)
        x = self.max_pool(self.activation(self.conv1(x)))
        x = self.max_pool(self.activation(self.conv2(x)))
        x = self.max_pool(self.activation(self.conv3(x)))
        x = self.max_pool(self.activation(self.conv4(x)))
        # flatten everything except the batch dimension
        x = x.view(x.size(0), -1)
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        x = self.fc4(x)
        return x
```

My question is the following:

If I have the first linear layer as:

`self.fc1 = nn.Linear(128, 1024)`

I receive the following error during the matrix multiplication in the first linear layer:

`RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x1024 and 128x1024)`

The shape of `x` during training is:

```
input: torch.Size([16, 1, 21, 128])
after conv1: torch.Size([16, 16, 10, 64])
after conv2: torch.Size([16, 32, 5, 32])
after conv3: torch.Size([16, 64, 2, 16])
after conv4: torch.Size([16, 128, 1, 8])
after flattening-before 1st linear layer: torch.Size([16, 1024])
```
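The trace above is consistent with each 5x5 convolution with `padding=2` preserving height and width, and each 2x2 max pool halving them with floor rounding. The following is a minimal sketch of that arithmetic in plain Python, assuming stride 1 for the convolutions and stride 2 for the pooling (the defaults used in the model above). The helper names `conv_out`, `pool_out` and `flattened_features` are hypothetical and exist only for this illustration.

```
def conv_out(size, kernel=5, padding=2, stride=1, dilation=1):
    # Output length of one spatial dimension after a convolution.
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def pool_out(size, kernel=2, stride=2):
    # Output length after max pooling without padding (floor rounding).
    return (size - kernel) // stride + 1


def flattened_features(height, width, out_channels=(16, 32, 64, 128)):
    # Apply one conv + pool stage per entry in out_channels.
    for _ in out_channels:
        height = pool_out(conv_out(height))
        width = pool_out(conv_out(width))
    # Features per sample after flattening: channels * height * width.
    return out_channels[-1] * height * width, (height, width)


features, final_hw = flattened_features(21, 128)
print(final_hw, features)  # (1, 8) 1024
```

Under these assumptions the flattened size per sample is 128 channels x 1 x 8 = 1024, which happens to equal `128*2*4`.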

However, when I replace `self.fc1 = nn.Linear(128, 1024)` with `self.fc1 = nn.Linear(128*2*4, 1024)`, the weight of `fc1` has shape `[1024, 1024]`, which matches the flattened `x` of shape `[16, 1024]`, and the forward pass completes successfully.

Is there a general rule that applies here when using max pooling (`nn.MaxPool2d`) and `padding` in convolutional networks? It is not clear to me why I need this multiplication in the input features of the first linear layer.
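One common way to avoid working out this number by hand is to pass a dummy input through the convolutional part once and read off the flattened size. The sketch below assumes the same layer settings as the model above. The names `features`, `dummy` and `n_flat` are hypothetical, and the `nn.Sequential` is only a stand-in for the four conv/ReLU/pool stages, not the original class.

```
import torch
import torch.nn as nn

# Stand-in for the convolutional part of the model (same layer settings).
features = nn.Sequential(
    nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
)

with torch.no_grad():
    # One fake melgram: [batch=1, channels=1, height=21, width=128].
    dummy = torch.zeros(1, 1, 21, 128)
    # Flatten everything except the batch dimension and count the features.
    n_flat = features(dummy).flatten(1).size(1)

# The first linear layer needs in_features equal to the flattened size.
fc1 = nn.Linear(n_flat, 1024)
print(n_flat)  # 1024
```

The value depends on the input size, so a different melgram shape would give a different `in_features`. `nn.LazyLinear(1024)` is another option that infers the input size on the first forward pass.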