I have a training dataset of melgrams, each with shape [21, 128]. The training set contains 3200 melgrams. The DataLoader I create with a batch size of 16 produces batches of shape [16, 21, 128], i.e. 16 melgrams per batch.
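For reference, this is roughly how the dataset and DataLoader are set up (the tensor contents and variable names below are illustrative placeholders, not my real data):

import torch
from torch.utils.data import TensorDataset, DataLoader

# illustrative placeholder data: 3200 melgrams of shape [21, 128], 4 classes
melgrams = torch.randn(3200, 21, 128)
labels = torch.randint(0, 4, (3200,))

train_dataset = TensorDataset(melgrams, labels)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

batch, batch_labels = next(iter(train_loader))
print(batch.shape)  # torch.Size([16, 21, 128])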
I have created the following neural network with PyTorch (the torch module). Briefly, the network has 4 convolutional layers followed by 4 linear layers in sequential order. The model is defined as follows:
import torch.nn as nn

class ConvolutionalNeuralNetwork_pooling(nn.Module):
    def __init__(self):
        super(ConvolutionalNeuralNetwork_pooling, self).__init__()
        # initialize features
        self.input_units = 1
        self.output_units = 4
        self.kernel_size = 5
        self.pool_kernel_size = 2
        # convolutional layers
        # (in_channels, out_channels, kernel_size)
        self.conv1 = nn.Conv2d(self.input_units, 16, self.kernel_size, padding=2)
        self.conv2 = nn.Conv2d(16, 32, self.kernel_size, padding=2)
        self.conv3 = nn.Conv2d(32, 64, self.kernel_size, padding=2)
        self.conv4 = nn.Conv2d(64, 128, self.kernel_size, padding=2)
        # linear layers
        self.fc1 = nn.Linear(128*2*4, 1024)  # here is the tricky part
        self.fc2 = nn.Linear(1024, 256)
        self.fc3 = nn.Linear(256, 32)
        self.fc4 = nn.Linear(32, self.output_units)
        # initialize max pooling layer
        self.max_pool = nn.MaxPool2d(kernel_size=self.pool_kernel_size)
        # initialize non-linear activation function
        self.activation = nn.ReLU()
        # initialize weights
        self.apply(self._init_weights)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=1.0)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Conv2d):
            nn.init.uniform_(module.weight)
            if module.bias is not None:
                module.bias.data.zero_()

    def forward(self, x):
        x = x.unsqueeze(1)  # add the channel dimension: [batch, 1, 21, 128]
        x = self.max_pool(self.activation(self.conv1(x)))
        x = self.max_pool(self.activation(self.conv2(x)))
        x = self.max_pool(self.activation(self.conv3(x)))
        x = self.max_pool(self.activation(self.conv4(x)))
        x = x.view(x.size(0), -1)  # flatten before the linear layers
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        x = self.fc4(x)
        return x
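For context, this is roughly how I instantiate the model and run one batch through it; the loss function, optimizer, and variable names are illustrative rather than my exact training script:

import torch
import torch.nn as nn

model = ConvolutionalNeuralNetwork_pooling()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# one illustrative batch from the DataLoader: 16 melgrams of shape [21, 128], 4 classes
batch = torch.randn(16, 21, 128)
batch_labels = torch.randint(0, 4, (16,))

optimizer.zero_grad()
outputs = model(batch)                   # shape [16, 4]
loss = criterion(outputs, batch_labels)
loss.backward()
optimizer.step()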
My question is the following:
If I define the first linear layer as:
self.fc1 = nn.Linear(128, 1024)
I receive the following error during the matrix multiplication in the first linear layer:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x1024 and 128x1024)
The shape of x during training is:
input: torch.Size([16, 1, 21, 128])
after conv1: torch.Size([16, 16, 10, 64])
after conv2: torch.Size([16, 32, 5, 32])
after conv3: torch.Size([16, 64, 2, 16])
after conv4: torch.Size([16, 128, 1, 8])
after flattening-before 1st linear layer: torch.Size([16, 1024])
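A minimal sketch that reproduces this shape trace with a dummy batch (the dummy tensor and the print loop are just for illustration):

import torch

model = ConvolutionalNeuralNetwork_pooling()
x = torch.randn(16, 21, 128).unsqueeze(1)        # [16, 1, 21, 128]
print("input:", x.shape)
for name in ["conv1", "conv2", "conv3", "conv4"]:
    conv = getattr(model, name)
    x = model.max_pool(model.activation(conv(x)))
    print("after", name + ":", x.shape)
x = x.view(x.size(0), -1)
print("after flattening:", x.shape)              # torch.Size([16, 1024])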
However, when I replace
self.fc1 = nn.Linear(128, 1024)
with
self.fc1 = nn.Linear(128*2*4, 1024)
the second matrix in the multiplication (the fc1 weight) now has shape [1024, 1024] and the forward pass through the first linear layer completes successfully. Is there a general rule that applies here when using max pooling and padding in convolutional networks? It is not clear to me why I should apply this multiplication (128*2*4) to the in_features of the first linear layer.