Changing LeNet-5 Dimensionality to use 3x3 Kernels

I need help with adapting the LeNet-5 CNN to use 3x3 kernels instead of 5x5.

I have parameterized the kernel size for all layers, except for the pooling layers. Here is the exception stack trace:

	Traceback (most recent call last):
	  File "/usr/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
		yield
	  File "/usr/lib/python3.9/unittest/case.py", line 593, in run
		self._callTestMethod(testMethod)
	  File "/usr/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
		method()
	  File "/home/steve/workspace_psu/cs510nlp/hw2/venv/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
		self.test(*self.arg)
	  File "/home/steve/workspace_psu/cs510dl/hw2/test_cs510dl_hw2.py", line 390, in test_part2_relu_cel_k3
		model.train(activation=activation, learn_rate=learn_rate, momentum=momentum, epochs=epochs, kernel_size=kernel_size)
	  File "/home/steve/workspace_psu/cs510dl/hw2/cs510dl_hw2.py", line 389, in train
		correct += (predicted == labels).sum().item()
	  File "/home/steve/workspace_psu/cs510nlp/hw2/venv/lib/python3.9/site-packages/torch/tensor.py", line 27, in wrapped
		return f(*args, **kwargs)
	Exception: The size of tensor a (16) must match the size of tensor b (4) at non-singleton dimension 0


Here is my network class:

import torch.nn as nn


class LeNet(nn.Module):

    def __init__(self, activation, kernel_size: int = 5):
        super().__init__()

        self.kernel_size = kernel_size

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=kernel_size, stride=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=kernel_size, stride=1)

        self.fc1 = nn.Linear(kernel_size * kernel_size * 16, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

        self.activation = activation

    def forward(self, x):
        x = self.pool(self.activation(self.conv1(x)))
        x = self.pool(self.activation(self.conv2(x)))

        x = x.view(-1, self.kernel_size * self.kernel_size * 16)

        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.fc3(x)

        return x

I call the train method with this:

model.train(activation=nn.Tanh(), learn_rate=0.001, epochs=10, kernel_size=3)

Train method:

def train(self, activation: Any, learn_rate: float, epochs: int, momentum: float = 0.0, kernel_size: int = 5) -> None:

    self.activation = activation
    self.learn_rate = learn_rate
    self.epochs = epochs
    self.momentum = momentum

    self.model = LeNet(activation=activation, kernel_size=kernel_size)
    self.model.to(device)

    criterion = nn.CrossEntropyLoss()  # cross-entropy loss for the 10-class output
    optimizer = torch.optim.SGD(params=self.model.parameters(), lr=learn_rate, momentum=momentum)

    for epoch in range(1, epochs + 1):
        correct = 0
        total = 0

        for batch_id, (images, labels) in enumerate(self.train_loader):
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = self.model(images)
            loss = criterion(outputs, labels)

            # Calculate accuracy
            predicted = torch.argmax(outputs, 1)
            correct += (predicted == labels).sum().item() # <--- exception
            total += labels.size(0)

            loss.backward()
            optimizer.step()

        accuracy = 100 * correct / total
        error = 100 - accuracy
        self.train_accuracy.append(accuracy)
        self.train_error.append(error)
        self.train_loss.append(loss.item())

        self.test()

    for i in self.model.parameters():
        self.params.append(i)
Hello,

you have to make sure that you are consistent in terms of shapes. I think the problem is in your flattening: when doing

  x = x.view(-1, self.kernel_size * self.kernel_size * 16)

the output of your last conv layer actually has to be kernel_size x kernel_size spatially (in your case 3x3) with 16 channels. If this is not the case, the remaining values get shifted into the batch dimension (because of the -1). But this depends on your input size, which is not specified in your code.
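
For example, assuming 32x32 CIFAR-10 images (just a guess, since your input size isn't shown), you can trace the shapes for kernel_size=3 like this:

    import torch
    import torch.nn as nn

    conv1 = nn.Conv2d(3, 6, kernel_size=3, stride=1)
    pool = nn.MaxPool2d(kernel_size=2, stride=2)
    conv2 = nn.Conv2d(6, 16, kernel_size=3, stride=1)

    x = torch.randn(4, 3, 32, 32)           # batch of 4 images
    x = pool(conv1(x))                       # shape (4, 6, 15, 15)
    x = pool(conv2(x))                       # shape (4, 16, 6, 6) -> 16*6*6 = 576 values per sample
    print(x.view(-1, 3 * 3 * 16).shape)      # torch.Size([16, 144]): the surplus ends up in the batch dim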

If I’m not mistaken, as a quick workaround you could simply add a factor of 4 (because of the mismatch 16 != 4) to your first linear layer and use

 x = x.view(-1, 4 * self.kernel_size * self.kernel_size * 16)

which ensures that the batch dimensions of the ground-truth and predicted labels are equal. (Admittedly this is not the prettiest solution, because it is not generic with respect to your input image size.)

Best regards

Hi Marco,
My assignment is to first implement a LeNet-5 with 5x5 kernels. I got that working.

Now I need to change it to work with 3x3 kernels. So I parameterized all kernels, except for pooling. So, in that case, wouldn’t I want this line of code to use 3 as the kernel_size?

x = x.view(-1, self.kernel_size * self.kernel_size * 16)

Hi,

yes, you can simply change the kernel size, but to avoid the error message you have to insert that factor of 4, if I’m not mistaken. You can simply check your network by adding

    print(x.shape)

before performing the view operation. E.g. if x has shape (4, 16, 6, 6), the view operation

    x = x.view(-1, self.kernel_size * self.kernel_size * 16) # kernel size = 3

would result in a tensor of shape (16, 144), which gives an output shape of (16, 10) while you’re expecting an output of (4, 10). So you need to take care of that.
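
To make that concrete, here is a tiny sketch (again assuming the (4, 16, 6, 6) shape) showing what the two view calls produce:

    import torch

    x = torch.randn(4, 16, 6, 6)             # assumed output of conv2 + pool for kernel_size=3
    print(x.view(-1, 3 * 3 * 16).shape)      # torch.Size([16, 144]) -- batch dimension silently grows
    print(x.view(-1, 4 * 3 * 3 * 16).shape)  # torch.Size([4, 576])  -- batch dimension stays at 4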

Thank you Marco!

I modified my code as @marco0410 suggested, to make sure the flattened size in the view matches the input size of the first fully connected layer:

In my constructor for the class LeNet:

self.fc1 = nn.Linear(4 * kernel_size * kernel_size * 16, 120)  # added 4

In the feed forward network:

x = x.view(-1, 4 * self.kernel_size * self.kernel_size * 16) # added 4

Just curious: 4 * kernel_size * kernel_size * 16 only works when the kernel size is 3.

It’s not apparent to me how I can avoid putting if statements before those two lines of code (if kernel_size == 3, add the factor of 4).

Nice that it works now. The reason it only works for one specific kernel size is that changing the kernel size also changes the output shape of your conv layers. That shape is what matters for flattening, because you have to take it into account when setting the number of input features of your linear layer, and it also depends on your input image size. E.g. for kernel_size=5 you get a smaller output from your conv2 layer than for kernel_size=3.

Edit: You could also change e.g. the padding of your conv layers; adjusting the number in the linear layer is not the only option that works. The important thing is that your linear layer is expecting the correct shape. One way to make it generic without if statements is sketched below.
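
Here is a sketch (assuming 32x32 inputs and no padding; the flattened_size helper and the image_size argument are just my naming) that computes the flattened size from the input size and kernel size, and flattens with the batch dimension kept explicit:

    import torch.nn as nn

    def flattened_size(image_size: int, kernel_size: int) -> int:
        # Spatial size after conv1 (no padding) -> 2x2 pool -> conv2 (no padding) -> 2x2 pool,
        # multiplied by the 16 output channels of conv2.
        size = (image_size - kernel_size + 1) // 2
        size = (size - kernel_size + 1) // 2
        return 16 * size * size

    class LeNet(nn.Module):
        def __init__(self, activation, kernel_size: int = 5, image_size: int = 32):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 6, kernel_size=kernel_size, stride=1)
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
            self.conv2 = nn.Conv2d(6, 16, kernel_size=kernel_size, stride=1)
            self.fc1 = nn.Linear(flattened_size(image_size, kernel_size), 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)
            self.activation = activation

        def forward(self, x):
            x = self.pool(self.activation(self.conv1(x)))
            x = self.pool(self.activation(self.conv2(x)))
            x = x.view(x.size(0), -1)          # keep the batch dimension, flatten the rest
            x = self.activation(self.fc1(x))
            x = self.activation(self.fc2(x))
            return self.fc3(x)

With x.view(x.size(0), -1) the batch dimension can never change, so the size-16 vs. size-4 mismatch from your traceback cannot happen, and fc1 always receives the number of features that conv2 actually produces.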