Baseline CNN in Dynamic Routing Between Capsules

Hi, everyone.

I tried to reproduce the results of Hinton’s CapsNet, but I got stuck on the baseline CNN code.

From the paper:
The baseline is a standard CNN with three convolutional layers of 256, 256, 128 channels. Each has
5x5 kernels and stride of 1. The last convolutional layers are followed by two fully connected layers
of size 328, 192. The last fully connected layer is connected with dropout to a 10 class softmax layer
with cross entropy loss.

The following is my model code:

import torch.nn as nn
import torch.nn.functional as F

class BaseLineNet(nn.Module):
    def __init__(self):
        super(BaseLineNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=256, kernel_size=5, stride=1)
        self.conv2 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=5, stride=1)
        self.conv3 = nn.Conv2d(in_channels=256, out_channels=128, kernel_size=5, stride=1)
        self.fc1 = nn.Linear(328, 192)
        self.fc2 = nn.Linear(192, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = F.relu(F.max_pool2d(self.conv3(x), 2))
        x = x.view(-1, 328)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return (F.log_softmax(x))

I got puzzled about the paper’s channel counts and the sizes of the FC layers.
When I run the code, I get this error:

Calculated padded input size per channel: (4 x 4). Kernel size: (5 x 5). Kernel size can't be greater than actual input size

The input image is [28x28] from MNIST, so the error must come from my model, but I’m confused about it.

Could anyone tell me how to fix it? That would be great!

Thanks a lot, and have a good day!

The quoted section doesn’t mention any pooling layers, but I assume you’ve seen their usage in another passage?
In any case, the error is raised because an activation becomes smaller than the kernel of a conv layer, so that layer fails (see the shape trace below).
I also cannot see any information about the padding of the convs, so do you know if this might be used?
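
Here is a quick shape trace, just as a sketch (the conv_out helper is my own; it assumes a 28x28 MNIST input, 5x5 kernels, stride 1, no padding, and the 2x2 max pooling from your forward pass):

def conv_out(size, kernel=5, stride=1, padding=0):
    # standard conv output size: (size + 2*padding - kernel) // stride + 1
    return (size + 2 * padding - kernel) // stride + 1

size = 28
for i in range(1, 3):
    size = conv_out(size)   # 5x5 conv, stride 1, no padding
    print(f"after conv{i}: {size}x{size}")   # 24x24, then 8x8
    size = size // 2        # 2x2 max pooling from your forward pass
    print(f"after pool{i}: {size}x{size}")   # 12x12, then 4x4

# conv3 would now have to slide a 5x5 kernel over a 4x4 map,
# which is exactly the error you are seeing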

Thanks for your reply.

I don’t see any padding operation in the paper, and you are right that there are no pooling layers in the net, because the feature map would become smaller than the kernel size after the second pooling layer.

If there is no pooling and no padding, it looks like this:

"Conv1 -> RELU -> Conv2 -> RELU -> Conv3 -> RELu -> FC1 -> RELU -> FC2 -> SOFTMAX"
import torch.nn as nn
import torch.nn.functional as F

class BaseLineNet(nn.Module):
    def __init__(self):
        super(BaseLineNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=256, kernel_size=5, stride=1)        # in [1,28,28]   -> out [256,24,24]
        self.conv2 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=5, stride=1)      # in [256,24,24] -> out [256,20,20]
        self.conv3 = nn.Conv2d(in_channels=256, out_channels=128, kernel_size=5, stride=1)      # in [256,20,20] -> out [128,16,16]
        self.fc1 = nn.Linear(128*16*16, 328)
        self.fc2 = nn.Linear(328, 192)
        self.fc3 = nn.Linear(192, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x), inplace=True)
        x = F.relu(self.conv2(x), inplace=True)
        x = F.relu(self.conv3(x), inplace=True)
        x = x.view(-1, 128*16*16)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, training=self.training)
        x = self.fc3(x)
        return (F.log_softmax(x, dim=1))
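
As a quick sanity check (just a sketch with a random batch, not the real training code), the forward pass now runs and gives a [batch, 10] output:

import torch

model = BaseLineNet()
x = torch.randn(4, 1, 28, 28)   # random MNIST-sized batch
out = model(x)
print(out.shape)                # torch.Size([4, 10])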

And after I finished this, another question came up. :(

The authors say their baseline CNN has 35.4M parameters in total, but the code above only has ~13M parameters according to print("# parameters:", sum(param.numel() for param in model.parameters())).

How can a 3-conv net have so many parameters?!

And I am not quite sure whether the code is right or not; do you have any suggestions?
I hope I can reproduce the results the authors reported.

Thanks for your help, and best wishes.

I don’t know where the missing parameters are, but note that the number of parameters in conv layers is usually tiny compared to linear layers:

print(sum([param.nelement() for param in model.conv1.parameters()]))
> 6656
print(sum([param.nelement() for param in model.fc1.parameters()]))
> 10748232

So I would check if more linear layers are used or if the shapes might be different.
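
If it helps, here is a small sketch (reusing the model from above) that prints a per-layer breakdown, which makes it obvious that fc1 dominates the ~13M count:

total = 0
for name, module in model.named_children():
    n = sum(param.numel() for param in module.parameters())
    total += n
    print(f"{name}: {n}")
print(f"total: {total}")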