The in_channels in PyTorch’s nn.Conv2d corresponds to the number of channels in your input. Based on the input shape, it looks like you have 1 channel and a spatial size of 28x28. Your first conv layer expects 28 input channels, which won’t work, so you should change it to 1.
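To make the mismatch concrete, here is a minimal sketch (the variable names are just for illustration, not from your code) showing that a conv layer with in_channels=1 accepts a [batch_size, 1, 28, 28] input, while in_channels=28 raises a shape error:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)              # [batch_size, channels, height, width]
conv_ok = nn.Conv2d(1, 28, kernel_size=3)
print(conv_ok(x).shape)                    # torch.Size([1, 28, 26, 26])

conv_wrong = nn.Conv2d(28, 28, kernel_size=3)
try:
    conv_wrong(x)                          # expects 28 input channels, gets 1
except RuntimeError as e:
    print(e)                               # channel mismatch error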
Also, the Dense layers in Keras only take the number of output units. For nn.Linear you have to provide the number of in_features first, which can be calculated from your layers and input shape, or just by printing out the shape of the activation in your forward method.
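If you don’t want to do the math by hand, a quick way (just a sketch, reusing the conv/pool setup from below) is to push a dummy input through the feature extractor and read off the flattened size:

import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(1, 28, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
with torch.no_grad():
    out = features(torch.randn(1, 1, 28, 28))
print(out.shape)                 # torch.Size([1, 28, 13, 13])
print(out.view(1, -1).size(1))   # 4732 -> use as in_features for nn.Linear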
Let’s walk through your layers:

- After the first conv layer, your output will have the shape [batch_size, 28, 26, 26]. The 28 is given by the number of kernels your conv layer is using. Since you are not using any padding and leave the stride and dilation at 1, a kernel size of 3 will crop 1 pixel on each side in both spatial dimensions, so you’ll end up with 28 activation maps of spatial size 26x26.
- The max pooling layer will halve your spatial size, so you’ll end up with [batch_size, 28, 13, 13].
- The linear layer should therefore take 28*13*13=4732 input features; a quick numeric check follows below.
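Here is that check written out (a sketch; it just applies the output-size formula from the nn.Conv2d and nn.MaxPool2d docs, simplified for padding=0, stride=1, dilation=1):

h_in = 28                            # input height (width is the same)
kernel_size = 3
h_conv = h_in - (kernel_size - 1)    # 28 - 2 = 26 after the conv layer
h_pool = h_conv // 2                 # 13 after MaxPool2d(2)
in_features = 28 * h_pool * h_pool   # 28 channels * 13 * 13 = 4732
print(h_conv, h_pool, in_features)   # 26 13 4732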
Here is your revised code:
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.conv = nn.Conv2d(1, 28, kernel_size=3)
        self.pool = nn.MaxPool2d(2)
        self.hidden = nn.Linear(28*13*13, 128)
        self.drop = nn.Dropout(0.2)
        self.out = nn.Linear(128, 10)
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.act(self.conv(x))    # [batch_size, 28, 26, 26]
        x = self.pool(x)              # [batch_size, 28, 13, 13]
        x = x.view(x.size(0), -1)     # [batch_size, 28*13*13=4732]
        x = self.act(self.hidden(x))  # [batch_size, 128]
        x = self.drop(x)
        x = self.out(x)               # [batch_size, 10]
        return x

model = NeuralNet()
batch_size, C, H, W = 1, 1, 28, 28
x = torch.randn(batch_size, C, H, W)
output = model(x)
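As a quick sanity check (just a usage note, not part of your original code), the output should contain one row of 10 raw class logits per input image, which you can pass directly to nn.CrossEntropyLoss:

print(output.shape)    # torch.Size([1, 10])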