Question about CIFAR10 tutorial


(Neda) #1

Hi, I am new to CNNs and PyTorch. In the first layer, self.conv1 = nn.Conv2d(3, 6, 5), the input channel count is 3 and the output channel count is 6. How is the volume size of the output calculated?

I know there is a formula, W2 = (W1 − F + 2P)/S + 1, to calculate the output width, and I read http://cs231n.github.io/convolutional-networks/#architectures, which explains it very well. Still, I cannot understand why in self.conv1 = nn.Conv2d(3, 6, 5) the input channel count is 3 and the output channel count is 6.

For example, in the following script, I understand that the output channels of each layer are the input channels of the next layer, but I couldn't figure out how the values in the first line were chosen, given that the size of the image is 32×32×3.
Any help would be appreciated.

        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        # creates a module which initializes weights etc
        self.conv2 = nn.Conv2d(6, 16, 5)
        #an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

#2

Conv kernels operate on all input channels, as described in your link.
The out_channels argument gives the number of different kernels used.
So in your first conv layer you are using 6 different kernels with a size of [3, 5, 5].
In other words, each kernel has a spatial size of 5 and a depth of 3, since your input volume has 3 channels.
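
To make this concrete, here is a minimal sketch that checks the kernel shape and traces the spatial size through the layers from the question. It assumes the default stride of 1 and padding of 0, and uses a random dummy tensor in place of a real CIFAR10 image. The tutorial also applies ReLU after each conv layer, which is left out here because it does not change any shapes.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 6, 5)    # in_channels=3, out_channels=6, kernel_size=5
pool = nn.MaxPool2d(2, 2)     # 2x2 window, stride 2
conv2 = nn.Conv2d(6, 16, 5)

# One kernel per output channel, each spanning all 3 input channels.
weight_shape = tuple(conv1.weight.shape)   # (6, 3, 5, 5)

# Dummy batch holding a single 3-channel 32x32 image (random values).
x = torch.randn(1, 3, 32, 32)

out1 = conv1(x)      # (32 - 5 + 2*0)/1 + 1 = 28  -> [1, 6, 28, 28]
out2 = pool(out1)    # 28 / 2 = 14                -> [1, 6, 14, 14]
out3 = conv2(out2)   # (14 - 5 + 2*0)/1 + 1 = 10  -> [1, 16, 10, 10]
out4 = pool(out3)    # 10 / 2 = 5                 -> [1, 16, 5, 5]

print(weight_shape)
for name, t in [("conv1", out1), ("pool", out2), ("conv2", out3), ("pool", out4)]:
    print(name, tuple(t.shape))
```

The final shape, 16 channels of 5×5, is where the 16 * 5 * 5 input size of fc1 comes from.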


(Neda) #3

@ptrblck thank you for the reply. I see. Why was 6 chosen as the number of kernels in the first layer? Is it arbitrary, or does it depend on the problem?


#4

It’s a design choice and is similar to the number of hidden neurons in a linear layer.
Using more kernels gives the model more capacity, which might be helpful or harmful depending on the problem. In the tutorial it might just have been a decision to make the model perform well enough while still being fast enough to run smoothly on a CPU.
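
As a rough illustration of the capacity trade-off, the sketch below counts the learnable parameters of a conv layer for a few values of out_channels. The helper conv2d_param_count is a hypothetical name introduced only for this example, and it assumes a square kernel with a bias term and no grouping, which matches the defaults of nn.Conv2d.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size):
    """Learnable parameters of a conv layer (hypothetical helper).

    Assumes a square kernel, one bias per output channel, and groups=1.
    """
    weights = out_channels * in_channels * kernel_size * kernel_size
    biases = out_channels
    return weights + biases


# First layer of the tutorial model: 3 input channels, 5x5 kernels.
for n in (6, 16, 64):
    print(n, conv2d_param_count(3, n, 5))
# 6 456
# 16 1216
# 64 4864
```

The count grows linearly with the number of kernels, and so does the compute per image, which is one reason a small value such as 6 keeps the tutorial model light.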


(Neda) #5

@ptrblck I understand now :) Thanks a lot.