Hi, I am new to CNNs and PyTorch. I am wondering why, in the first layer, the number of input channels is 3 and the number of output channels is 6 (self.conv1 = nn.Conv2d(3, 6, 5)). How is the volume size of the output calculated?

I know there is a formula, W2 = (W1 − F + 2P)/S + 1, to calculate the output size, and I read this page (http://cs231n.github.io/convolutional-networks/#architectures), which explains it very well, but I still cannot understand why in self.conv1 = nn.Conv2d(3, 6, 5) the input is 3 channels and the output is 6 channels.

For example, in the following script, I can see that the output channels of each layer are the input channels of the next layer, but I couldn't figure out how this is determined in the first line, given that the size of the image is 32x32x3.
Any help would be appreciated.

Conv kernels operate on all input channels at once, as described in your link.
The out_channels argument gives the number of different kernels used.
So in your first conv layer you are using 6 different kernels with a size of [3, 5, 5].
In other words, each kernel has a spatial size of 5 and a depth of 3, since your input volume has 3 channels.
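
To make the shapes concrete, here is a minimal sketch in plain Python (no PyTorch needed) that applies the W2 = (W1 − F + 2P)/S + 1 formula from the question to a 32x32x3 image. The helper names conv2d_output_shape and conv2d_weight_shape are made up for illustration and are not PyTorch functions; stride 1 and padding 0 are assumed, which are the nn.Conv2d defaults.

```python
def conv2d_output_shape(in_shape, out_channels, kernel_size, stride=1, padding=0):
    """Return (C_out, H_out, W_out) for a square-kernel 2D convolution.

    in_shape is (C_in, H, W). The number of output channels is simply the
    number of kernels; only height and width follow the size formula.
    """
    _, height, width = in_shape
    h_out = (height - kernel_size + 2 * padding) // stride + 1
    w_out = (width - kernel_size + 2 * padding) // stride + 1
    return (out_channels, h_out, w_out)


def conv2d_weight_shape(in_channels, out_channels, kernel_size):
    """Shape of the weight tensor: one [C_in, F, F] kernel per output channel."""
    return (out_channels, in_channels, kernel_size, kernel_size)


# Hypothetical helpers mirroring nn.Conv2d(3, 6, 5) on a 3-channel 32x32 image.
input_shape = (3, 32, 32)
out_shape = conv2d_output_shape(input_shape, out_channels=6, kernel_size=5)
weight_shape = conv2d_weight_shape(in_channels=3, out_channels=6, kernel_size=5)

print(out_shape)     # (6, 28, 28)
print(weight_shape)  # (6, 3, 5, 5)
```

So the 3 has to match the image's channels, while the 6 is free to choose: each of the 6 kernels produces one 28x28 output map.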

The number of output channels is a design choice and is similar to the number of hidden neurons in a linear layer.
Using more kernels gives the model more capacity, which might be helpful or harmful depending on the problem. In the tutorial it might just have been a decision to make the model perform well enough while still being fast enough to run smoothly on a CPU.
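
As a rough illustration of the capacity trade-off, the sketch below counts the learnable parameters of a conv layer for a few values of out_channels, next to a linear layer whose out_features plays the same role. The helper names and the linear layer sizes (120 and 84) are assumptions picked for the example, not values taken from this thread.

```python
def conv2d_num_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights (one [C_in, F, F] kernel per output channel) plus optional biases."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def linear_num_params(in_features, out_features, bias=True):
    """Weights (one row of in_features per output neuron) plus optional biases."""
    return in_features * out_features + (out_features if bias else 0)


# More kernels means more parameters, just like more hidden neurons would.
conv_counts = {n: conv2d_num_params(3, n, 5) for n in (6, 12, 24)}
print(conv_counts)  # {6: 456, 12: 912, 24: 1824}

# Hypothetical linear layer: out_features is the free choice here.
print(linear_num_params(120, 84))  # 10164
```

In both cases the input size is fixed by what comes before the layer, and the output size is the knob you pick.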

@ptrblck, I was rereading this post. I am wondering what it means that this is similar to the number of hidden neurons in a linear layer, because there is no 6 in the linear layers. Thanks in advance for all your help.