How does conv2d apply kernels?


I am following along with PyTorch's Udacity tutorial and I'm working with CNNs right now.
In one of the assignments, we take a (3 x 32 x 32) image and run it through a CNN to classify it. The first two convolutional layers look like this:

# First input = (3 x 32 x 32)
self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
# Second input = (16 x 16 x 16)
self.conv2 = nn.Conv2d(16, 32, 3, padding=1)

I'm curious how PyTorch transforms the input data with a depth of 3 to a depth of 16, since 16 isn't a multiple of three. Intuitively, I would expect PyTorch to apply an equal number of kernels to each channel. Does it use 5 kernels for each of the three channels, and then pick a random channel for one additional pass?


In the default setup you are using, each kernel uses all input channels and creates one output channel. Have a look at CS231n, which describes this approach quite well.
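A quick sketch that makes this concrete: the weight tensor of an `nn.Conv2d` has shape `(out_channels, in_channels, kernel_height, kernel_width)`, so each of the 16 kernels in the layer from the question spans all 3 input channels.

```python
import torch
import torch.nn as nn

# Same layer as in the question: 3 input channels -> 16 output channels, 3x3 kernel.
conv1 = nn.Conv2d(3, 16, 3, padding=1)

# Weight shape is (out_channels, in_channels, kH, kW):
# each of the 16 kernels sees the full depth of 3 input channels.
print(conv1.weight.shape)  # torch.Size([16, 3, 3, 3])

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)
out = conv1(x)
print(out.shape)               # torch.Size([1, 16, 32, 32])
```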

Ah, thanks a lot.

If someone comes looking into this thread, the intuition is that if we define the convolutional layer as:

nn.Conv2d(3, 16, 5, padding=1),

we define 16 filters, each of dimension (5 x 5 x 3). I.e., each filter looks at the entire depth of the input. The layer's output is the output of every filter, stacked along the channel dimension.

Refer to the excellent link in ptrblck’s answer for further explanation.
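To verify this intuition numerically, here is a small sketch (my own, not from the tutorial): one output channel of a `Conv2d` equals the sum, over input channels, of the per-channel cross-correlations with the corresponding slice of the filter.

```python
import torch
import torch.nn.functional as F

# Layer as in the snippet above (kernel 5, padding 1); bias disabled for a clean comparison.
conv = torch.nn.Conv2d(3, 16, 5, padding=1, bias=False)
x = torch.randn(1, 3, 32, 32)

# Output channel k = sum over input channels c of conv2d(x[c], weight[k, c]):
# each filter spans the full depth, and the per-channel results are summed.
k = 0
manual = sum(
    F.conv2d(x[:, c:c + 1], conv.weight[k:k + 1, c:c + 1], padding=1)
    for c in range(3)
)
ref = conv(x)[:, k:k + 1]
print(torch.allclose(manual, ref, atol=1e-5))  # True
```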