How many filters in Conv2d?

I couldn’t understand how many filters are used in Conv2d.
This is my code, and please see the picture.
At first I thought fig1 was correct, but when I looked at the code, fig2 seems to be correct.
Can someone give me a reference on this matter?
Thank you for reading to the end.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 2 input channels, 2 output channels, 3x3 kernels
        self.conv1 = nn.Conv2d(in_channels=2, out_channels=2, kernel_size=3)

    def forward(self, x):
        return self.conv1(x)

net = Net()
print(net)

params = list(net.parameters())
print(params[0])  # conv1 weight; params[1] is the bias

~~ Output on my Colab: I can see four (3*3) tensors. I thought those were the filters ~~

  (conv1): Conv2d(2, 2, kernel_size=(3, 3), stride=(1, 1))
Parameter containing:
tensor([[[[-0.1645, -0.2205,  0.0995],
          [ 0.2017, -0.1659,  0.0161],
          [ 0.0099,  0.1740,  0.0792]],

         [[-0.1067,  0.1234,  0.0129],
          [-0.1366, -0.0107,  0.0756],
          [ 0.1778, -0.1056,  0.2191]]],

        [[[-0.0351,  0.0904, -0.1394],
          [-0.1006,  0.2080,  0.1312],
          [-0.1741, -0.0246, -0.0775]],

         [[-0.0482, -0.0906, -0.1982],
          [ 0.2164,  0.0711, -0.0212],
          [-0.0277, -0.0861, -0.1908]]]], requires_grad=True)

I think figure 2 is trying to show how the convolution operation is happening inside the convolution layer.
I made a diagram based on my understanding of convnets to try to help.

There are two filters in the layer because out_channels = 2.
in_channels = 2 and kernel_size = 3, so each filter is of size [3 x 3 x 2].

My diagram shows two [3 x 3 x 2] filters performing the convolution operation on the same input image. You see 4 tensors in the printed weights because there are 4 [3 x 3] kernels in total: 2 filters, each with one [3 x 3] kernel per input channel.
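If it helps, here is a small sketch that checks this against the layer from the question. Note that PyTorch stores the weight as [out_channels, in_channels, kernel_height, kernel_width], so each [3 x 3 x 2] filter in my notation shows up as a [2 x 3 x 3] slice. The variable names are just mine for illustration.

```python
import torch.nn as nn

# Same layer as in the question.
conv = nn.Conv2d(in_channels=2, out_channels=2, kernel_size=3)

# Weight layout: (out_channels, in_channels, kernel_height, kernel_width)
weight_shape = tuple(conv.weight.shape)
bias_shape = tuple(conv.bias.shape)
print(weight_shape)  # (2, 2, 3, 3)
print(bias_shape)    # (2,)

# One filter per output channel; one 3x3 kernel per (filter, input channel) pair.
num_filters = conv.weight.shape[0]
num_2d_kernels = conv.weight.shape[0] * conv.weight.shape[1]
print(num_filters, num_2d_kernels)  # 2 4
```

So `conv.weight[0]` is the first filter and `conv.weight[0, 1]` is the 3x3 kernel that filter applies to the second input channel.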

The output of this operation is a feature map of size:

$$\frac{W - F + 2P}{S} + 1$$

𝑊 is the input volume size
𝑃 is padding
𝐹 is filter size
𝑆 is stride

Hope this helps!

You meant [output image height x output image width x out channels], didn’t you?

It’s the same dimensions as the input image is what I meant.

Thank you sooo much, Mr PresidentDoggo.
I’m sorry, but can I ask one more thing?
I circled a part of your figure. Are the values in this part added up? Or is it a norm, or something else? [annotated figure]

That’s only in the case of padding.

Mr PresidentDoggo
Thank you very much for your kindness.
Your help was very useful to me.

I still don’t understand how the two layers are merged into one layer.
Mr harsha_g, thank you for replying.
Could I ask for more detail?

@111353 you’re welcome :)

To answer your question about merging the two layers: the per-channel results are summed, and then the result is offset by the bias. [source: Stanford CS convnets] (Also a great place to learn more!)
That said, I have seen some literature use an average instead.
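To make the summing concrete, here is a rough sketch that rebuilds the output of the layer by hand: each input channel is convolved with its own 3x3 kernel, the per-channel results are added together, and then the bias is added. The 5x5 random input and the variable names are assumptions of mine, not something from the original post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=2, out_channels=2, kernel_size=3)
x = torch.randn(1, 2, 5, 5)  # hypothetical input: batch of 1, 2 channels, 5x5

with torch.no_grad():
    reference = conv(x)  # what the layer itself computes, shape (1, 2, 3, 3)

    manual = torch.zeros_like(reference)
    for o in range(conv.out_channels):
        for i in range(conv.in_channels):
            x_i = x[:, i:i + 1]                     # one input channel, (1, 1, 5, 5)
            k_oi = conv.weight[o:o + 1, i:i + 1]    # one 3x3 kernel, (1, 1, 3, 3)
            # Convolve without bias and accumulate (sum) over input channels.
            manual[:, o:o + 1] += F.conv2d(x_i, k_oi)
        # The bias is added once per output channel, after the sum.
        manual[:, o] += conv.bias[o]

matches = torch.allclose(reference, manual, atol=1e-5)
print(matches)
```

If the manual result agrees with the layer's output, that confirms PyTorch's Conv2d sums over input channels rather than averaging.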

@harsha_g is right, I made a slight mistake with the output!

The output of the conv operation is not the size of the input image.
We can calculate the change in dimensionality from the CONV operation using this equation:

$$\frac{W - F + 2P}{S} + 1$$

𝑊 is the input volume size
𝑃 is padding
𝐹 is filter size
𝑆 is stride
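As a quick worked example, here is a tiny helper for that calculation. I am assuming the equation meant here is the standard (W - F + 2P)/S + 1 along one spatial dimension, rounded down when it does not divide evenly (which is how PyTorch behaves). The function name `conv_output_size` is hypothetical.

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output size along one spatial dimension: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1

print(conv_output_size(5, 3))        # 3: a 3x3 kernel with no padding shrinks 5 to 3
print(conv_output_size(5, 3, p=1))   # 5: padding of 1 keeps the input size
print(conv_output_size(7, 3, s=2))   # 3: stride 2 roughly halves the size
```

The second call is the case where the output has the same dimensions as the input, which only happens with padding.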

@111353 see if this video can help clarify your doubt.

Mr @PresidentDoggo
Thank you very much for giving me a source and your helpful information.
Mr @harsha_g
Thank you very much for giving me a nice video and joining this topic.