I have a fairly elementary question, but it is something that has caused me some trouble. So let's say we have a two-layer convolutional network. In the first layer we have
Conv1 = Conv2d(1, 2, kernel_size=3, stride=1)
meaning that we have two filters for our input, producing two feature maps.
In the second layer we have
Conv2 = Conv2d(2, 2, kernel_size=3, stride=1)
In this layer I would expect two filters, since the final output is two feature maps, but when I look into the weights there are 4 convolutional filters in the second convolutional layer. Why is this?
Your assumption is right! Your layers both have two filters with a different number of channels.
conv1 = nn.Conv2d(1, 2, 3, 1, 1)
print(conv1.weight.shape)
> torch.Size([2, 1, 3, 3])
conv2 = nn.Conv2d(2, 2, 3, 1, 1)
print(conv2.weight.shape)
> torch.Size([2, 2, 3, 3])
The filter shape is defined as
[nb_filters, in_channels, h, w].
So although the number of input channels changes, we still have two filters in each layer.
I see, but I still don't understand why there are 4 separate filters in layer 2. It's almost like there are 2 filters per incoming channel, when I only wanted 2 filters total. I'm sorry if this is a very simple question, I'm just not understanding why there are 4 filters in the second convolutional layer.
So we increase the number of channels from 1 to 2 going from convolution 1 to 2. We thus increase our filter number from 2 to 4, but our output channels leaving convolution 2 are still 2. Thus we are applying 2 separate sets of filters to each channel coming into convolution 2?
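No, we still have two filters in each layer. Each filter computes a dot product with the input activation across all input channels, so a filter in the second layer has one 3x3 slice per input channel. Two filters with two slices each is why you see four 3x3 kernels.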
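To make this concrete, here is a minimal sketch that rebuilds one output channel of the second layer by hand. The 8x8 random input and the variable names are assumptions for illustration only; the layer itself matches conv2 from above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

conv2 = nn.Conv2d(2, 2, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 2, 8, 8)  # hypothetical input: batch of 1, 2 channels, 8x8

with torch.no_grad():
    out = conv2(x)  # shape [1, 2, 8, 8]

    # weight shape is [nb_filters, in_channels, h, w] = [2, 2, 3, 3]
    w = conv2.weight

    # Rebuild output channel 0: convolve each input channel with its own
    # 3x3 slice of filter 0, sum over the input channels, add the bias.
    manual = sum(
        F.conv2d(x[:, c:c + 1], w[0:1, c:c + 1], padding=1)
        for c in range(w.shape[1])
    ) + conv2.bias[0]

    matches = torch.allclose(out[:, 0:1], manual, atol=1e-5)

print(matches)
```

The two 3x3 slices of filter 0 are not independent filters: their results are summed into a single feature map, which is why the layer still produces only two output channels.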
Have a look at the AlexNet architecture in Figure 2. You can see that each filter extends through the full depth of its input volume.
Also, have a look at the Convolution lecture of CS231n.
The connections are local in space (along width and height), but always full along the entire depth of the input volume. For example, suppose that the input volume has size [32x32x3], (e.g. an RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights (and +1 bias parameter). Notice that the extent of the connectivity along the depth axis must be 3, since this is the depth of the input volume.
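The arithmetic in that quote can be checked directly in PyTorch. This is just a sketch: the single-filter layer below is an assumption chosen to mirror the quoted 5x5 filter on a 3-channel input.

```python
import torch.nn as nn

# One 5x5 filter over a 3-channel (RGB) input, as in the quoted example.
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=5)

n_weights = conv.weight.numel()  # 5 * 5 * 3 weights in the single filter
n_bias = conv.bias.numel()       # one bias per filter

print(conv.weight.shape, n_weights, n_bias)
```

The weight tensor has shape [1, 3, 5, 5], so the filter's depth follows in_channels automatically.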
So then in the second convolution layer my two filters have the shape 2x3x3, thus their depth is 2 now, where in the first layer it was 1? Thank you so much for your help! In that case, how would I visualize these depth-2 filters?
Well, you could slice the channels and visualize each one as a grayscale image.
If you use color images (3 channels), the filters of your first conv layer will also have 3 channels, thus you could visualize them in color.
I see! Thank you very much! By slice, you mean take 2x2x3x3 and visualize them as two separate 2x3x3 images?
I mean visualizing each slice of the two filters as a
[3, 3] image.
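A rough sketch of that slicing, assuming the conv2 layer from above. The helper plot_slices is a hypothetical name, and matplotlib is only one possible way to draw the slices.

```python
import torch.nn as nn

conv2 = nn.Conv2d(2, 2, kernel_size=3, stride=1, padding=1)
weights = conv2.weight.detach()  # [2, 2, 3, 3], detached from autograd

# One [3, 3] slice per (filter, input channel) pair: 2 * 2 = 4 slices.
slices = {
    (f, c): weights[f, c]
    for f in range(weights.shape[0])
    for c in range(weights.shape[1])
}


def plot_slices(slices, n_filters, n_channels):
    """Hypothetical helper: draw each slice as a grayscale image."""
    # Imported here so the slicing above runs without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(n_filters, n_channels, squeeze=False)
    for (f, c), kernel in slices.items():
        axes[f][c].imshow(kernel.numpy(), cmap="gray")
        axes[f][c].set_title(f"filter {f}, channel {c}")
        axes[f][c].axis("off")
    plt.show()
```

Calling plot_slices(slices, 2, 2) would then show a 2x2 grid, one row per filter and one column per input channel.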
I see, thank you very much! I really appreciate the help.